The Korean Society of Marine Engineering
[ Original Paper ]
Journal of Advanced Marine Engineering and Technology - Vol. 49, No. 6, pp.480-488
ISSN: 2234-7925 (Print) 2765-4796 (Online)
Print publication date 31 Dec 2025
Received 14 Nov 2025 Revised 05 Dec 2025 Accepted 18 Dec 2025
DOI: https://doi.org/10.5916/jamet.2025.49.6.480

An approach for anomaly detection of motor rotating shaft alignment using deep learning

Jae-Hun Kim1; Chul-Sun Park
1Ph.D. Candidate, Department of Industrial Systems and Engineering, Changwon National University, Tel: +82-55-213-3720 kjhun0811@gmail.com

Correspondence to: Professor, Department of Industrial Systems and Engineering, Changwon National University, 20 Changwondaehak-ro Uichang-gu Changwon-si, Gyeongsangnam-do 51140, KOREA, E-mail: cspark@changwon.ac.kr, Tel: +82-55-213-3728

Copyright © The Korean Society of Marine Engineering
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

With increasing interest in eco-friendly ships, extensive research is being conducted on electric/hybrid vessels, highlighting the growing need for real-time motor monitoring. Motor malfunctions can arise from structural defects or component damage, and shaft misalignment is a particularly important cause, leading to excessive vibration, noise, and accelerated wear of bearings and couplings. Such malfunctions can cause personal injury or significant disruption to industrial operations, which makes a reliable condition monitoring system essential. To address this, this study proposes a Convolutional Neural Network (CNN)-based anomaly detection technique that monitors rotating shaft alignment using a microphone sensor. The proposed CNN classification model converts sound time-series data into spectrograms using the Short-Time Fourier Transform (STFT), capturing time-frequency patterns that reflect motor anomalies related to shaft misalignment. Compared with machine learning models (Random Forest and XGBoost), the proposed CNN model achieves higher accuracy and reliability in detecting these anomalies. By monitoring sensor data, the proposed model can serve as a diagnostic system for identifying and responding to motor abnormalities, particularly rotating shaft misalignment, thereby enhancing safety and reliability.

Keywords:

Microphone sensor, Anomaly detection, STFT (Short-Time Fourier Transform), Spectrogram, Convolutional neural network

1. Introduction

Electric motors play a crucial role in the operation of modern ships as the central components that drive propulsion systems, auxiliary machinery, and essential ship equipment. Motors perform functions such as propulsion, steering, and pump operation, and contribute significantly to the efficiency, safety, and environmental performance of ships[1]. The shipping industry is rapidly shifting to electrification to reduce emissions and improve energy efficiency, which further increases the importance of motors. Furthermore, with the development and commercialization of electric/hybrid propulsion vessels, the need for predictive maintenance systems that can effectively diagnose motors is growing[2]. The reliability and performance of these motors directly affect a ship's operational capability, fuel efficiency, and compliance with global sustainability standards. In particular, accurate shaft alignment within the propulsion system is key to supporting these objectives[3].

In ship propulsion systems, shaft misalignment is a major cause of failure in rotating machinery and can cause excessive vibration and noise throughout the vessel. In particular, shaft misalignment due to structural deformation can amplify vibration by causing relative displacement of the bearing supports. Approximately 70% of vibration problems in rotating machinery are reported to be caused by shaft misalignment[4]. Misalignment also accelerates premature wear of bearings, seals, and couplings, leading to secondary damage such as bearing failure, rotor bending, and coupling damage, which increases vessel downtime and safety risks[5]. In addition, it reduces power transmission efficiency, increasing fuel consumption and significantly affecting operating costs on long-haul vessels[4]. Consequently, shaft misalignment poses a significant threat to the safety of ships, cargo, the environment, and crew, and is considered a key factor in the reliability of motor systems[5][6].

Shaft alignment in a ship propulsion system is the process of accurately aligning components such as the propulsion shaft, bearings, and couplings along a straight line; it is designed to transmit power efficiently while accounting for ship hull deformation (SHD). Because of dynamic factors in the marine environment, such as wave loads, load changes, and speed fluctuations, continuous shaft monitoring is required, and it is essential for minimizing ship vibration and noise and maintaining the stability of the entire system[4]. Previous studies have analyzed the effects of shaft alignment using finite element analysis (FEA), real-time monitoring methods, and vibration signal analysis[5][6]. Against this backdrop, advances in condition-based monitoring and predictive maintenance technologies are helping to optimize maintenance schedules in ship operations, reduce operating costs, and improve safety[4].

Given the critical role of shaft alignment and the harsh operating environment, motor monitoring and robust maintenance are crucial in the shipping industry. Detecting abnormalities in motors used across many fields is therefore important, and artificial intelligence methods are increasingly used to evaluate and diagnose such systems efficiently.

Park et al., Egaji et al., and Hiruta et al. proposed machine learning-based anomaly detection systems for motor condition monitoring, but their performance is limited in that it may vary depending on hyperparameters[7]-[9]. Yun et al. demonstrated that STFT spectrograms can be an effective tool for anomaly detection and condition monitoring, but the performance of the prediction model may deteriorate depending on sensor location[10]. Choi et al. improved detection accuracy with a deep learning model for fault diagnosis of a rotating body, but its application to actual processes is limited[11].

Time series data are typically analyzed and forecast with statistical models such as ARIMA (Auto-Regressive Integrated Moving Average) and AI-based models such as RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory). However, these models primarily capture the temporal characteristics of time series data and may be limited in extracting periodicity or frequency-domain characteristics. In addition, time series data may contain components unsuitable for analysis, such as noise[12], so effective processing of the data is crucial. To facilitate analysis, time series data are therefore often converted into two-dimensional images and used as input to machine learning or deep learning models. This representation makes data patterns visually explicit and helps models learn effectively.

In this study, we propose a motor shaft alignment anomaly detection model that uses sound time-series data collected with a microphone sensor. First, to capture time-frequency characteristics, the time-series data are transformed into spectrograms using the Short-Time Fourier Transform (STFT), which represents them as 2D time-frequency matrices. The transformed data are used as input to a convolutional neural network (CNN). The first 80% of the data is used for training, and the remaining 20% is used for testing. The performance of the proposed model is then compared with that of machine learning models, and its generalization performance is verified using the KAIST dataset[13].

This paper is structured as follows. Chapter 2 describes data collection from motor testing and the validation dataset. Chapter 3 describes the transformation of sound time-series data into the time-frequency domain using the STFT and the associated data processing. Chapter 4 presents the development of a CNN-based classification model and the results of the performance comparison. Finally, Chapter 5 concludes with a discussion of the limitations of the proposed methodology and suggestions for future research.


2. Data Acquisition

For data collection, the experimental setup shown in Figure 1 was used while the torque of the motor mounting bolts was varied over the values listed in Table 1 (20, 40, 60, and 80 kgf·cm). These values were chosen around the recommended fastening torque (60 kgf·cm) to model motor fixation failures and over-tightening-induced misalignment, which are commonly encountered in industrial settings. The torque was set precisely with a torque wrench, and the motor operating time was kept the same across conditions to control other variables. When the tightening torque is low (20 and 40 kgf·cm, loose abnormality), the motor is not properly fixed, and its position changes due to vibration or external forces during operation. This simulates a failure scenario in which soft foot occurs, leading to unstable shaft alignment and increased motor vibration.

Figure 1: Sound Measuring Equipment

Table 1: Motor experimental dataset

Conversely, when the tightening torque is high (80 kgf·cm, alignment abnormality), excessive stress on the motor frame or bolted joint causes deformation, which distorts the shaft centerline and misaligns the motor shaft. Changes in bolt tightening therefore directly affect the stability and performance of the motor: both loose and excessive tightening can cause misalignment, which can accelerate problems such as bearing wear and coupling damage. The dataset was categorized by torque value into loose abnormality (20, 40 kgf·cm), normal (60 kgf·cm), and alignment abnormality (80 kgf·cm). The technical specifications of the motor and microphone sensor are given in Table 2 and Table 3, and Figure 2 shows the sound time series at tightening torques of 20, 40, 60, and 80 kgf·cm.
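The labeling scheme of Table 1 and the three condition groups described above can be written as a small lookup table. This is a minimal sketch; the names `TORQUE_CLASSES`, `condition_group`, and `label_for_torque` are hypothetical and only restate the torque-to-class mapping given in the text.

```python
# Hypothetical encoding of Table 1: class label -> (category, bolt torque in kgf·cm).
TORQUE_CLASSES = {
    0: ("Normal", 60),
    1: ("Loosen1", 20),
    2: ("Loosen2", 40),
    3: ("Alignment", 80),
}

RECOMMENDED_TORQUE = 60  # kgf·cm, recommended fastening torque


def condition_group(torque_kgf_cm: int) -> str:
    """Map a bolt torque to the coarse condition group used in the text."""
    if torque_kgf_cm < RECOMMENDED_TORQUE:
        return "loose abnormality"      # 20, 40 kgf·cm: soft foot, unstable alignment
    if torque_kgf_cm > RECOMMENDED_TORQUE:
        return "alignment abnormality"  # 80 kgf·cm: frame distortion, shaft misalignment
    return "normal"


def label_for_torque(torque_kgf_cm: int) -> int:
    """Return the Table 1 class label for one of the four tested torques."""
    for label, (_, torque) in TORQUE_CLASSES.items():
        if torque == torque_kgf_cm:
            return label
    raise ValueError(f"untested torque: {torque_kgf_cm} kgf·cm")


for label, (name, torque) in TORQUE_CLASSES.items():
    print(label, name, torque, condition_group(torque))
```

Note that the classifier works with the four labels of Table 1, while the three-way grouping is only a descriptive summary of them.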

Table 2: Equipment Specifications

Table 3: Microphone Sensor Technical data

Figure 2: Sound Time Series Data (20, 40, 60, 80 kgf·cm)

Additionally, generalization performance was validated using a dataset released by the Korea Advanced Institute of Science and Technology (KAIST) in 2023. The dataset consists of simulated fault conditions in rotating machinery; vibration, sound, temperature, and current data were collected under various load conditions (0, 2, and 4 N·m), with the motor operating at a rated speed of 3,010 RPM. In this study, vibration data under the 4 N·m load condition were used and divided into 15 detailed classes across five states: normal (NOR), bearing inner ring defect (BPFI), bearing outer ring defect (BPFO), unbalance (UNB), and misalignment (MISALI). Bearing inner and outer ring defects are each classified into three levels (0.3, 1.0, 3.0 mm) according to crack size, and misalignment into three levels (0.1, 0.3, 0.5 mm) according to shaft displacement. Unbalance defects consist of five levels (583, 1,169, 1,751, 2,239, 3,318 mg) based on the mass added to the rotor disk. Table 4 summarizes the dataset labels and sample sizes for each class.
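The 15 KAIST classes are the combinations of fault state and severity listed in Table 4. The sketch below enumerates them in the same order; the dictionary layout and the helper `build_label_table` are hypothetical conveniences, not part of the released dataset.

```python
# Hypothetical reconstruction of the KAIST label set summarized in Table 4
# (4 N·m load, vibration channel). Each (state, severity) pair is one class.
SEVERITIES = {
    "NOR": [None],
    "BPFI": ["0.3 mm", "1.0 mm", "3.0 mm"],    # inner ring crack size
    "BPFO": ["0.3 mm", "1.0 mm", "3.0 mm"],    # outer ring crack size
    "MISALI": ["0.1 mm", "0.3 mm", "0.5 mm"],  # shaft displacement
    "UNB": ["583 mg", "1169 mg", "1751 mg", "2239 mg", "3318 mg"],  # added mass
}
SAMPLES_PER_CLASS = 120  # per Table 4


def build_label_table():
    """Enumerate (label, state, severity) in the order used by Table 4."""
    table = []
    for state, levels in SEVERITIES.items():  # dicts keep insertion order
        for level in levels:
            table.append((len(table), state, level))
    return table


labels = build_label_table()
print(len(labels), "classes,", len(labels) * SAMPLES_PER_CLASS, "samples")
```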

Table 4: KAIST experimental dataset


3. Signal Processing using STFT

In the analysis of time-series data such as vibration and sound, signals are commonly transformed into the frequency domain to identify patterns; typical approaches include spectral analysis via the Fourier Transform[14] and time-frequency analysis, which tracks changes in frequency components over time[15]. Transforming a sensor signal from the time domain to the frequency domain reveals its frequency content, but information about time is lost. Time-frequency analysis, by contrast, can identify the frequencies present in the measured signal and detect how they change over time[16], and the STFT is known to be an effective method for this[17], because STFT spectrograms capture both frequency components and their temporal variations. As shown in Figure 3, applying the Fast Fourier Transform (FFT) to the time-series signals showed that the harmonic components 1X, 2X, and 3X alone were insufficient to distinguish effectively between normal and abnormal classes (i.e., between bolt tightening torque conditions). Therefore, to detect anomalies associated with bolt tightening torque, the time-series data were divided into appropriate windows, the STFT was applied, and the resulting spectrograms were used as input features.
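For reference, 1X is the shaft rotation frequency, 1,150 RPM / 60 ≈ 19.17 Hz at the motor speed in Table 1, and 2X and 3X are its multiples. The sketch below reads the harmonic amplitudes from a whole-record FFT. It uses a synthetic signal with arbitrary, assumed amplitudes in place of the microphone data, so it only illustrates the kind of FFT feature that Figure 3 examines.

```python
import numpy as np

FS = 1650            # Hz, sampling rate (Table 1)
RPM = 1150           # motor speed (Table 1)
f1 = RPM / 60.0      # 1X shaft frequency, about 19.17 Hz

# Synthetic stand-in for a microphone record: 1X, 2X, 3X tones plus noise.
# The amplitudes are arbitrary assumptions, not measured values.
rng = np.random.default_rng(0)
t = np.arange(10 * FS) / FS             # 10 s -> 16,500 samples, 0.1 Hz bins
x = (1.0 * np.sin(2 * np.pi * f1 * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f1 * t)
     + 0.25 * np.sin(2 * np.pi * 3 * f1 * t)
     + 0.1 * rng.standard_normal(t.size))

# One-sided amplitude spectrum of the whole record: no time information kept.
spectrum = np.abs(np.fft.rfft(x)) / t.size
freqs = np.fft.rfftfreq(t.size, d=1 / FS)

for k in (1, 2, 3):
    # Amplitude at the FFT bin closest to the k-th harmonic.
    idx = int(np.argmin(np.abs(freqs - k * f1)))
    print(f"{k}X: {freqs[idx]:.1f} Hz, amplitude {spectrum[idx]:.3f}")

peak_freq = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
```

Because the FFT summarizes the entire record in one spectrum, any change in these components over time is averaged out, which is the limitation the STFT addresses.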

Figure 3: FFT Analysis of Sound Data (20, 40, 60, 80 kgf·cm)

The STFT extracts time-frequency characteristics by dividing a time-series signal into regular time intervals (windows) and performing a Fourier transform on each. The window size determines the span of data processed at once and directly sets the time and frequency resolution of the STFT, so there is a trade-off between the two. Short windows respond to temporal changes and are advantageous for capturing rapid signal changes. However, because a short window contains fewer samples, its frequency resolution (the bin spacing fs/N for sampling rate fs and window length N) is coarser, making it difficult to distinguish closely spaced frequency components. This trade-off is a key factor in STFT performance, so the window size must be chosen carefully based on the signal characteristics and analysis objectives. Figure 4 compares STFT spectrograms generated with different window sizes for sound data under the normal (60 kgf·cm) tightening torque condition, illustrating how time-frequency resolution depends on window size.
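The windowing procedure described above can be written directly in NumPy: slice the signal into overlapping frames, taper each frame, and take its FFT. This is a minimal sketch; the Hann taper, the helper name `stft_frames`, and the random placeholder signal are assumptions for illustration. It shows the trade-off in the output shape: a 128-sample window yields many time frames but only 65 frequency bins, while a 1,650-sample window yields 826 bins but few frames.

```python
import numpy as np


def stft_frames(x: np.ndarray, window: int, hop: int) -> np.ndarray:
    """Minimal STFT: slice x into overlapping frames, taper each, and FFT it.

    Returns a (window // 2 + 1, n_frames) complex array
    (frequency bins x time frames). No boundary padding is applied, so
    trailing samples that do not fill a full frame are dropped.
    """
    taper = np.hanning(window)  # Hann taper; the window type is an assumption
    n_frames = 1 + (len(x) - window) // hop
    frames = np.stack([x[i * hop : i * hop + window] * taper for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


fs = 1650
x = np.random.default_rng(0).standard_normal(10 * fs)  # 10 s placeholder signal
for window in (128, 1650):
    S = stft_frames(x, window, hop=window // 2)  # 50% overlap
    print(window, S.shape)  # short window: many frames, few bins; long: the reverse
```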

Figure 4: STFT spectrogram with various window sizes

In this study, overlapping windows were used to mitigate the time-frequency trade-off of the STFT, improving time resolution while preserving frequency resolution. Several window sizes were evaluated, with the aim of capturing the primary frequency components of the sound data rather than instantaneous changes or impacts. Table 5 presents the frequency and time resolution for each window size. To select the window size objectively, the signal-to-noise ratio (SNR) in the diagnostic frequency band was also evaluated. As shown in Table 6, a window size of 1,650 samples achieved the highest SNR under all conditions, separating fault-related acoustic features from background noise better than the other sizes. The window size was therefore set to 1,650 samples with a 50% overlap (825 samples), giving a frequency resolution of 1 Hz (1,650 Hz / 1,650 samples) and a time resolution of 0.500 s.
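The resolution figures follow from the sampling rate: the frequency bin spacing is fs / window, and, assuming the time resolution is the step between successive frames, it is (window − overlap) / fs. The sketch prints these values for the candidate windows and then computes a spectrogram with `scipy.signal.stft` using the selected settings. The Hann window and the random placeholder signal are assumptions, since the paper does not state the window function.

```python
import numpy as np
from scipy.signal import stft

FS = 1650        # Hz, sampling rate (Table 1)
WINDOW = 1650    # samples per STFT segment (selected by SNR)
OVERLAP = 825    # 50% overlap

# Resolution arithmetic for the candidate window sizes (50% overlap each).
for n in (128, 512, 1024, 1650):
    hop = n - n // 2
    print(f"window {n:4d}: df = {FS / n:.4f} Hz, frame step = {hop / FS:.4f} s")

# Spectrogram of a 10 s placeholder signal with the selected settings.
x = np.random.default_rng(0).standard_normal(10 * FS)
f, t, Zxx = stft(x, fs=FS, window="hann", nperseg=WINDOW, noverlap=OVERLAP)
magnitude = np.abs(Zxx)  # shape: (frequency bins, time frames)
print(f[1] - f[0], t[1] - t[0], magnitude.shape)
```

With these settings the frequency axis has 826 bins spaced 1 Hz apart up to the 825 Hz Nyquist limit, and frames are spaced 0.5 s apart, matching the 1,650-sample row of Table 5.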

Table 5: Comparison of frequency and time resolution for different STFT window sizes

Table 6: Comparison of Signal-to-Noise Ratio (SNR) according to window sizes

Figure 5 shows the spectrograms obtained by applying the STFT to the collected data. The spectrogram images show distinct patterns depending on the torque of the fastening bolt.
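Before the spectrograms are used as CNN inputs, they are rendered as grayscale images normalized to the range 0 to 1, and Table 7 implies an input size of 128 × 128 × 1. The paper does not say how the images were scaled or resized, so this sketch assumes log-magnitude (dB) scaling, per-image min-max normalization, and bilinear resampling with `scipy.ndimage.zoom`; the helper name `spectrogram_image` is hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import stft

FS = 1650  # Hz, sampling rate (Table 1)


def spectrogram_image(x: np.ndarray, size: int = 128) -> np.ndarray:
    """Turn a sound segment into a (size, size, 1) grayscale array in [0, 1].

    Assumptions (not stated in the paper): dB scaling, per-image min-max
    normalization, and bilinear resampling to size x size.
    """
    _, _, Zxx = stft(x, fs=FS, window="hann", nperseg=1650, noverlap=825)
    db = 20 * np.log10(np.abs(Zxx) + 1e-12)                 # log-magnitude in dB
    img = (db - db.min()) / (db.max() - db.min() + 1e-12)    # scale to [0, 1]
    factors = (size / img.shape[0], size / img.shape[1])
    img = zoom(img, factors, order=1)                         # bilinear resample
    return np.clip(img, 0.0, 1.0)[..., np.newaxis]           # add channel axis


segment = np.random.default_rng(1).standard_normal(10 * FS)  # placeholder audio
image = spectrogram_image(segment)
print(image.shape, image.min(), image.max())
```

Linear interpolation never overshoots its inputs, so the clip only guards against floating-point rounding.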

Figure 5: STFT spectrograms ((a) 20 kgf·cm, (b) 40 kgf·cm, (c) 60 kgf·cm, (d) 80 kgf·cm)


4. Development of a CNN-based Anomaly Detection Model

To develop the CNN-based anomaly detection model, the collected sound time-series data are first converted into STFT spectrogram images, which are rendered in grayscale to reduce computational cost and memory usage. Second, the generated images are labeled and normalized so that pixel values fall between 0 and 1. Third, in the training phase, 80% of the image data is used to train the classification model, and the remaining 20% (test data) is used to validate the trained model.

The activation functions are as follows. The hidden layers use ReLU, which can model nonlinear problems, is simple and efficient to compute, and mitigates the vanishing gradient problem. The output layer uses softmax, which expresses the outputs (predicted values) of the multi-class problem as probabilities and allows intuitive interpretation. The model is compiled with the Adam optimizer, which combines Momentum and RMSProp; Adam learns efficiently, requires little memory, and is well suited to problems with large amounts of data and high-dimensional parameters[18]. The loss function is categorical cross-entropy, the standard pairing with softmax, which converges faster than the mean squared error (MSE) loss[19]. In addition, a ModelCheckpoint callback was used to monitor accuracy at each epoch and save the model with the highest accuracy.

The detailed architecture of the proposed CNN model is summarized in Table 7. The model consists of three convolutional layers, each followed by a max-pooling layer. All convolutional layers use 3×3 kernels with stride 1 and 'same' padding to preserve spatial dimensions before pooling, and the max-pooling layers use 2×2 pools with stride 2. The flattened feature map is passed to a 256-unit fully connected layer and a 4-unit softmax output layer. No dropout layers were used, and the model was trained for 100 epochs with a batch size of 32.
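The architecture and training configuration above can be sketched with Keras as follows. This is an illustration under assumptions, not the authors' code: the layer sizes follow Table 7, but the checkpoint file name and the monitored metric (`val_accuracy`) are assumptions, since the text only says that accuracy was checked at each epoch. The `fit` call is left commented out because it needs the labeled spectrogram arrays.

```python
from tensorflow import keras

NUM_CLASSES = 4  # Normal, Loosen1, Loosen2, Alignment (Table 1)


def build_cnn(input_shape=(128, 128, 1), num_classes=NUM_CLASSES) -> keras.Model:
    """Three Conv(3x3, same) + MaxPool(2x2) blocks, then Dense(256) and softmax."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(32, 3, strides=1, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2, strides=2),
        keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2, strides=2),
        keras.layers.Conv2D(128, 3, strides=1, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2, strides=2),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_cnn()
model.summary()

# Keep the weights from the epoch with the best monitored accuracy (assumed metric).
checkpoint = keras.callbacks.ModelCheckpoint(
    "best_cnn.keras", monitor="val_accuracy", save_best_only=True)
# model.fit(x_train, y_train_onehot, validation_data=(x_val, y_val_onehot),
#           epochs=100, batch_size=32, callbacks=[checkpoint])
```

The parameter count is dominated by the 32,768 × 256 fully connected layer, which holds about 8.39 million of the roughly 8.48 million weights.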

Table 7: Detailed architecture of the proposed CNN model

Figure 6 and Figure 7 show the learning curves of the model trained on STFT images. Both the training loss and validation loss decrease, indicating that training proceeded normally with no sign of overfitting. The training data are used to update the model's weights through backpropagation, whereas the validation data are not used for weight updates but only to evaluate the model's generalization performance. Accordingly, the training loss is the loss computed between the model's predictions and the true labels on the training data, and the validation loss is the same quantity computed on the validation data. Because the weights are fitted to the training data, the validation loss is generally expected to be higher than the training loss.
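Training and validation loss are the same categorical cross-entropy formula, −Σ y_k log p_k averaged over samples, evaluated on different subsets; only the training value drives backpropagation. The sketch below computes both on toy softmax outputs. The probability values are illustrative only and are not taken from this study.

```python
import numpy as np


def categorical_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Mean of -sum_k y_k * log(p_k) over samples (one-hot targets)."""
    y_prob = np.clip(y_prob, 1e-7, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


# Toy softmax outputs for 4 classes; values are illustrative only.
y_train = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])
p_train = np.array([[0.90, 0.05, 0.03, 0.02], [0.10, 0.80, 0.05, 0.05]])
y_val = np.array([[0, 0, 1, 0], [0, 0, 0, 1]])
p_val = np.array([[0.10, 0.10, 0.70, 0.10], [0.05, 0.15, 0.20, 0.60]])

train_loss = categorical_cross_entropy(y_train, p_train)  # mean(-ln 0.9, -ln 0.8)
val_loss = categorical_cross_entropy(y_val, p_val)        # mean(-ln 0.7, -ln 0.6)
print(f"train loss {train_loss:.4f}, validation loss {val_loss:.4f}")
```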

Figure 6: Training and validation loss of the CNN

Figure 7: Training and validation accuracy of the CNN

To further validate the SNR-based window size selection, an ablation study was conducted by training the same CNN architecture on spectrograms generated with different window sizes (with 50% overlap fixed). As shown in Table 8, classification accuracy increases consistently with window size, reaching its highest value of 0.9883 at 1,650 samples. This confirms that the window size of 1,650, chosen primarily on the basis of SNR, also yields the best classification performance.

Table 8: Classification accuracy according to window size

In this study, the proposed CNN-based anomaly detection model was compared with machine learning classifiers on the same dataset. For this purpose, we used Random Forest (RF), which is widely used for classification and numerical prediction, has a low risk of overfitting, and is easy to interpret, and the eXtreme Gradient Boosting (XGB) classifier, which has proven effective for classification on various datasets. Previous work has shown that deriving descriptive statistics such as the mean, variance, standard deviation, skewness, kurtosis, and percentiles from time series data and using them as classifier inputs improves metrics such as accuracy, so this method was applied in this study[20]. After the derived descriptive statistics were labeled as described above, the features were standardized to zero mean and unit variance. In the training phase, 80% of the data was used to build the model and the remaining 20% (test data) was used to validate it. As shown in Table 9, the machine learning models (RF, XGB), which use only time-domain features, are limited in distinguishing motor conditions (accuracy of 0.4689 and 0.4494, respectively, versus 0.9883 for the CNN). This suggests that the CNN-based model improves anomaly detection performance by learning complex patterns in spectrogram images, and that image-based analysis is better suited to detecting anomalies in motor testing.
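The baseline pipeline above can be sketched with SciPy and scikit-learn. Several details are assumptions because the paper does not give them: the segment length (1 s here), the specific percentiles, the number of trees, and fitting the scaler on the training split only. The data are random placeholders, so the resulting accuracy is meaningless; `xgboost.XGBClassifier` could be substituted for the random forest to mirror the XGB baseline.

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def descriptive_features(segment: np.ndarray) -> np.ndarray:
    """Summary statistics of one time-domain segment (no frequency information)."""
    return np.array([
        segment.mean(),
        segment.var(),
        segment.std(),
        skew(segment),
        kurtosis(segment),
        *np.percentile(segment, [5, 25, 50, 75, 95]),  # percentile set is assumed
    ])


# Placeholder data: 200 one-second segments at 1,650 Hz with 4 random labels.
rng = np.random.default_rng(0)
segments = rng.standard_normal((200, 1650))
labels = rng.integers(0, 4, size=200)

X = np.vstack([descriptive_features(s) for s in segments])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, shuffle=False)  # first 80% train, last 20% test

scaler = StandardScaler().fit(X_train)  # zero mean, unit variance per feature
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
accuracy = clf.score(scaler.transform(X_test), y_test)
```

Fitting the scaler on the training split alone keeps test-set statistics out of training; whether the original study did this is not stated.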

Table 9: Evaluation Metrics of Motor Sound dataset

Additionally, the generalization capability of the proposed model was evaluated using the KAIST dataset. Figure 8 shows the confusion matrix of the classification results for the normal, bearing defect, misalignment defect, and unbalance defect classes; classification performance is particularly good for the normal, bearing defect, and misalignment defect classes. Table 10 compares the proposed model with classifiers trained on descriptive statistics. The CNN achieves an accuracy of 0.9636, compared with 0.7731 for RF and 0.7535 for XGB, suggesting that STFT spectrogram preprocessing of time series data can be an effective approach for detecting rotating shaft alignment anomalies.
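The confusion matrix and the accuracy and F1-score in Tables 9 and 10 can be computed with scikit-learn. The sketch uses toy labels rather than results from this study, and it assumes macro-averaged F1 because the paper does not state the averaging scheme.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy predictions for 3 classes; not results from this study.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

cm = confusion_matrix(y_true, y_pred)           # rows: true class, columns: predicted
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")  # averaging scheme is assumed
print(cm)
print(f"accuracy {acc:.4f}, macro F1 {f1:.4f}")
```

Macro averaging weights every class equally, which matters when, as in Figure 8, some classes are recognized better than others.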

Figure 8: Confusion matrix of KAIST dataset

Table 10: Evaluation Metrics of KAIST dataset


5. Conclusion

This study proposes a deep learning-based motor shaft alignment anomaly detection model using time-series data. To reflect the time-frequency characteristics of time-series data collected with a microphone sensor, spectrograms obtained through the STFT were used as input data. The deep learning-based model was compared with existing machine learning-based classification models (RF, XGB). The machine learning models, which used only time-domain features, showed low reliability, whereas the CNN model performed better by analyzing time-frequency characteristics through STFT spectrograms. This indicates that approaches reflecting time-frequency characteristics are more suitable for anomaly detection. Furthermore, the KAIST vibration dataset was used to verify the generalization performance of the approach.

In real-world industrial settings, data are collected from a variety of sensors, including vibration, acoustic, and current sensors. By capturing the moment-to-moment dynamics of the data, that is, the pattern changes that occur over time, the methodology of this study can be useful for developing AI-based fault diagnosis systems. Future research is needed to integrate the diverse forms of sensor data collected in industrial settings and to further improve the performance of anomaly detection models based on these data.

Acknowledgments

This research was supported by the 5th Educational Training Program for the Shipping, Port and Logistics from the Ministry of Oceans and Fisheries.

Author Contributions

Conceptualization, J. H. Kim and C. S. Park; Methodology, J. H. Kim; Software, J. H. Kim; Validation, J. H. Kim and C. S. Park; Formal Analysis, J. H. Kim; Investigation, J. H. Kim; Resources, J. H. Kim; Data Curation, J. H. Kim; Writing — Original Draft Preparation, J. H. Kim; Writing — Review & Editing, C. S. Park; Visualization, J. H. Kim; Supervision, C. S. Park; Project Administration, C. S. Park; Funding Acquisition, C. S. Park.

References

  • H. P. Nguyen, A. T. Hoang, S. Nizetic, X. P. Nguyen, A. T. Le, C. N. Luong, and V. V. Pham, “The electric propulsion system as a green solution for management strategy of CO2 emission in ocean shipping: A comprehensive review,” International Transactions on Electrical Energy Systems, vol. 31, no. 11, e12580, 2020.
  • J. G. Jang, Y. N. Lee, J. C. Lee, D. H. Kang, D. G. Kim, and S. S. Lee, “Development of equipment for generating and measuring vibration data for electric motor,” Journal of the Society of Naval Architects of Korea, vol. 61, no. 6, pp. 498-507, 2024 (in Korean). [https://doi.org/10.3744/SNAK.2024.61.6.498]
  • Y. G. Kim, “Characteristics of the propulsion shafting alignment for a small class electric propulsion ship equipped with a reduction gear,” Journal of Advanced Marine Engineering and Technology, vol. 48, no. 6, pp. 430-437, 2024. [https://doi.org/10.5916/jamet.2024.48.6.430]
  • J.-W. Cheng, W.-J. Bu, L. Shi, and J.-Q. Fu, “A real-time shaft alignment monitoring method adapting to ship hull deformation for marine propulsion system,” Mechanical Systems and Signal Processing, vol. 197, 110366, 2023. [https://doi.org/10.1016/j.ymssp.2023.110366]
  • L. Lu, G. Li, P. Xing, W. He, Z. Feng, and H. Zhang, “Investigation on alignment state evolution of ship propulsion shafting based on chaotic characteristics of vibration signals,” Ocean Engineering, vol. 322, 120521, 2025. [https://doi.org/10.1016/j.oceaneng.2025.120521]
  • L. Murawski, “The influence of shaft line alignment accuracy on the operational reliability of marine propulsion systems,” Journal of KONES, vol. 23, no. 1, pp. 247-254, 2016. [https://doi.org/10.5604/12314005.1213580]
  • S. Y. Park and T. H. Im, “A study on machine learning-based anomaly detection algorithm using current data of fish-farm pump motor,” Journal of Internet Computing and Services, vol. 24, no. 2, pp. 37-45, 2023.
  • O. A. Egaji, T. Ekwevugbe, and M. Griffiths, “A data mining based approach for electric motor anomaly detection applied on vibration data,” Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), pp. 330-334, 2020. [https://doi.org/10.1109/WorldS450073.2020.9210318]
  • T. Hiruta, K. Maki, T. Kato, and Y. Umeda, “Unsupervised learning based diagnosis model for anomaly detection of motor bearing with current data,” Procedia CIRP, vol. 98, pp. 336-341, 2021. [https://doi.org/10.1016/j.procir.2021.01.113]
  • H. T. Yun, H. J. Kim, Y. H. Jeong, and M. B. Jun, “Autoencoder-based anomaly detection of industrial robot arm using stethoscope based internal sound sensor,” Journal of Intelligent Manufacturing, vol. 34, no. 3, pp. 1427-1444, 2023. [https://doi.org/10.1007/s10845-021-01862-4]
  • G. Y. Choi, I. S. Chang, Y. H. Lee, H. S. Kang, and G. M. Park, “Fault detection of motor gear box using two stage sound classification network,” Transaction of the Korean Society of Automotive Engineers, vol. 30, no. 2, pp. 161-169, 2022. [https://doi.org/10.7467/KSAE.2022.30.2.161]
  • Y. S. Kim and K. S. Park, “The multivariate sensor data classification using time series imaging,” Journal of KIISE, vol. 49, no. 8, pp. 593-600, 2022 (in Korean). [https://doi.org/10.5626/JOK.2022.49.8.593]
  • W. H. Jung, S. H. Kim, S. H. Yun, J. W. Bae, and Y. H. Park, “Vibration, acoustic, temperature, and motor current dataset of rotating machine under varying operating conditions for fault diagnosis,” Data in Brief, vol. 48, 109049, 2023. [https://doi.org/10.1016/j.dib.2023.109049]
  • D. Goyal and B. S. Pabla, “The vibration monitoring methods and signal processing techniques for structural health monitoring: a review,” Archives of Computational Methods in Engineering, vol. 23, pp. 585-594, 2016. [https://doi.org/10.1007/s11831-015-9145-0]
  • M. Ahadi and M. S. Bakhtiar, “Leak detection in water-filled plastic pipes through the application of tuned wavelet transforms to acoustic emission signals,” Applied Acoustics, vol. 71, no. 7, pp. 634-639, 2010. [https://doi.org/10.1016/j.apacoust.2010.02.006]
  • M. Ashouri, F. F. D. Silva, and C. L. Bak, “Application of short-time Fourier transform for harmonic-based protection of meshed VSC-MTDC grids,” The Journal of Engineering, The 14th IET International Conference on AC and DC Power, vol. 16, pp. 1439-1443, 2019. [https://doi.org/10.1049/joe.2018.8765]
  • W. Zhao, Z. Wang, J. Ma, and L. Li, “Fault diagnosis of a hydraulic pump based on the CEEMD-STFT time-frequency entropy method and multiclass SVM classifier,” Shock and Vibration, vol. 2016, 2609856, 2016. [https://doi.org/10.1155/2016/2609856]
  • D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • K. S. Shin and S. Y. Shin, “Implementation of photovoltaic panel failure detection system using semantic segmentation,” Journal of the Korea Institute of Information and Communication Engineering, vol. 25, no. 12, pp. 1777-1783, 2021.
  • J. H. Kim, S. H. Eom, and C. S. Park, “Improving the performance of machine learning models for anomaly detection based on vibration analog signals,” Journal of Korean Society of Industrial and Systems Engineering, vol. 47, no. 2, pp. 1-9, 2024. [https://doi.org/10.11627/jksie.2024.47.2.001]

Table 1:

Motor experimental dataset

Label Category Fastening torque [kgf·cm]
0 Normal 60
1 Loosen1 20
2 Loosen2 40
3 Alignment 80
Sampling rate: 1,650 Hz; motor speed: 1,150 RPM

Table 2:

Equipment Specifications

Specification Induction motor
Output Power 3.7 kW (5HP)
Number of Poles 2 Poles
Voltage 220/380 V
Frequency 60 Hz
Speed[RPM] ~ 3,500
Current 13.3 A / 7.7 A

Table 3:

Microphone Sensor Technical data

Specification RG-50
Transducer type Electret pressure transducer
Class 1 WS3F according to IEC 61094-5
Frequency range 5 Hz – 30 kHz
Sensitivity @ 1 kHz 50 mV/Pa ± 0.5 dB
Temperature range −20 °C to +80 °C
Max. SPL for 1% THD at 1 kHz 130 dB Peak

Table 4:

KAIST experimental dataset

Label State Fault severity Sample size
0 NOR - 120
1 BPFI 0.3 mm 120
2 BPFI 1.0 mm 120
3 BPFI 3.0 mm 120
4 BPFO 0.3 mm 120
5 BPFO 1.0 mm 120
6 BPFO 3.0 mm 120
7 MISALI 0.1 mm 120
8 MISALI 0.3 mm 120
9 MISALI 0.5 mm 120
10 UNB 583 mg 120
11 UNB 1,169 mg 120
12 UNB 1,751 mg 120
13 UNB 2,239 mg 120
14 UNB 3,318 mg 120

Table 5:

Comparison of frequency and time resolution for different STFT window sizes

Window size Frequency Resolution[Hz] Time Resolution[s]
128 12.8906 0.0388
512 3.2227 0.1553
1024 1.6113 0.3106
1650 1.0000 0.5000

Table 6:

Comparison of Signal-to-Noise Ratio(SNR) according to window sizes

Window size Normal[dB] Loosen1[dB] Loosen2[dB] Alignment[dB]
128 26.29 31.73 24.80 25.31
512 29.15 31.64 29.04 26.28
1024 32.05 31.78 28.12 26.67
1650 32.75 32.66 29.29 26.79

Table 7:

Detailed architecture of the proposed CNN model

Layer (type) Output shape Param #
input_layer (None, 128, 128, 1) 0
conv2d (None, 128, 128, 32) 320
max_pooling2d (None, 64, 64, 32) 0
conv2d (None, 64, 64, 64) 18,496
max_pooling2d (None, 32, 32, 64) 0
conv2d (None, 32, 32, 128) 73,856
max_pooling2d (None, 16, 16, 128) 0
flatten (None, 32768) 0
dense (None, 256) 8,388,864
dense (None, 4) 1,028

Table 8:

Classification accuracy according to window size

Window size Accuracy
128 0.6519
512 0.9143
1024 0.9650
1650 0.9883

Table 9:

Evaluation Metrics of Motor Sound dataset

Classifier Accuracy F1-Score
RF 0.4689 0.4562
XGB 0.4494 0.4416
CNN 0.9883 0.9912

Table 10:

Evaluation Metrics of KAIST dataset

Classifier Accuracy F1-Score
RF 0.7731 0.7697
XGB 0.7535 0.7517
CNN 0.9636 0.9635