Qur’an Recitation Correction System Using DeepSpeech


  • Hajon Mahdy Mahmudin, Faculty of Computer Science, Universitas Esa Unggul, West Jakarta, DKI Jakarta
  • Habibullah Akbar, Faculty of Computer Science, Universitas Esa Unggul, West Jakarta, DKI Jakarta




The purpose of this study was to compare the performance of two models for classifying Qur’an verses based on audio similarity. The first, Model B, uses MFCC features with the MaLSTM architecture; the second, Model C, is Model B with additional delta features. The study proceeded in five stages: dataset selection, parameter selection, preprocessing, training, and testing. The dataset was obtained from the local dataset https://sahabatibadah.com/fasih/. The analysis was based on 172,895 samples of Qur’an recitation audio from Juz 30, which covers 37 surahs with a total of 564 verses. These recordings were made in the Qara'a application and collected from 500 of its users. Of these 500 users, 3 were used as training data for the speech recognition models and one was used as testing data. The models were trained with DeepSpeech on TensorFlow, and 30% of the samples were used as a validation set during training. The results show that Model B, which uses MFCC features only, is the best model for recognizing and classifying Qur’an verses from audio. Comparing Model B with Model C shows that adding delta features has a negative impact on model performance. MFCC features alone are therefore recommended for the audio-based recognition and classification of Qur’an verses, especially with an LSTM architecture.
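Model B and Model C differ only in their input features. Delta coefficients are commonly computed from an MFCC matrix with a regression over neighbouring frames, d_t = sum_{k=1..N} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1..N} k^2). The minimal NumPy sketch below assumes N = 2 and repeats the first and last frames at the edges; the abstract does not state the window, the number of MFCC coefficients, or the exact delta formula used, and `compute_deltas` and `model_c_features` are hypothetical names.

```python
import numpy as np


def compute_deltas(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based delta coefficients for a (frames, coeffs) matrix.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    Assumption: edge frames are handled by repeating the first/last frame.
    """
    if n < 1:
        raise ValueError("window n must be >= 1")
    num_frames = features.shape[0]
    # Pad n copies of the boundary frames so every frame has n neighbours per side.
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    deltas = np.zeros_like(features, dtype=float)
    for k in range(1, n + 1):
        # padded[n + k + t] is c_{t+k}; padded[n - k + t] is c_{t-k}
        ahead = padded[n + k : n + k + num_frames]
        behind = padded[n - k : n - k + num_frames]
        deltas += k * (ahead - behind)
    return deltas / denom


def model_c_features(mfcc: np.ndarray) -> np.ndarray:
    """Hypothetical Model C input: each frame's MFCCs followed by their deltas."""
    return np.concatenate([mfcc, compute_deltas(mfcc)], axis=1)


# Toy MFCC matrix: 5 frames x 3 coefficients rising linearly (slope 3 per frame).
mfcc = np.arange(15, dtype=float).reshape(5, 3)
print(model_c_features(mfcc).shape)  # (5, 6)
```

Model B would consume `mfcc` directly, while Model C would consume the doubled feature vector; in the interior frames of this linear ramp the deltas equal the slope (3.0), which is a quick sanity check on the formula.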

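MaLSTM is generally understood as a Siamese "Manhattan LSTM": one LSTM with shared weights encodes both recordings, and the two final hidden states are compared with the similarity exp(-||h_a - h_b||_1), which lies in (0, 1]. The abstract gives no layer sizes or training details, so the sketch below uses an untrained single-layer NumPy LSTM with random weights, 13 input coefficients, and 8 hidden units, all illustrative assumptions; `TinyLSTMEncoder` and `malstm_similarity` are hypothetical names.

```python
import numpy as np


def _sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMEncoder:
    """Single-layer LSTM returning the last hidden state (random, untrained weights)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def encode(self, seq: np.ndarray) -> np.ndarray:
        """Run the LSTM over seq of shape (frames, input_dim); return the final h."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x_t in seq:
            z = self.W @ x_t + self.U @ h + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = _sigmoid(i), _sigmoid(f), _sigmoid(o)
            c = f * c + i * np.tanh(g)  # update the cell state
            h = o * np.tanh(c)          # new hidden state
        return h


def malstm_similarity(encoder: TinyLSTMEncoder, seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """exp(-L1 distance) between the two encodings; 1.0 means identical encodings."""
    h_a = encoder.encode(seq_a)
    h_b = encoder.encode(seq_b)  # same encoder object, so the weights are shared (Siamese)
    return float(np.exp(-np.sum(np.abs(h_a - h_b))))


# Toy inputs: random "MFCC" sequences of different lengths (13 coefficients assumed).
rng = np.random.default_rng(1)
enc = TinyLSTMEncoder(input_dim=13, hidden_dim=8)
a = rng.normal(size=(40, 13))
b = rng.normal(size=(35, 13))
print(malstm_similarity(enc, a, a))  # 1.0
print(malstm_similarity(enc, a, b))  # a value strictly between 0 and 1
```

In a trained model the weights would be learned so that recitations of the same verse score near 1 and different verses score near 0; this sketch only shows the mechanics of the shared encoder and the Manhattan similarity, which also lets recordings of different lengths be compared.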

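The user-level split described in the abstract (a few training users, one separate test user, and 30% of the samples used for validation) can be sketched as follows. The user IDs, clip names, random seed, and the choice to draw the validation set from the training users' samples are assumptions for illustration; `split_by_user` is a hypothetical helper, not part of DeepSpeech.

```python
import random


def split_by_user(samples, train_users, test_user, val_fraction=0.3, seed=42):
    """Split (user_id, clip_id) pairs into train, validation and test lists.

    Test: every sample from test_user. Validation: val_fraction of the
    training users' samples after a reproducible shuffle. Train: the rest.
    """
    test = [s for s in samples if s[0] == test_user]
    pool = [s for s in samples if s[0] in train_users]
    random.Random(seed).shuffle(pool)  # seeded so the split is reproducible
    n_val = round(len(pool) * val_fraction)
    return pool[n_val:], pool[:n_val], test


# Hypothetical data: 5 users with 10 clips each; u5 is simply left unused.
users = ["u1", "u2", "u3", "u4", "u5"]
samples = [(u, f"{u}_clip{i}") for u in users for i in range(10)]
train, val, test = split_by_user(samples, {"u1", "u2", "u3"}, "u4")
print(len(train), len(val), len(test))  # 21 9 10
```

Holding out a whole user, rather than random clips, keeps the test speaker's voice unseen during training, which is presumably the intent of using a separate testing user.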