Implementation of RNN-LSTM with L1 regularization for predicting labels from chimpanzee DNA sequences using pseudo-labeling

Sugiyarto Surono, Goh Khang Wen, Arif Rahman, Lalu M. Irham, Sintia Afriyani

Abstract

Chimpanzee genome research plays a crucial role in understanding evolution, health, and biological functions. However, incomplete labeling of DNA sequence data presents a challenge for accurate genomic classification. This study aims to improve chimpanzee DNA sequence classification by addressing label scarcity and data imbalance through a deep learning approach. A Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM), combined with L1 regularization and pseudo-labeling, is employed to enhance classification performance. The workflow includes numerical encoding of DNA sequences, pseudo-labeling to augment the training data, and model training with Stochastic Gradient Descent (SGD) optimization. Performance is evaluated using classification accuracy and AUC. Results show that the proposed approach achieves high classification accuracy, with AUC values ranging from 0.94 to 0.99, and significantly improves the handling of imbalanced datasets. Pseudo-labeling makes effective use of unlabeled DNA sequences, leading to a more robust genomic classification model. These findings highlight the potential of combining RNN-LSTM with L1 regularization and pseudo-labeling to address incomplete labeling in genomic datasets. The study advances genomic classification techniques and supports Goal 3 (Good Health and Well-being) of the Sustainable Development Goals (SDGs): more accurate DNA sequence classification can facilitate early disease detection, precision medicine, and evolutionary studies.
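
The workflow named in the abstract (numerical encoding, pseudo-labeling, L1 regularization) can be illustrated with a minimal sketch. It is not the authors' implementation: the base-to-integer mapping, the 0.9 confidence threshold, the `SHARPNESS` constant, and all function names are assumptions made for illustration, and a simple nearest-centroid classifier on base frequencies stands in for the RNN-LSTM so that the example runs with the Python standard library alone.

```python
import math
from collections import Counter

# Assumed integer encoding; the paper's actual encoding scheme is not given here.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}
UNKNOWN = 4        # code for any other symbol, e.g. "N"
SHARPNESS = 10.0   # hypothetical scale turning distances into confident scores


def encode(seq):
    """Numerically encode a DNA string as a list of integer codes."""
    return [BASE_TO_INT.get(base, UNKNOWN) for base in seq.upper()]


def base_frequencies(seq):
    """Fixed-length features: the fraction of each code 0..4 in the sequence."""
    codes = encode(seq)
    counts = Counter(codes)
    total = max(len(codes), 1)
    return [counts.get(code, 0) / total for code in range(5)]


def fit_centroids(labeled):
    """Stand-in for model training: one mean feature vector per class."""
    sums, counts = {}, Counter()
    for seq, label in labeled:
        feats = base_frequencies(seq)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_proba(centroids, seq):
    """Softmax over negative scaled distances to each class centroid."""
    feats = base_frequencies(seq)
    scores = {label: -SHARPNESS * math.dist(feats, c) for label, c in centroids.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    norm = sum(exps.values())
    return {label: e / norm for label, e in exps.items()}


def pseudo_label(labeled, unlabeled, threshold=0.9):
    """Add unlabeled sequences whose predicted class is confident enough.

    Returns (augmented labeled set, sequences left unlabeled).
    """
    centroids = fit_centroids(labeled)
    augmented, remaining = list(labeled), []
    for seq in unlabeled:
        probs = predict_proba(centroids, seq)
        best = max(probs, key=probs.get)
        if probs[best] >= threshold:
            augmented.append((seq, best))
        else:
            remaining.append(seq)
    return augmented, remaining


def l1_penalty(weights, lam):
    """L1 regularization term: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


if __name__ == "__main__":
    labeled = [("AAAA", 0), ("CCCC", 1)]
    unlabeled = ["AAAC", "AACC", "CCCC"]
    augmented, remaining = pseudo_label(labeled, unlabeled)
    print(augmented, remaining)
    print(l1_penalty([1.0, -2.0, 0.5], lam=0.1))
```

In a full pipeline, the augmented set would be used to retrain the classifier, and the L1 term would be added to the training loss that SGD minimizes; both steps are omitted here to keep the sketch short.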

Authors

Sugiyarto Surono
Sugiyarto@math.uad.ac.id (Primary Contact)
Goh Khang Wen
Arif Rahman
Lalu M. Irham
Sintia Afriyani
Surono, S., Wen, G. K., Rahman, A., Irham, L. M., & Afriyani, S. (2025). Implementation of RNN-LSTM with L1 regularization for predicting labels from chimpanzee DNA sequences using pseudo-labeling. International Journal of Innovative Research and Scientific Studies, 8(3), 2774–2786. https://doi.org/10.53894/ijirss.v8i3.7083
