Modified Hard Voting Classifier Implementation on MEFV Gene Variants Increases in Silico Tool Performance: A Novel Approach for Small Sample Size


Creative Commons License

Alay M. T., Demir İ., Kirişci M.

Zeki sistemler teori ve uygulamaları dergisi (Online), cilt.8, sa.1, ss.35-46, 2025 (TRDizin)

Özet

Objective: There are a limited number of pathogenic variants known in the MEFV gene. In silico tools fail to classify many MEFV gene variants. Therefore, it is essential to implement novel approaches. Our goal is to develop a new strategy to solve the even number classification problem while improving MEFV gene variant prediction accuracy using small datasets.

Material - methods: First, we determined the optimal number of computational tools for the model. We then applied eight distinct ML algorithms on the training dataset containing MEFV gene variants using the determined tools. We initiated the application of modified hard voting machine learning algorithms, using a training and validation dataset. Subsequently, we implemented a comparative analysis between the prediction results and existing algorithms and studies. Finally, we evaluated the gene and protein level ascertainment to identify hotspot regions.

Results: The ensemble classifier scored an average ROCAUC of 88%. The modified hard voting method correctly classified all known variants with 82% accuracy, outperforming both the soft voting (75%) and hard voting (70%) methods. The results showed that the prevalence of LP variants was approximately 2.5 times higher in domains compared to LB variants(χ2: 13.574, p < 0.001, OR: 2.509 [1.532-4.132]).

Conclusion: Considering the limited understanding of the clinical implications associated with MEFV gene mutations, employing a modified hard voting classifier approach may improve the classification accuracy of computational tools.