Zeki sistemler teori ve uygulamaları dergisi (Online), vol. 8, no. 1, pp. 35-46, 2025 (TRDizin)
Objective: Only a limited number of pathogenic variants are known in the MEFV gene, and in silico tools fail to classify many MEFV gene variants. Novel approaches are therefore needed. Our goal is to develop a new strategy that solves the even-number classification problem while improving MEFV gene variant prediction accuracy on small datasets.
Materials and methods: First, we determined the optimal number of computational tools for the model. Using the selected tools, we then applied eight distinct machine learning (ML) algorithms to the training dataset of MEFV gene variants. Next, we applied a modified hard voting algorithm to the training and validation datasets. We then compared the prediction results with those of existing algorithms and studies. Finally, we evaluated the variants at the gene and protein levels to identify hotspot regions.
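The abstract does not state how the hard voting scheme is modified. As a minimal sketch of the general idea, the Python below shows plain hard (majority) voting and one possible modification that resolves the ties an even number of voters can produce. The tie-breaking rule (summing a per-classifier weight), the function names, and the example weights are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def hard_vote(labels):
    """Plain majority vote. Returns None when the top labels are tied."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return tied[0] if len(tied) == 1 else None


def modified_hard_vote(labels, weights):
    """Majority vote with a tie-break (assumed rule, not from the paper).

    When several labels share the highest vote count, the label whose
    voters carry the largest summed weight wins. The weights are
    hypothetical, e.g. each classifier's validation accuracy.
    """
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    if len(tied) == 1:
        return tied[0]
    # Sum the weights of the voters behind each tied label.
    totals = {label: 0.0 for label in tied}
    for label, weight in zip(labels, weights):
        if label in totals:
            totals[label] += weight
    return max(totals, key=totals.get)


# Four voters split 2-2: plain hard voting cannot decide.
votes = ["LP", "LB", "LP", "LB"]
accuracies = [0.9, 0.6, 0.8, 0.7]  # hypothetical per-classifier weights
plain_result = hard_vote(votes)
modified_result = modified_hard_vote(votes, accuracies)
```

With an odd number of voters and two classes, both functions return the same label; they differ only when the vote is tied.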
Results: The ensemble classifier achieved an average ROC AUC of 88%. The modified hard voting method classified the known variants with 82% accuracy, outperforming both the soft voting (75%) and hard voting (70%) methods. The odds of lying within a domain were approximately 2.5 times higher for likely pathogenic (LP) variants than for likely benign (LB) variants (χ² = 13.574, p < 0.001, OR = 2.509 [1.532-4.132]).
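For readers unfamiliar with the statistics reported above, the following sketch shows how an odds ratio, a confidence interval, and a Pearson chi-square statistic are commonly derived from a 2x2 table. The counts are hypothetical placeholders, since the abstract does not report the underlying table, so the output will not match the published values. The Woolf (log) interval, the 95% level, and the absence of a continuity correction are likewise assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Table layout (assumed):
                   in domain   outside domain
        LP             a             b
        LB             c             d
    z = 1.96 corresponds to an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts for illustration only; not the study's data.
a, b, c, d = 40, 20, 30, 40
odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
```

An interval that excludes 1, as the reported [1.532-4.132] does, indicates that the association is unlikely to be due to chance at the chosen confidence level.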
Conclusion: Given the limited understanding of the clinical implications of MEFV gene mutations, a modified hard voting classifier approach may improve the classification accuracy of computational tools.