Paper Title
Integration and Identification of Mutation Patterns using Machine Learning for Precise Cancer Classification
Abstract
Accurately identifying and classifying cancer from genomic sequences remains a significant challenge in bioinformatics. String matching is a basic technique for locating mutations and sequence patterns in genomic data. In this paper, we introduce an approach to multi-class cancer classification that integrates string-matching-based mutation detection with machine learning. The Boyer-Moore algorithm is used to compare gene sequences from cancerous and normal cells and to efficiently identify mutation regions. Four cancer types are considered: uterine, pancreatic, stomach, and cervical. After mutation detection, k-mer frequency features are extracted from the sequences. These features are then used to train several machine learning models, including ensemble methods and stacking classifiers, to predict the cancer type.
Keywords - Bioinformatics, DNA sequence analysis, mutation detection, string matching, k-mer frequency, machine learning, ensemble methods.
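As an illustration of the two preprocessing steps named in the abstract, the following is a minimal sketch and not the implementation used in this work. It makes several assumptions: the search uses the Boyer-Moore-Horspool simplification (bad-character rule only) rather than full Boyer-Moore; mutation regions are approximated by fixed-size, non-overlapping windows of the normal sequence that have no exact match in the cancerous sequence; and the names `horspool_find`, `unmatched_windows`, and `kmer_frequencies`, along with the default window size and k, are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List


def horspool_find(text: str, pattern: str) -> int:
    """Return the first index of pattern in text, or -1 if absent.

    Boyer-Moore-Horspool: a simplified Boyer-Moore that uses only a
    bad-character shift table (assumption: stands in for full Boyer-Moore).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift for each character = distance from its last occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Skip ahead based on the text character aligned with the pattern end.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def unmatched_windows(normal: str, tumor: str, window: int = 8) -> List[int]:
    """Start offsets of non-overlapping windows of `normal` with no exact
    match anywhere in `tumor` (hypothetical proxy for mutation regions)."""
    return [
        start
        for start in range(0, len(normal) - window + 1, window)
        if horspool_find(tumor, normal[start:start + window]) == -1
    ]


def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Normalized k-mer frequencies over ACGT, in lexicographic k-mer order.

    k-mers containing other symbols (e.g. N) count toward the total
    but have no slot in the returned vector.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[km] / total if total else 0.0 for km in kmers]


# Toy sequences (made up for illustration): one substitution at index 12.
normal_seq = "ACGTACGTTTGACCAA"
tumor_seq = "ACGTACGTTTGAGCAA"
regions = unmatched_windows(normal_seq, tumor_seq, window=8)
features = kmer_frequencies(tumor_seq, k=2)
```

In this sketch the resulting vector has 4^k entries per sequence, so it could be passed directly as a fixed-length feature row to a downstream classifier.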