Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review

Biology ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 453
Author(s):  
Petar Tonkovic ◽  
Slobodan Kalajdziski ◽  
Eftim Zdravevski ◽  
Petre Lameski ◽  
Roberto Corizzo ◽  
...  

Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries’ search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched with considerable potential for improvement.
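The keyword-and-property search described in this abstract can be pictured with a minimal sketch. The abstract does not specify the actual NLP toolkit or query syntax, so the record fields, the `matches` helper, and the sample records below are all hypothetical; only the 2008–2019 interval comes from the text.

```python
import re

def matches(record, keywords, year_range):
    """Return True if the record lies in the year range and its text contains every keyword."""
    first, last = year_range
    if not first <= record["year"] <= last:
        return False
    text = (record["title"] + " " + record["abstract"]).lower()
    tokens = set(re.findall(r"[a-z0-9]+", text))  # crude word tokenisation
    return all(k.lower() in tokens for k in keywords)

# Hypothetical records, invented for illustration only.
records = [
    {"title": "Deep learning for metagenomic classification",
     "abstract": "We classify reads.", "year": 2017},
    {"title": "Metagenomic binning",
     "abstract": "A classification benchmark.", "year": 2005},
    {"title": "Image segmentation",
     "abstract": "Unrelated work.", "year": 2015},
]

selected = [r["title"] for r in records
            if matches(r, ["metagenomic", "classification"], (2008, 2019))]
print(selected)
```

Only the first record passes both filters: the second is outside the year interval and the third lacks the keywords. A real screening pipeline would likely add stemming, synonyms, and deduplication across libraries.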

2021 ◽  
Vol 12 ◽  
Author(s):  
Laura Judith Marcos-Zambrano ◽  
Kanita Karaduzovic-Hadziabdic ◽  
Tatjana Loncar Turukalo ◽  
Piotr Przymus ◽  
Vladimir Trajkovik ◽  
...  

The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on a learning-to-rank approach.


10.2196/15347 ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. e15347
Author(s):  
Christopher Michael Homan ◽  
J Nicolas Schrading ◽  
Raymond W Ptucha ◽  
Catherine Cerulli ◽  
Cecilia Ovesdotter Alm

Background: Social media is a rich, virtually untapped source of data on the dynamics of intimate partner violence, one that is both global in scale and intimate in detail. Objective: The aim of this study is to use machine learning and other computational methods to analyze social media data for the reasons victims give for staying in or leaving abusive relationships. Methods: Human annotation, part-of-speech tagging, and machine learning predictive models, including support vector machines, were used on a Twitter data set of 8767 #WhyIStayed and #WhyILeft tweets each. Results: Our methods explored whether we can analyze micronarratives that include details about victims, abusers, and other stakeholders, the actions that constitute abuse, and how the stakeholders respond. Conclusions: Our findings are consistent across various machine learning methods, which correspond to observations in the clinical literature, and affirm the relevance of natural language processing and machine learning for exploring issues of societal importance in social media.


2021 ◽  
Vol 28 ◽  
Author(s):  
Yuyang Xue ◽  
Xiucai Ye ◽  
Lesong Wei ◽  
Xin Zhang ◽  
Tetsuya Sakurai ◽  
...  

With its superior performance, the Transformer model, which is based on the 'Encoder-Decoder' paradigm, has become the mainstream in natural language processing. On the other hand, bioinformatics has embraced machine learning and made great progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are one kind of permeable protein that is convenient as a kind of 'postman' in drug penetration tasks. However, only a small number of CPPs have been discovered by research, and even fewer have found practical application in drug permeability. Therefore, correctly identifying CPPs has opened up a new way to take macromolecules into cells without other potentially harmful materials in the drug. Most of the previous work only uses trivial machine learning techniques and hand-crafted features to construct a simple classifier. In CPPFormer, we build on the attention structure of the Transformer, rebuilding the network around the short length characteristic of CPPs and using an automatic feature extractor together with a few manually engineered features to co-direct the predicted results. Compared to all previous methods and other classic text classification models, the empirical results show that our proposed deep model-based method achieves the best performance, 92.16% accuracy on the CPP924 dataset, and has passed various index tests.


2020 ◽  
pp. 009385482096975
Author(s):  
Mehdi Ghasemi ◽  
Daniel Anvari ◽  
Mahshid Atapour ◽  
J. Stephen Wormith ◽  
Keira C. Stockdale ◽  
...  

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.
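To make the AUC figures in this abstract concrete, here is a minimal sketch of the rank-based definition of AUC: the probability that a randomly chosen positive case (here, a recidivist) receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores and labels below are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]                                # hypothetical outcomes
perfect = auc([0.1, 0.2, 0.8, 0.9], labels)          # every positive outranks every negative
uninformative = auc([0.5, 0.5, 0.5, 0.5], labels)    # identical scores carry no ranking information
partial = auc([0.1, 0.6, 0.4, 0.9], labels)          # three of four pairs ordered correctly
print(perfect, uninformative, partial)
```

Under this definition an AUC of 0.50 means the score ranks individuals no better than chance, which is why a move from 0.50 to about 0.60 within a single score band indicates some added ranking ability.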


2019 ◽  
Vol 46 (1) ◽  
pp. 101-117 ◽  
Author(s):  
Mohammad Ehsan Basiri ◽  
Arman Kabiri

Opinion mining is a subfield of data mining and natural language processing concerned with extracting users’ opinions and attitudes towards products or services from their comments on the Web. Persian opinion mining, in contrast to its counterpart in English, is a totally new field of study and hence has not received the attention it deserves. Existing methods for opinion mining in the Persian language may be classified into machine learning– and lexicon-based approaches. These methods have been proposed and successfully used for the polarity-detection problem. However, when they are used for more complex tasks like rating prediction, their results are not desirable. In this study, first an exhaustive investigation of machine learning– and lexicon-based methods is performed. Then, a new hybrid method is proposed for the rating-prediction problem in the Persian language. Finally, the effects of the machine learning component, feature-selection method, normalisation method and combination level are investigated. The experimental results on a large data set containing 16,000 Persian customers’ reviews show that the proposed system achieves higher performance in comparison to the Naïve Bayes algorithm and a pure lexicon-based method. Moreover, results demonstrate that the proposed method may also be successfully used for polarity detection.


AI Magazine ◽  
2012 ◽  
Vol 33 (1) ◽  
pp. 11-24 ◽  
Author(s):  
Carla E. Brodley ◽  
Umaa Rebbapragada ◽  
Kevin Small ◽  
Byron Wallace

Machine learning research is often conducted in vitro, divorced from motivating practical applications. A researcher might develop a new method for the general task of classification, then assess its utility by comparing its performance (such as accuracy or AUC) to that of existing classification models on publicly available datasets. In terms of advancing machine learning as an academic discipline, this approach has thus far proven quite fruitful. However, it is our view that the most interesting open problems in machine learning are those that arise during its application to real-world problems. We illustrate this point by reviewing two of our interdisciplinary collaborations, both of which have posed unique machine learning problems, providing fertile ground for novel research.


In the modeling of complex systems, manual creation and maintenance of the appropriate behavior is found to be the key problem. Behavior modeling using machine learning has been found successful in modeling and simulation. This paper presents artificial neural network (ANN) modeling of a transmission line carrying a frequency-varying signal using machine learning. This work uses proper orthogonal decomposition (POD) based reduced-order modeling. In the proposed work, snapshot sets of the complex mathematical model of a nonlinear transmission line, and also of a linear model, are obtained at different time intervals. These snapshot sets are arranged in matrix form separately for the nonlinear and linear models. The POD method is applied to both matrices separately. This reduces the order of the matrix, which is used as the input and output data set for neural network training through a machine learning technique. The trained neural network model has been verified using a different data set not used in training. The proposed algorithm determines the dimension of the interpolation space, prompting a considerable decrease in computational expense. The proposed algorithm does not impose any constraints on the topology of the circuit or the kind of nonlinear components and is hence applicable to general nonlinear systems.
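The order-reduction step can be illustrated with a small sketch of POD by the method of snapshots. This is not the authors' implementation: it extracts only the single dominant mode with power iteration, uses hypothetical snapshot data built from one spatial shape, and assumes the all-ones starting vector is not orthogonal to the dominant eigenvector. The neural network training stage is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dominant_pod_mode(snapshots, iterations=200):
    """Leading POD mode via the method of snapshots and power iteration."""
    m = len(snapshots)
    # m x m correlation matrix between snapshots
    corr = [[dot(si, sj) for sj in snapshots] for si in snapshots]
    a = [1.0] * m  # assumed not orthogonal to the dominant eigenvector
    for _ in range(iterations):
        a = [dot(row, a) for row in corr]
        norm = math.sqrt(dot(a, a))
        a = [x / norm for x in a]
    # Weight the snapshots by the eigenvector and normalise to unit length
    n = len(snapshots[0])
    mode = [sum(a[i] * snapshots[i][k] for i in range(m)) for k in range(n)]
    norm = math.sqrt(dot(mode, mode))
    return [x / norm for x in mode]

# Hypothetical snapshots: one spatial shape scaled by a time-varying amplitude.
shape = [1.0, 2.0, 2.0]
amplitudes = [1.0, 0.5, 2.0, 1.5]
snapshots = [[amp * x for x in shape] for amp in amplitudes]

mode = dominant_pod_mode(snapshots)
reduced = [dot(s, mode) for s in snapshots]   # one coefficient per snapshot
rebuilt = [[c * x for x in mode] for c in reduced]
max_error = max(abs(r - s) for rs, ss in zip(rebuilt, snapshots)
                for r, s in zip(rs, ss))
print(reduced, max_error)
```

Because these snapshots are exactly rank one, a single mode reproduces them up to rounding error, and each three-component snapshot is replaced by one coefficient. Real snapshot matrices need several modes, usually chosen from the decay of the eigenvalues.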


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 203
Author(s):  
Maha M. Alshammari ◽  
Afnan Almuhanna ◽  
Jamal Alhiyafi

A tumor is an abnormal tissue classified as either benign or malignant. A breast tumor is one of the most common tumors in women. Radiologists use mammograms to identify a breast tumor and classify it, which is a time-consuming process and prone to error due to the complexity of the tumor. In this study, we applied machine learning-based techniques to assist the radiologist in reading mammogram images and classifying the tumor in a very reasonable time interval. We extracted several features from the region of interest in the mammogram, which the radiologist manually annotated. These features are incorporated into a classification engine to train and build the proposed structure classification models. We evaluated the accuracy of the proposed system on a dataset not previously seen by the model, following the standard model evaluation schemes. This study found that various factors could affect the performance, which we avoided after experimenting with the possible alternatives. This study finally recommends using the optimized Support Vector Machine or Naïve Bayes, which produced 100% accuracy after integrating the feature selection and hyper-parameter optimization schemes.


2019 ◽  
Vol 8 (4) ◽  
pp. 590
Author(s):  
Chhayarani Ram Kinkar ◽  
Yogendra Kumar Jain

Natural language processing is a very active area of research and development, yet there is no single agreed-upon method that would satisfy everyone for using natural language to operate electronic devices or for other practical applications. However, some aspects have been used for many years in the formulation and solution of computational problems arising in natural language processing. This paper describes a model in which numerical values are assigned to the words of a natural language speech data set, converting the information present in the data set into an intermediate numeric form as a structured data set. The intermediate numerical value of each word is used to generate machine code that electronic devices can easily understand in order to draw inferences from the data set. The designed model is useful for a number of practical applications and is very simple to implement.
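The abstract does not say how numerical values are assigned to words, so the following is only one simple possibility: an integer codebook built in order of first appearance, with 0 reserved for unknown words. The function names and the sample commands are hypothetical.

```python
def build_codebook(sentences):
    """Assign each distinct word an integer code in order of first appearance."""
    codebook = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in codebook:
                codebook[word] = len(codebook) + 1  # 0 is reserved for unknown words
    return codebook

def encode(sentence, codebook):
    """Convert a sentence into its intermediate numeric form."""
    return [codebook.get(word, 0) for word in sentence.lower().split()]

# Hypothetical device commands, invented for illustration.
commands = ["turn on the light", "turn off the fan"]
codebook = build_codebook(commands)

encoded = encode("turn on the fan", codebook)
unknown = encode("open the door", codebook)
print(encoded, unknown)
```

Here "turn on the fan" becomes `[1, 2, 3, 6]`, and words outside the codebook map to 0, giving downstream logic a structured sequence of numbers to act on.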


2020 ◽  
Author(s):  
Monalisha Ghosh ◽  
Goutam Sanyal

Sentiment analysis has recently been considered the most active research field in the natural language processing (NLP) domain. Deep learning is a subset of the large family of machine learning methods and is becoming a growing trend due to its automatic learning capability, with impressive results across different NLP tasks. Hence, a fusion-based machine learning framework has been attempted by merging a traditional machine learning method with deep learning techniques to tackle the challenge of sentiment prediction for a massive amount of unstructured review data. The proposed architecture uses a Convolutional Neural Network (CNN) trained with backpropagation to extract embedded feature vectors from the top hidden layer. Thereafter, these vectors are combined with an optimized feature set generated by the binary particle swarm optimization (BPSO) method. Finally, a traditional SVM classifier is trained on this extended feature set to determine the optimal hyperplane separating the two classes of reviews. The evaluation has been carried out on two benchmark movie review datasets, IMDB and SST2. Experimental results with comparative studies based on accuracy and F-score are reported to highlight the benefits of the developed framework.
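The two evaluation measures named in this abstract, accuracy and F-score, can be computed as in the sketch below. It assumes the common F1 variant (the harmonic mean of precision and recall) for a binary task; the abstract does not state which F-score was used, and the labels below are hypothetical rather than results from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)

# Hypothetical sentiment labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(accuracy, f1)
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75, so accuracy and F1 coincide at 0.75 here. On imbalanced review sets the two measures can diverge sharply, which is a reason to report both.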

