scholarly journals OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Fabio R. Cerqueira ◽  
Ana Tereza Ribeiro Vasconcelos

Abstract Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty in finding patterns in tiny sequences is the main reason that makes small ORFs to be overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate analysis and prediction of the time-dependent data. We focus our attention on four different stocks are selected from Yahoo Finance historical database. To build up models and predict the future stock price, we consider three different machine learning techniques including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in machine learning methods, it can be shown that the prediction accuracy is improved.


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of the disease in a patient is critical and may alter the subsequent treatment and increase the chances of survival rate. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Various techniques may provide different desired accuracies and it is therefore imperative to use the most suitable method which provides the best desired results. This research seeks to provide comparative analysis of Support Vector Machine, Naïve bayes, J48 Decision Tree and neural network classifiers breast cancer and diabetes datsets.


Materials ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1089
Author(s):  
Sung-Hee Kim ◽  
Chanyoung Jeong

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to predict classification for binary and multiclass classification. For binary classification, random forest and gradient boosting algorithm had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high prediction on estimating the presence of pore. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm used was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or batter efficiency.


2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart related diseases or Cardiovascular Diseases (CVDs) are the main reason for a huge number of death in the world over the last few decades and has emerged as the most life-threatening disease, not only in India but in the whole world. So, there is a need of reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and the professionals in the diagnosis of heart related diseases. This paper presents a survey of various models based on such algorithms and techniques andanalyze their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), NaïveBayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found very popular among the researchers.


Author(s):  
M. M. Ata ◽  
K. M. Elgamily ◽  
M. A. Mohamed

The presented paper proposes an algorithm for palmprint recognition using seven different machine learning algorithms. First of all, we have proposed a region of interest (ROI) extraction methodology which is a two key points technique. Secondly, we have performed some image enhancement techniques such as edge detection and morphological operations in order to make the ROI image more suitable for the Hough transform. In addition, we have applied the Hough transform in order to extract all the possible principle lines on the ROI images. We have extracted the most salient morphological features of those lines; slope and length. Furthermore, we have applied the invariant moments algorithm in order to produce 7 appropriate hues of interest. Finally, after performing a complete hybrid feature vectors, we have applied different machine learning algorithms in order to recognize palmprints effectively. Recognition accuracy have been tested by calculating precision, sensitivity, specificity, accuracy, dice, Jaccard coefficients, correlation coefficients, and training time. Seven different supervised machine learning algorithms have been implemented and utilized. The effect of forming the proposed hybrid feature vectors between Hough transform and Invariant moment have been utilized and tested. Experimental results show that the feed forward neural network with back propagation has achieved about 99.99% recognition accuracy among all tested machine learning techniques.


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Suzanna Schmeelk

This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to understand topics explored by senior and experienced full-time working professionals (EFWPs).  Our research categorizes dissertation by both their abstracts and by their full-text using the Graplab Create library from Apple’s Turi. We also compare the dissertation categorizations using IBM’s Watson Discovery deep machine learning tool.  Our research provides perspectives on the practicality of the manual classification of technical documents; and, it provides insights into the: (1) categories of academic work created by EFWPs in a Computing doctoral program, (2) viability of automated categorization versus human abstraction, and (3) differences in categorization algorithms.


2021 ◽  
Vol 297 ◽  
pp. 01032
Author(s):  
Harish Kumar ◽  
Anshal Prasad ◽  
Ninad Rane ◽  
Nilay Tamane ◽  
Anjali Yeole

Phishing is a common attack on credulous people by making them disclose their unique information. It is a type of cyber-crime where false sites allure exploited people to give delicate data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs by Machine learning techniques. This experimentation discusses the methods used for detection of phishing websites based on lexical features, host properties and page importance properties. We consider various data mining algorithms for evaluation of the features in order to get a better understanding of the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify the phishing URLs by analyzing their lexical and host-based features.A particular challenge in this domain is that criminals are constantly making new strategies to counter our defense measures. To succeed in this contest, we need Machine Learning algorithms that continually adapt to new examples and features of phishing URLs.


2020 ◽  
Vol 7 (10) ◽  
pp. 380-389
Author(s):  
Asogwa D.C ◽  
Anigbogu S.O ◽  
Anigbogu G.N ◽  
Efozia F.N

Author's age prediction is the task of determining the author's age by studying the texts written by them. The prediction of author’s age can be enlightening about the different trends, opinions social and political views of an age group. Marketers always use this to encourage a product or a service to an age group following their conveyed interests and opinions. Methodologies in natural language processing have made it possible to predict author’s age from text by examining the variation of linguistic characteristics. Also, many machine learning algorithms have been used in author’s age prediction. However, in social networks, computational linguists are challenged with numerous issues just as machine learning techniques are performance driven with its own challenges in realistic scenarios. This work developed a model that can predict author's age from text with a machine learning algorithm (Naïve Bayes) using three types of features namely, content based, style based and topic based. The trained model gave a prediction accuracy of 80%.


Author(s):  
Rashida Ali ◽  
Ibrahim Rampurawala ◽  
Mayuri Wandhe ◽  
Ruchika Shrikhande ◽  
Arpita Bhatkar

Internet provides a medium to connect with individuals of similar or different interests creating a hub. Since a huge hub participates on these platforms, the user can receive a high volume of messages from different individuals creating a chaos and unwanted messages. These messages sometimes contain a true information and sometimes false, which leads to a state of confusion in the minds of the users and leads to first step towards spam messaging. Spam messages means an irrelevant and unsolicited message sent by a known/unknown user which may lead to a sense of insecurity among users. In this paper, the different machine learning algorithms were trained and tested with natural language processing (NLP) to classify whether the messages are spam or ham.


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models in order to enable algorithms to learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to make IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning) or the data is not known beforehand (unguided learning) or the learning is the result of interaction between a model and the environment (reinforcement learning). This chapter answers the questions: How could machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are IoT data characteristics in real-world which requires data analytics?


Sign in / Sign up

Export Citation Format

Share Document