Outdoor Plant Segmentation With Deep Learning for High-Throughput Field Phenotyping on a Diverse Wheat Dataset

Robust and automated segmentation of leaves and other backgrounds is a core prerequisite of most approaches in high-throughput field phenotyping. So far, the possibilities of deep learning approaches for this purpose have not been explored adequately, partly due to a lack of publicly available, appropriate datasets. This study presents a workflow based on DeepLab v3+ and on a diverse annotated dataset of 190 RGB (350 x 350 pixels) images. Images of winter wheat plants of 76 different genotypes and developmental stages have been acquired throughout multiple years at high resolution in outdoor conditions using nadir view, encompassing a wide range of imaging conditions. Inconsistencies of human annotators in complex images have been quantified, and metadata information of camera settings has been included. The proposed approach achieves an intersection over union (IoU) of 0.77 and 0.90 for plants and soil, respectively. This outperforms the benchmarked machine learning methods which use Support Vector Classifier and/or Random Forrest. The results show that a small but carefully chosen and annotated set of images can provide a good basis for a powerful segmentation pipeline. Compared to earlier methods based on machine learning, the proposed method achieves better performance on the selected dataset in spite of using a deep learning approach with limited data. Increasing the amount of publicly available data with high human agreement on annotations and further development of deep neural network architectures will provide high potential for robust field-based plant segmentation in the near future. This, in turn, will be a cornerstone of data-driven improvement in crop breeding and agricultural practices of global benefit.

Download Full-text

Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Electronics ◽

10.3390/electronics10141694 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1694

Author(s):

Mathew Ashik ◽

A. Jyothish ◽

S. Anandaram ◽

P. Vinod ◽

Francesco Mercaldo ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Support Vector ◽

Malware Analysis ◽

Learning Approaches ◽

Dynamic Features ◽

System Calls ◽

Prevention Methods ◽

Structural Aspects

Malware is one of the most significant threats in today’s computing world since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. This software exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge, and stealthily send it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. Considering that the cause of a problem is often best understood by studying the structural aspects of a program like the mnemonics, instruction opcode, API Call, etc. In this paper, we investigate the relevance of the features of unpacked malicious and benign executables like mnemonics, instruction opcodes, and API to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we also evaluate the performance of the collection of deep neural networks like Deep Dense network, One-Dimensional Convolutional Neural Network (1D-CNN), and CNN-LSTM in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained comparing models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.

Download Full-text

Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

The Other ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells between the other existing objects in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods are applied, such as a neural network (ANN) and support vector machine (SVM) for detecting and classifying the Nosema disease cells. The second strategy explores deep learning and transfers learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to identify the Nosema images from the other object images. The best accuracy was reached by the VGG-16 pre-trained neural network with 96.25%.

Download Full-text

Sentiment Analysis of Lithuanian Texts Using Traditional and Deep Learning Approaches

Computers ◽

10.3390/computers8010004 ◽

2019 ◽

Vol 8 (1) ◽

pp. 4 ◽

Cited By ~ 4

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Robertas Damaševičius ◽

Marcin Woźniak

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Full Dataset ◽

Learning Techniques ◽

Long Short Term Memory

We describe the sentiment analysis experiments that were performed on the Lithuanian Internet comment dataset using traditional machine learning (Naïve Bayes Multinomial—NBM and Support Vector Machine—SVM) and deep learning (Long Short-Term Memory—LSTM and Convolutional Neural Network—CNN) approaches. The traditional machine learning techniques were used with the features based on the lexical, morphological, and character information. The deep learning approaches were applied on the top of two types of word embeddings (Vord2Vec continuous bag-of-words with negative sampling and FastText). Both traditional and deep learning approaches had to solve the positive/negative/neutral sentiment classification task on the balanced and full dataset versions. The best deep learning results (reaching 0.706 of accuracy) were achieved on the full dataset with CNN applied on top of the FastText embeddings, replaced emoticons, and eliminated diacritics. The traditional machine learning approaches demonstrated the best performance (0.735 of accuracy) on the full dataset with the NBM method, replaced emoticons, restored diacritics, and lemma unigrams as features. Although traditional machine learning approaches were superior when compared to the deep learning methods; deep learning demonstrated good results when applied on the small datasets.

Download Full-text

PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

eLife ◽

10.7554/elife.70576 ◽

2021 ◽

Vol 10 ◽

Author(s):

Daniel Griffith ◽

Alex S Holehouse

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

High Throughput ◽

Recurrent Neural Network ◽

Transcriptional Activation ◽

Network Architecture ◽

Learning Approaches ◽

Large Protein ◽

Protein Datasets

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.

Download Full-text

Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

Applied Sciences ◽

10.3390/app11188438 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8438

Author(s):

Muhammad Mujahid ◽

Ernesto Lee ◽

Furqan Rustam ◽

Patrick Bernard Washington ◽

Saleem Ullah ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Online Education ◽

Sentiment Analysis ◽

Topic Modeling ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

E Learning ◽

Machine Learning Models

Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes leads to an unprecedented rise in online learning. For limiting the impact of COVID-19 and obstructing its widespread, educational institutions closed their campuses immediately and academic activities are moved to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability to students and teachers and its technical feasibility with respect to different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a larger scale. This study endeavors to investigate the effectiveness of e-learning by analyzing the sentiments of people about e-learning. Due to the rise of social media as an important mode of communication recently, people’s views can be found on platforms such as Twitter, Instagram, Facebook, etc. This study uses a Twitter dataset containing 17,155 tweets about e-learning. Machine learning and deep learning approaches have shown their suitability, capability, and potential for image processing, object detection, and natural language processing tasks and text analysis is no exception. Machine learning approaches have been largely used both for annotation and text and sentiment analysis. Keeping in view the adequacy and efficacy of machine learning models, this study adopts TextBlob, VADER (Valence Aware Dictionary for Sentiment Reasoning), and SentiWordNet to analyze the polarity and subjectivity score of tweets’ text. Furthermore, bearing in mind the fact that machine learning models display high classification accuracy, various machine learning models have been used for sentiment classification. Two feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) and BoW (Bag of Words) have been used to effectively build and evaluate the models. All the models have been evaluated in terms of various important performance metrics such as accuracy, precision, recall, and F1 score. The results reveal that the random forest and support vector machine classifier achieve the highest accuracy of 0.95 when used with Bow features. Performance comparison is carried out for results of TextBlob, VADER, and SentiWordNet, as well as classification results of machine learning models and deep learning models such as CNN (Convolutional Neural Network), LSTM (Long Short Term Memory), CNN-LSTM, and Bi-LSTM (Bidirectional-LSTM). Additionally, topic modeling is performed to find the problems associated with e-learning which indicates that uncertainty of campus opening date, children’s disabilities to grasp online education, and lagging efficient networks for online education are the top three problems.

Download Full-text

Evaluation of Machine Learning Approaches for Automated Diagnosis of COVID-19 using X-Ray images (Preprint)

10.2196/preprints.18947 ◽

2020 ◽

Author(s):

Mazin Mohammed ◽

Karrar Hameed Abdulkareem ◽

Mashael S. Maashi ◽

Salama A. Mostafa A. Mostafa ◽

Abdullah Baz ◽

...

Keyword(s):

Machine Learning ◽

Computational Method ◽

Learning Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

X Ray ◽

Wide Range ◽

Artificial Neural Network Ann

BACKGROUND In most recent times, global concern has been caused by a coronavirus (COVID19), which is considered a global health threat due to its rapid spread across the globe. Machine learning (ML) is a computational method that can be used to automatically learn from experience and improve the accuracy of predictions. OBJECTIVE In this study, the use of machine learning has been applied to Coronavirus dataset of 50 X-ray images to enable the development of directions and detection modalities with risk causes.The dataset contains a wide range of samples of COVID-19 cases alongside SARS, MERS, and ARDS. The experiment was carried out using a total of 50 X-ray images, out of which 25 images were that of positive COVIDE-19 cases, while the other 25 were normal cases. METHODS An orange tool has been used for data manipulation. To be able to classify patients as carriers of Coronavirus and non-Coronavirus carriers, this tool has been employed in developing and analysing seven types of predictive models. Models such as , artificial neural network (ANN), support vector machine (SVM), linear kernel and radial basis function (RBF), k-nearest neighbour (k-NN), Decision Tree (DT), and CN2 rule inducer were used in this study.Furthermore, the standard InceptionV3 model has been used for feature extraction target. RESULTS The various machine learning techniques that have been trained on coronavirus disease 2019 (COVID-19) dataset with improved ML techniques parameters. The data set was divided into two parts, which are training and testing. The model was trained using 70% of the dataset, while the remaining 30% was used to test the model. The results show that the improved SVM achieved a F1 of 97% and an accuracy of 98%. CONCLUSIONS :. In this study, seven models have been developed to aid the detection of coronavirus. In such cases, the learning performance can be improved through knowledge transfer, whereby time-consuming data labelling efforts are not required.the evaluations of all the models are done in terms of different parameters. it can be concluded that all the models performed well, but the SVM demonstrated the best result for accuracy metric. Future work will compare classical approaches with deep learning ones and try to obtain better results. CLINICALTRIAL None

Download Full-text

A study of deep learning approaches for medication and adverse drug event extraction from clinical text

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz063 ◽

2019 ◽

Vol 27 (1) ◽

pp. 13-21 ◽

Cited By ~ 11

Author(s):

Qiang Wei ◽

Zongcheng Ji ◽

Zhiheng Li ◽

Jingcheng Du ◽

Jingqi Wang ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Algorithms ◽

Joint Model ◽

Machine Learning Algorithms ◽

Entity Recognition ◽

Superior Performance ◽

Drug Event ◽

Support Vector ◽

Learning Approaches

AbstractObjectiveThis article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.Materials and MethodsThe clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches to generating optimal performance by combining outputs from multiple approaches.ResultsOur best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction.ConclusionIn this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated its superior performance compared with traditional machine learning algorithms, indicating its uses in broader NER and RC tasks in the medical domain.

Download Full-text

Machine learning with sklearn

Fundamentals of Machine Learning ◽

10.1093/oso/9780198828044.003.0003 ◽

2019 ◽

pp. 38-65

Author(s):

Thomas P. Trappenberg

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Probabilistic Reasoning ◽

Learning Algorithms ◽

General Setting ◽

Random Forest Classifier ◽

Machine Learning Algorithms ◽

Support Vector ◽

Support Vector Classifier ◽

Evaluation Techniques

This chapter’s goal is to show how to apply machine learning algorithms in a general setting using some classic methods. In particular, it demonstrates how to apply three important machine learning algorithms, a support vector classifier (SVC), a random forest classifier (RFC), and a multilayer perceptron (MLP). While many of the methods studied later go beyond these now classic methods, this does not mean that these methods are obsolete. Also, the algorithms discussed here provide some form of baseline to discuss advanced methods like probabilistic reasoning and deep learning. The aim here is to demonstrate that applying machine learning methods based on machine learning libraries is not very difficult. It offers an opportunity to discuss evaluation techniques that are very important in practice.

Download Full-text

PARROT: a flexible recurrent neural network framework for analysis of large protein datasets

10.1101/2021.05.21.445045 ◽

2021 ◽

Author(s):

Daniel Griffith ◽

Alex S Holehouse

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

High Throughput ◽

Recurrent Neural Network ◽

Transcriptional Activation ◽

Network Architecture ◽

Learning Approaches ◽

Large Protein ◽

Protein Datasets

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex non-linear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid-beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.

Download Full-text

Detection of Ovarian Tumor Using Machine Learning Approaches A Review

10.46532/978-81-950008-1-4_103 ◽

2020 ◽

pp. 471-476

Author(s):

Gitanjali Wadhwa ◽

Mansi Mathur

Keyword(s):

Machine Learning ◽

Ovarian Cancer ◽

Deep Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Female Sex Hormones

The important part of female reproductive system is ovaries. The importance of these tiny glands is derived from the production of female sex hormones and female gametes. The place of these ductless almond shaped tiny glandular organs is on just opposite sides of uterus attached with ovarian ligament. There are several reasons due to which ovarian cancer can arise but it can be classified by using different number of techniques. Early prediction of ovarian cancer will decrease its progress rate and may possibly save countless lives. CAD systems (Computer-aided diagnosis) is a noninvasive routine for finding ovarian cancer in its initial stages of cancer which can keep away patients’ anxiety and unnecessary biopsy. This review paper states us about how we can use different techniques to classify the ovarian cancer tumor. In this survey effort we have also deliberate about the comparison of different machine learning algorithms like K-Nearest Neighbor, Support Vector Machine and deep learning techniques used in classification process of ovarian cancer. Later comparing the different techniques for this type of cancer detection, it gives the impression that Deep Learning Technique has provided good results and come out with good accuracy and other performance metrics.

Download Full-text