Machine Learning and Deep Learning challenges for building 2′O site prediction

Mapping Intimacies ◽

10.1101/2020.05.10.087189 ◽

2020 ◽

Author(s):

Milad Mostavi ◽

Yufei Huang

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Predictive Model ◽

Sequence Length ◽

Learning Models ◽

Rna Modifications ◽

Operating Characteristics ◽

Rna Detection ◽

Learning Challenges ◽

The Impact

Abstract: 2′-O-methylation (2′O) is one of the abundant post-transcriptional RNA modifications and can be found in all types of RNA. Detection and functional analysis of 2′O methylation have been challenging problems for biologists ever since its discovery. This paper addresses computational challenges in building Machine Learning and Deep Learning models for predicting 2′O sites. In particular, the impact of the length of the sequence containing the 2′O site, the embedding method, and the type of predictive model are each investigated separately. Thirty different predictive models are built, each showing the impact of these parameters. The areas under the precision-recall and receiver operating characteristic curves are used to test imbalanced scenarios like those found in the real world. Comparing the performance of these models shows that embedding methods are crucial for Machine Learning models but do not improve the performance of Deep Learning models. Furthermore, the best predictive model was investigated further to extract significant nucleotides surrounding 2′O sites. Interestingly, the significance score matrix obtained from all 2′O samples shows that the model pays the most attention to the locations where the dominant 2′O motifs exist. The dataset and all of the code are available at https://github.com/MMostavi/2_O_Me_sitePred
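The abstract above evaluates models with the areas under the ROC and precision-recall curves because the data are imbalanced. The following is a minimal sketch of how the two summaries can differ on the same ranking. It is not the authors' evaluation code: the function names, the toy labels, and the toy scores are all assumptions made for illustration, and average precision is used here as a common estimate of the area under the precision-recall curve (assuming no tied scores).

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision measured at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical imbalanced toy set: 2 modified sites among 10 candidates.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]

print(roc_auc(labels, scores))            # 0.875
print(average_precision(labels, scores))  # 0.75
```

In this toy case the two false positives ranked above the second true site cost little in ROC terms but pull the precision-based score down noticeably, which is why both summaries are often reported together for rare-event prediction.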


Implementing clinical decision support for oncology advanced care planning: A systems engineering framework to optimize the usability and utility of a machine learning predictive model in clinical practice.

Journal of Clinical Oncology ◽

10.1200/jco.2020.39.28_suppl.330 ◽

2021 ◽

Vol 39 (28_suppl) ◽

pp. 330-330

Author(s):

Teja Ganta ◽

Stephanie Lehrman ◽

Rachel Pappalardo ◽

Madalene Crow ◽

Meagan Will ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Predictive Model ◽

Systems Engineering ◽

Care Planning ◽

Learning Models ◽

Predictive Tool ◽

Risk Of Death ◽

The Impact ◽

Machine Learning Models

330 Background: Machine learning models are well-positioned to transform cancer care delivery by providing oncologists with more accurate or accessible information to augment clinical decisions. Many machine learning projects, however, focus on model accuracy without considering the impact of using the model in real-world settings and rarely carry forward to clinical implementation. We present a human-centered systems engineering approach to address clinical problems with workflow interventions utilizing machine learning algorithms. Methods: We aimed to develop a mortality predictive tool, using a Random Forest algorithm, to identify oncology patients at high risk of death within 30 days to move advance care planning (ACP) discussions earlier in the illness trajectory. First, a project sponsor defined the clinical need and requirements of an intervention. The data scientists developed the predictive algorithm using data available in the electronic health record (EHR). A multidisciplinary workgroup was assembled including oncology physicians, advanced practice providers, nurses, social workers, chaplain, clinical informaticists, and data scientists. Meeting bi-monthly, the group utilized human-centered design (HCD) methods to understand clinical workflows and identify points of intervention. The workgroup completed a workflow redesign workshop, a 90-minute facilitated group discussion, to integrate the model in a future state workflow. An EHR (Epic) analyst built the user interface to support the intervention per the group’s requirements. The workflow was piloted in thoracic oncology and bone marrow transplant with plans to scale to other cancer clinics. Results: Our predictive model performance on test data was acceptable (sensitivity 75%, specificity 75%, F-1 score 0.71, AUC 0.82). 
The workgroup identified a “quality of life coordinator” who: reviews an EHR report of patients scheduled in the upcoming 7 days who have a high risk of 30-day mortality; works with the oncology team to determine ACP clinical appropriateness; documents the need for ACP; identifies potential referrals to supportive oncology, social work, or chaplain; and coordinates the oncology appointment. The oncologist receives a reminder on the day of the patient’s scheduled visit. Conclusions: This workgroup is a viable approach that can be replicated at institutions to address clinical needs and realize the full potential of machine learning models in healthcare. The next steps for this project are to address end-user feedback from the pilot, expand the intervention to other cancer disease groups, and track clinical metrics.
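The Results above report sensitivity, specificity, and F1 score together. As a minimal sketch of how these quantities relate, the snippet below derives them from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and `binary_metrics` is an assumed helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true high-risk cases flagged
    specificity = tn / (tn + fp)   # share of low-risk cases correctly not flagged
    precision = tp / (tp + fp)     # share of flagged cases that are truly high risk
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (not from the study): 40 high-risk and 60 low-risk patients.
metrics = binary_metrics(tp=30, fp=15, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity alone do not determine F1, because precision also depends on how common the positive class is; the same 75%/75% pair yields a different F1 at a different prevalence.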


Training confounder-free deep learning models for medical applications

Nature Communications ◽

10.1038/s41467-020-19784-9 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Qingyu Zhao ◽

Ehsan Adeli ◽

Kilian M. Pohl

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Bone Age ◽

Magnetic Resonance Images ◽

Data Sets ◽

Large Set ◽

Learning Models ◽

End To End ◽

The Impact ◽

The Relationship

Abstract: The presence of confounding effects (or biases) is one of the most critical challenges in using deep learning to advance discovery in medical imaging studies. Confounders affect the relationship between input data (e.g., brain MRIs) and output variables (e.g., diagnosis). Improper modeling of those relationships often results in spurious and biased associations. Traditional machine learning and statistical models minimize the impact of confounders by, for example, matching data sets, stratifying data, or residualizing imaging measurements. Alternative strategies are needed for state-of-the-art deep learning models that use end-to-end training to automatically extract informative features from large sets of images. In this article, we introduce an end-to-end approach for deriving features invariant to confounding factors while accounting for intrinsic correlations between the confounder(s) and prediction outcome. The method does so by exploiting concepts from traditional statistical methods and recent fair machine learning schemes. We evaluate the method on predicting the diagnosis of HIV solely from Magnetic Resonance Images (MRIs), identifying morphological sex differences in adolescence from those of the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA), and determining the bone age from X-ray images of children. The results show that our method can accurately predict while reducing biases associated with confounders. The code is available at https://github.com/qingyuzhao/br-net.
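The abstract above mentions residualizing imaging measurements as one traditional way to reduce confounding. The sketch below illustrates that traditional step only, not the end-to-end method the paper introduces: it fits a one-variable least-squares line of a measurement on a confounder and keeps the residuals. The function name and the toy age and volume values are assumptions for illustration.

```python
def residualize(measurements, confounder):
    """Remove the linear effect of one confounder from a measurement.

    Fits measurement = intercept + slope * confounder by ordinary least
    squares and returns the residuals, which are uncorrelated with the
    confounder in the fitted sample.
    """
    n = len(measurements)
    mean_c = sum(confounder) / n
    mean_y = sum(measurements) / n
    var_c = sum((c - mean_c) ** 2 for c in confounder)
    cov = sum((c - mean_c) * (y - mean_y) for c, y in zip(confounder, measurements))
    slope = cov / var_c
    intercept = mean_y - slope * mean_c
    return [y - (intercept + slope * c) for c, y in zip(confounder, measurements)]


# Hypothetical data: a regional volume measurement that drifts with age.
age = [10, 12, 14, 16, 18]
volume = [2.1, 2.4, 2.6, 3.1, 3.3]

residuals = residualize(volume, age)
print([round(r, 3) for r in residuals])
```

A downstream model trained on the residuals can no longer exploit the linear age trend, although nonlinear or interaction effects of the confounder would remain.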


The Future of PHM Could be Tiny under Cloud: Exploring Potential Application Patterns of TinyML in PHM Scenarios

Annual Conference of the PHM Society ◽

10.36001/phmconf.2021.v13i1.3054 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Xingyu Zhou ◽

Zhuangwei Kang ◽

Robert Canady ◽

Shunxing Bao ◽

Daniel Allen Balasubramanian ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Remaining Useful Life ◽

Data Driven ◽

Learning Models ◽

Level Data ◽

Data Source ◽

Single Data ◽

The Impact ◽

Machine Learning Models

Deep learning has shown impressive performance across health management and prognostics applications. Nowadays, an emerging trend of machine learning deployment on resource-constrained hardware devices like micro-controllers (MCU) has attracted much attention. Given the distributed and resource-constrained nature of many PHM applications, using tiny machine learning models close to data source sensors for on-device inference would be beneficial to save both time and additional hardware resources. Even though there have been past works that bring TinyML to MCUs for some PHM applications, they mainly target single data source usage without higher-level data incorporation with cloud computing. We study the impact of potential cooperation patterns between TinyML on the edge and more powerful computation resources on the cloud, and how this would affect the application patterns in data-driven prognostics. We introduce potential applications where sensor readings are utilized for system health status prediction, including status classification and remaining useful life regression. We find that MCUs and cloud computing can be adapted to different kinds of machine learning models and combined in flexible ways for diverse requirements. Our work also shows limitations of current MCU-based deep learning in data-driven prognostics. And we hope our work can


Modeling the Impact of Covid-19 on the Farm Produce Availability and Pricing in India

Interdisciplinary Journal of Information Knowledge and Management ◽

10.28945/4897 ◽

2022 ◽

Vol 17 ◽

pp. 035-065

Author(s):

Niharika Prasanna Kumar

Keyword(s):

Machine Learning ◽

Developing Countries ◽

Deep Learning ◽

Research Work ◽

Gradient Boosting ◽

Learning Models ◽

Recall Accuracy ◽

Agricultural Produce ◽

Farm Produce ◽

The Impact

Aim/Purpose: This paper aims to analyze the availability and pricing of perishable farm produce before and during the lockdown restrictions imposed due to Covid-19. This paper also proposes machine learning and deep learning models to help the farmers decide on an appropriate market to sell their farm produce and get a fair price for their product. Background: Developing countries like India have regulated agricultural markets governed by country-specific protective laws like the Essential Commodities Act and the Agricultural Produce Market Committee (APMC) Act. These regulations restrict the sale of agricultural produce to a predefined set of local markets. The Covid-19 pandemic led to a lockdown during the first half of 2020, which resulted in supply disruption and demand-supply mismatch of agricultural commodities at these local markets. These demand-supply dynamics led to disruptions in the pricing of the farm produce, leading to a lower price realization for farmers. Hence it is essential to analyze the impact of this disruption on the pricing of farm produce at a granular level. Moreover, the farmers need a tool that guides them to the most suitable market/city/town to sell their farm produce to get a fair price. Methodology: One hundred and fifty thousand samples from the agricultural dataset, released by the Government of India, were used to perform statistical analysis and identify the supply disruptions as well as price disruptions of perishable agricultural produce. In addition, more than seventeen thousand samples were used to implement and train machine learning and deep learning models that can predict and guide the farmers about the appropriate market to sell their farm produce. In essence, the paper uses descriptive analytics to analyze the impact of Covid-19 on agricultural produce pricing. The paper explores the usage of prescriptive analytics to recommend an appropriate market to sell agricultural produce.
Contribution: Five machine learning models based on Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Random Forest, and Gradient Boosting, and three deep learning models based on Artificial Neural Networks were implemented. The performance of these models was compared using metrics like Precision, Recall, Accuracy, and F1-Score. Findings: Among the five classification models, the Gradient Boosting classifier was the optimal classifier that achieved precision, recall, accuracy, and F1 score of 99%. Out of the three deep learning models, the Adam optimizer-based deep neural network achieved precision, recall, accuracy, and F1 score of 99%. Recommendations for Practitioners: The Gradient Boosting technique and the Adam-based deep learning model should be the preferred choice for analyzing agricultural pricing-related problems. Recommendation for Researchers: Ensemble learning techniques like Random Forest and Gradient Boosting perform better than non-ensemble classification techniques. Hyperparameter tuning is an essential step in developing these models and it improves the performance of the model. Impact on Society: Statistical analysis of the data revealed the true nature of demand and supply and price disruption. This analysis helps to assess the revenue impact borne by the farmers due to Covid-19. The machine learning and deep learning models help the farmers to get a better price for their crops. Though the dataset used in this paper is related to India, the outcome of this research work applies to many developing countries that have similar regulated markets. Hence farmers from developing countries across the world can benefit from the outcome of this research work. Future Research: The machine learning and deep learning models were implemented and tested for markets in and around Bangalore. The model can be expanded to cover other markets within India.


Marine Data Prediction: An Evaluation of Machine Learning, Deep Learning, and Statistical Predictive Models

Computational Intelligence and Neuroscience ◽

10.1155/2021/8551167 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ahmed Ali ◽

Ahmed Fathalla ◽

Ahmad Salah ◽

Mahmoud Bekhit ◽

Esraa Eldesouky

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Predictive Model ◽

Predictive Models ◽

Marine Biodiversity ◽

Learning Models ◽

Real Dataset ◽

Marine Data ◽

Predictive Approaches ◽

Machine Learning Models

Nowadays, ocean observation technology continues to progress, resulting in a huge increase in marine data volume and dimensionality. This volume of data provides a golden opportunity to train predictive models, since more data generally yields a better predictive model. Predicting marine data such as sea surface temperature (SST) and Significant Wave Height (SWH) is a vital task in a variety of disciplines, including marine activities, deep-sea, and marine biodiversity monitoring. Previous efforts in the literature to forecast such marine data can be classified into three classes: machine learning, deep learning, and statistical predictive models. To the best of the authors’ knowledge, no study has compared the performance of these three approaches on a real dataset. This paper focuses on the prediction of two critical marine features: the SST and SWH. In this work, we implement statistical, deep learning, and machine learning models for predicting the SST and SWH on a real dataset obtained from the Korea Hydrographic and Oceanographic Agency. We then compare these three predictive approaches on four different evaluation metrics. Experimental results reveal that the deep learning model slightly outperformed the machine learning models in overall performance, and both of these approaches greatly outperformed the statistical predictive model.
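The abstract above compares forecasting approaches on four evaluation metrics without naming them. As a hedged illustration of why more than one error metric is useful, the sketch below assumes two common choices, mean absolute error (MAE) and root mean squared error (RMSE), and applies them to hypothetical sea surface temperature forecasts. The metric choice, the function names, and all values are assumptions, not details from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large individual errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical SST observations (degrees Celsius) and two hypothetical forecasts.
observed = [20.0, 21.0, 22.0, 21.5]
forecast_a = [20.5, 20.5, 22.5, 21.0]   # small error at every step
forecast_b = [20.0, 21.0, 22.0, 19.5]   # perfect except for one large miss

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, mae(observed, forecast), rmse(observed, forecast))
# A 0.5 0.5
# B 0.5 1.0
```

Both forecasts have the same MAE, but RMSE separates them because it weights the single large miss more heavily, so a ranking of models can depend on which metric is reported.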


Comparison of machine and deep learning for the classification of cervical cancer based on cervicography images

Scientific Reports ◽

10.1038/s41598-021-95748-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ye Rang Park ◽

Young Jae Kim ◽

Woong Ju ◽

Kyehyun Nam ◽

Soonyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Deep Learning ◽

Learning Algorithm ◽

Vaginal Wall ◽

Learning Models ◽

Operating Characteristics ◽

Deep Learning Algorithm ◽

Machine Learning Models

Abstract: Cervical cancer is the second most common cancer in women worldwide with a mortality rate of 60%. Cervical cancer begins with no overt signs and has a long latent period, making early detection through regular checkups vitally important. In this study, we compare the performance of two different approaches, machine learning and deep learning, for the purpose of identifying signs of cervical cancer using cervicography images. Using the deep learning model ResNet-50 and the machine learning models XGB, SVM, and RF, we classified 4119 cervicography images as positive or negative for cervical cancer using square images in which the vaginal wall regions were removed. The machine learning models extracted 10 major features from a total of 300 features. All tests were validated by fivefold cross-validation, and receiver operating characteristic (ROC) analysis yielded the following AUCs: ResNet-50 0.97 (95% CI 0.949–0.976), XGB 0.82 (95% CI 0.797–0.851), SVM 0.84 (95% CI 0.801–0.854), RF 0.79 (95% CI 0.804–0.856). The ResNet-50 model showed a 0.15 point improvement (p < 0.05) over the average (0.82) of the three machine learning methods. Our data suggest that the ResNet-50 deep learning algorithm could offer greater performance than current machine learning models for the purpose of identifying cervical cancer using cervicography images.
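The 0.15 point improvement quoted above follows from the reported AUC point estimates. The short sketch below reproduces that arithmetic using only the four AUC values given in the abstract; the variable names are illustrative, and rounding the average to two decimals before subtracting is an assumption about how the figure was derived.

```python
# AUC point estimates as reported in the abstract.
ml_aucs = {"XGB": 0.82, "SVM": 0.84, "RF": 0.79}
resnet_auc = 0.97

# Average of the three machine learning models, rounded as in the abstract.
mean_ml_auc = round(sum(ml_aucs.values()) / len(ml_aucs), 2)

# Difference between the deep learning model and that average.
improvement = round(resnet_auc - mean_ml_auc, 2)

print(mean_ml_auc)   # 0.82
print(improvement)   # 0.15
```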


Trust in Intrusion Detection Systems: An Investigation of Performance Analysis for Machine Learning and Deep Learning Models

Complexity ◽

10.1155/2021/5538896 ◽

2021 ◽

Vol 2021 ◽

pp. 1-23

Author(s):

Basim Mahbooba ◽

Radhya Sahal ◽

Wael Alosaimi ◽

Martin Serrano

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Intrusion Detection ◽

Detection System ◽

Learning Technologies ◽

Machine Learning Techniques ◽

Learning Models ◽

Network Intrusion ◽

Learning Techniques ◽

The Impact

To design and develop AI-based cybersecurity systems (e.g., intrusion detection systems (IDS)) that users can justifiably trust, one needs to evaluate the impact of trust using machine learning and deep learning technologies. To guide the design and implementation of trusted AI-based systems in IDS, this paper provides a comparison among machine learning and deep learning models to investigate the trust impact based on the accuracy of the trusted AI-based systems regarding the malicious data in IDS. The four machine learning techniques are decision tree (DT), K nearest neighbour (KNN), random forest (RF), and naïve Bayes (NB). The four deep learning techniques are LSTM (one and two layers) and GRU (one and two layers). Two datasets are used to classify the IDS attack type: the wireless sensor network detection system (WSN-DS) dataset and the KDD Cup network intrusion dataset. A detailed comparison of the eight techniques’ performance using all features and selected features is made by measuring the accuracy, precision, recall, and F1-score. Considering the findings related to the data, methodology, and expert accountability, interpretability of AI-based solutions is also needed to enhance trust in the IDS.


Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.


Deep Learning in Disease Diagnosis: Models and Datasets

Current Bioinformatics ◽

10.2174/1574893615999201002124021 ◽

2020 ◽

Vol 15 ◽

Author(s):

Deeksha Saxena ◽

Mohammed Haris Siddiqui ◽

Rajnish Kumar

Keyword(s):

Biological Sciences ◽

Machine Learning ◽

Deep Learning ◽

Disease Diagnosis ◽

Learning Models ◽

Data Types ◽

Related Data ◽

Abstract Level ◽

Experimental Validations ◽

Selection Of

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is transformed from lower levels to much more abstract ones. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times both are used individually as well. DL seems to be a better platform than machine learning as the former does not require an intermediate feature extraction step and works well with larger datasets. DL is one of the most discussed fields among scientists and researchers these days for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We have analyzed the frequently used DL methods and data types, and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights about DL methods, data types, and the selection of DL models for disease diagnosis.


Machine Learning-Based Malicious X.509 Certificates’ Detection

Applied Sciences ◽

10.3390/app11052164 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2164

Author(s):

Jiaxin Li ◽

Zhaoxin Zhang ◽

Changyong Guo

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Ensemble Learning ◽

Traffic Analysis ◽

Learning Models ◽

Detection Model ◽

Analysis Tools ◽

Average Accuracy ◽

Machine Learning Models

X.509 certificates play an important role in encrypting the transmission of data on both sides under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious traffic analysis tools. Phishing sites and malware are good examples. Those X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish between malicious certificates and benign certificates with Verification for Extraction (VFE). The VFE is a system we design and implement for obtaining plentiful characteristics of certificates. The result shows that ensemble learning models are the most stable and efficient models with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, which is the highest accuracy. The outcome indicates the VFE is capable of capturing essential and crucial characteristics of malicious X.509 certificates.
