PCirc: random forest-based plant circRNA identification software

Abstract Background: Circular RNA (circRNA) is a novel type of RNA with a closed-loop structure. Increasing numbers of circRNAs are being identified in plants and animals, and recent studies have shown that circRNAs play an important role in gene regulation. Identifying circRNAs from the growing volume of RNA-seq data is therefore very important. However, traditional circRNA recognition methods have limitations. In recent years, emerging machine learning techniques have provided a good approach for identifying circRNAs in animals, but the features used for animals cannot be applied directly to plants because the characteristics of plant circRNA sequences differ from those of animal circRNAs. For example, plants are extremely rich in splicing signals and transposable elements, and sequence conservation in plants such as rice is far lower than in mammals. To solve these problems and better identify circRNAs in plants, circRNA recognition software that applies machine learning to the characteristics of plant circRNAs is urgently needed. Results: In this study, we built a software program named PCirc that uses machine learning to predict plant circRNAs from RNA-seq data. First, we extracted different features, including open reading frames, numbers of k-mers, and splicing junction sequence coding, from rice circRNA and lncRNA data. Second, we trained a machine learning model using the random forest algorithm with tenfold cross-validation on the training set. Third, we evaluated the classifier by accuracy, precision, and F1 score; all scores on the model test data were above 0.99. Fourth, we tested the model on data from other plants and obtained good results, with accuracy scores above 0.8. Finally, we packaged the trained model and the scripts used into locally run circRNA prediction software, PCirc (https://github.com/Lilab-SNNU/Pcirc).
Conclusion: Based on rice circRNA and lncRNA data, this study constructed a machine learning model for plant circRNA recognition using the random forest algorithm; the model can also be applied to circRNA recognition in other plants, such as Arabidopsis thaliana and maize. The trained model and the scripts used in this study are packaged into locally run circRNA prediction software, PCirc, which is convenient for plant circRNA researchers to use.
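
The k-mer feature mentioned in the abstract can be illustrated with a minimal sketch. This is not the PCirc implementation: the function name `kmer_counts`, the default `k=3`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration. It turns a sequence into a fixed-length vector of k-mer counts that a classifier such as a random forest could consume.

```python
from itertools import product


def kmer_counts(seq, k=3, alphabet="ACGT"):
    """Count every k-mer over `alphabet` in `seq` (hypothetical helper).

    Returns a list of 4**k counts ordered lexicographically (AAA, AAC, ...).
    """
    # Treat RNA and DNA alike by mapping U to T (an assumption of this sketch).
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    return [counts[m] for m in kmers]


# Example: the 2-mers of a short sequence give a 16-dimensional vector.
vector = kmer_counts("ACGTACGT", k=2)
```

Because the vector length depends only on `k`, sequences of different lengths map to feature vectors of the same size.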

GP.6 Improving Triaging of EEG Referrals for Rule out Infantile Spasms (ITERIS)

Canadian Journal of Neurological Sciences / Journal Canadien des Sciences Neurologiques ◽ 10.1017/cjn.2021.262 ◽ 2021 ◽ Vol 48 (s3) ◽ pp. S13-S13

Author(s): D Djordjevic ◽ J Tracey ◽ M Alqahtani ◽ J Boyd ◽ C Go

Keyword(s): Machine Learning ◽ Predictive Factors ◽ Predictive Accuracy ◽ Learning Model ◽ Infantile Spasms ◽ Low Risk ◽ Machine Learning Techniques ◽ Risk Category ◽ Point System ◽ Machine Learning Model

Background: Infantile spasms (IS) is a devastating pediatric seizure disorder for which EEG referrals are prioritized at the Hospital for Sick Children, representing a resource challenge. The goal of this study was to improve the triaging system for these referrals. Methods: Part 1: a descriptive analysis was performed retrospectively on EEG referrals. Part 2: prospective questionnaires were used to determine the relative risk of various predictive factors. Part 3: the electronic referral form was amended to include 5 positive predictive factors. A triage point system was tested by assigning EEGs as high risk (3 days), standard risk (1 week), or low risk (2 weeks). A machine learning model was also developed. Results: Most EEG referrals were from community pediatricians, with a low yield of IS diagnoses. Using the 5 predictive factors, the proposed triage system accurately diagnosed all IS cases within 3 days. No abnormal EEGs were missed in the low-risk category. The machine learning model had over 90% predictive accuracy and will be prospectively tested. Conclusions: EEG triaging for IS can likely be improved so that higher-risk patients are prioritized. Machine learning techniques can potentially be applied to help with predictions. We hope that our findings will ultimately improve resource utilization and patient care.
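
A triage point system of the kind described can be sketched as a simple scoring rule. The abstract gives the three categories and their EEG time windows but not the five factors, their weights, or the cutoffs, so the equal weighting and the `high_cutoff` and `standard_cutoff` values below are purely hypothetical placeholders, not the study's rule and not clinical guidance.

```python
def triage_category(factors_present, high_cutoff=3, standard_cutoff=1):
    """Map positive predictive factors to (risk category, target days to EEG).

    `factors_present` is a sequence of booleans, one per predictive factor.
    The cutoffs are hypothetical; the abstract does not state them.
    """
    score = sum(1 for present in factors_present if present)
    if score >= high_cutoff:
        return "high risk", 3        # EEG within 3 days
    if score >= standard_cutoff:
        return "standard risk", 7    # EEG within 1 week
    return "low risk", 14            # EEG within 2 weeks


# Example referral with two of five factors present.
category, days = triage_category([True, False, True, False, False])
```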

Weather prediction using random forest machine learning model

Indonesian Journal of Electrical Engineering and Computer Science ◽ 10.11591/ijeecs.v22.i2.pp1208-1215 ◽ 2021 ◽ Vol 22 (2) ◽ pp. 1208

Author(s): R. Meenal ◽ Prawin Angel Michael ◽ D. Pamela ◽ E. Rajasekaran

Keyword(s): Machine Learning ◽ Wind Speed ◽ Random Forest ◽ Solar Radiation ◽ Regression Models ◽ Tamil Nadu ◽ Weather Prediction ◽ Learning Model ◽ Statistical Regression ◽ Machine Learning Model

Complex numerical climate models pose a big challenge for scientists in weather prediction, especially for tropical systems. This paper focuses on the importance of weather prediction using machine learning (ML) techniques. Recently, many researchers have suggested that machine learning models can produce sensible weather predictions despite having no precise knowledge of atmospheric physics. In this work, global solar radiation (GSR) in MJ/m²/day and wind speed in m/s are predicted for Tamil Nadu, India using a random forest ML model. The random forest ML model is validated with measured wind and solar radiation data collected from IMD, Pune. The prediction results of the random forest ML model are compared with statistical regression models and an SVM ML model. Overall, the random forest ML model has the lowest error, with an MSE of 0.750 and an R² score of 0.97. Compared to the regression models and the SVM ML model, the predictions of the random forest ML model are more accurate. Thus, this approach removes the need for an expensive measuring instrument at every potential location to acquire solar radiation and wind speed data.
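
For reference, the two metrics quoted in the abstract can be computed as follows. This is a minimal dependency-free sketch using the standard definitions of mean squared error and the coefficient of determination; the function names and the toy numbers are illustrative and are not data from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all y_true values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values only: one prediction is off by 1.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(mse(measured, predicted), r_squared(measured, predicted))
```

Note that MSE carries the squared units of the target, so an MSE value is only comparable between models predicting the same quantity, whereas R² is unitless.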

The importance of round-robin validation when assessing machine-learning-based vertical extrapolation of wind speeds

10.5194/wes-2020-2 ◽ 2020

Author(s): Nicola Bodini ◽ Mike Optis

Keyword(s): Machine Learning ◽ Random Forest ◽ Power Law ◽ Wind Farm ◽ Sonic Anemometer ◽ Model Performance ◽ Learning Model ◽ Round Robin ◽ Wind Speeds ◽ Machine Learning Model

Abstract. The extrapolation of wind speeds measured at a meteorological mast to wind turbine hub heights is a key component in a bankable wind farm energy assessment and a significant source of uncertainty. Industry-standard methods for extrapolation include the power law and logarithmic profile. The emergence of machine-learning applications in wind energy has led to several studies demonstrating substantial improvements in the vertical extrapolation accuracy of machine-learning methods over these conventional power law and logarithmic profile methods. In all cases, these studies assess relative model performance at a measurement site where, critically, the machine-learning algorithm requires knowledge of the hub-height wind speeds in order to train the model. This prior knowledge provides fundamental advantages to the site-specific machine-learning model over the power law and log profile, which, by contrast, are not highly tuned to hub-height measurements but rather can generalize to any site. Furthermore, there is no practical benefit in applying a machine-learning model at a site where hub-height winds are known; rather, its performance at nearby locations (i.e., across a wind farm site) without hub-height measurements is of most practical interest. To more fairly and practically compare machine-learning-based extrapolation to standard approaches, we implemented a round-robin extrapolation model comparison, in which a random forest machine-learning model is trained and evaluated at different sites and then compared against the power law and logarithmic profile. We consider 20 months of lidar and sonic anemometer data collected at four sites located 50–100 kilometers apart in the central United States. We find that the random forest outperforms the standard extrapolation approaches, especially when incorporating surface measurements as inputs to include the influence of atmospheric stability.
When compared at a single site (the traditional comparison approach), the machine-learning improvement in mean absolute error was 28 % and 23 % over the power law and logarithmic profile, respectively. Using the round-robin approach proposed here, this improvement drops to 19 % and 14 %, respectively. These latter values better represent practical model performance, and we conclude that round-robin validation should be the standard for machine-learning-based, wind-speed extrapolation methods.
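
The round-robin idea (fit at one site, score at every other site) can be sketched without any ML library. In the sketch below, a power-law shear exponent fitted at the training site stands in for the trained model; it is not the authors' random forest, and the function names, heights, and data layout are assumptions made for illustration. The power law itself is the standard form u(z_hub) = u(z_low) · (z_hub / z_low)^α.

```python
import math


def fit_shear_exponent(samples, z_low, z_hub):
    """Fit alpha from (u_low, u_hub) pairs at one site by averaging
    log(u_hub / u_low) / log(z_hub / z_low). Speeds must be positive."""
    ratios = [math.log(u_hub / u_low) / math.log(z_hub / z_low)
              for u_low, u_hub in samples]
    return sum(ratios) / len(ratios)


def power_law(u_low, alpha, z_low, z_hub):
    """Extrapolate wind speed from z_low to z_hub with exponent alpha."""
    return u_low * (z_hub / z_low) ** alpha


def round_robin_mae(sites, z_low, z_hub):
    """Train at each site in turn; report MAE at every *other* site.

    `sites` maps a site name to a list of (u_low, u_hub) pairs.
    Returns {(train_site, test_site): mean absolute error}.
    """
    results = {}
    for train_name, train_samples in sites.items():
        alpha = fit_shear_exponent(train_samples, z_low, z_hub)
        for test_name, test_samples in sites.items():
            if test_name == train_name:
                continue  # same-site scoring is the comparison being avoided
            errors = [abs(power_law(u_low, alpha, z_low, z_hub) - u_hub)
                      for u_low, u_hub in test_samples]
            results[(train_name, test_name)] = sum(errors) / len(errors)
    return results
```

With four sites this yields twelve train/test pairs, none of which score a model on the site it was fitted to.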

Machine learning techniques to predict daily rainfall amount

Journal of Big Data ◽ 10.1186/s40537-021-00545-4 ◽ 2021 ◽ Vol 8 (1)

Author(s): Chalachew Muluken Liyew ◽ Haileyesus Amsaya Melese

Keyword(s): Machine Learning ◽ Pearson Correlation ◽ Daily Rainfall ◽ Learning Model ◽ Machine Learning Techniques ◽ Gradient Boosting ◽ Correlation Technique ◽ Learning Techniques ◽ Machine Learning Model ◽ Extreme Gradient Boosting

Abstract Predicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several studies have applied data mining and machine learning techniques to different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends. Wise use of rainwater should be planned and practiced in the country to minimize the problems of drought and flooding. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and to predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select the relevant environmental variables used as input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boosting). Root mean squared error and mean absolute error were used to measure the performance of the machine learning models. The results of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than the others.
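
Pearson-correlation feature selection of the kind described can be sketched as below. The abstract does not state the cutoff or the variable names, so the `threshold=0.2` default and the placeholder feature names are assumptions; the correlation formula itself is the standard one.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, threshold=0.2):
    """Keep names of features whose |r| with the target meets the threshold.

    `features` maps a feature name to its list of values; the threshold
    is a hypothetical choice, not a value taken from the paper.
    """
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold]


# Placeholder data: feature_a tracks the target, feature_b does not.
features = {"feature_a": [1, 2, 3, 4], "feature_b": [1, -1, -1, 1]}
selected = select_features(features, target=[2, 4, 6, 8])
```

Pearson correlation only captures linear association, so a variable with a strong nonlinear relationship to rainfall could be dropped by this filter.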

Random Forest Machine Learning Model for Predicting Combustion Feedback Information of a Natural Gas Spark Ignition Engine

Journal of Energy Resources Technology ◽ 10.1115/1.4047761 ◽ 2020 ◽ Vol 143 (1)

Author(s): Jinlong Liu ◽ Christopher Ulishney ◽ Cosmin Emil Dumitrescu

Keyword(s): Machine Learning ◽ Random Forest ◽ Natural Gas ◽ Engine Performance ◽ Cost Effective ◽ Learning Model ◽ Operating Conditions ◽ Control Variables ◽ Feedback Information ◽ Machine Learning Model

Abstract Engine calibration requires detailed feedback information that can reflect the combustion process as the optimized objective. Indicated mean effective pressure (IMEP) is such an indicator, describing an engine’s capacity to do work under different combinations of control variables. In this context, it is of interest to find cost-effective solutions that will reduce the number of experimental tests. This paper proposes a random forest machine learning model as a cost-effective tool for optimizing engine performance. Specifically, the model estimated IMEP for a natural gas spark-ignited engine obtained from a converted diesel engine. The goal was to develop an economical and robust tool that can help reduce the large number of experiments usually required throughout the design and development of internal combustion engines. The data used for building such a correlative model came from engine experiments that varied the spark advance, fuel-air ratio, and engine speed. The inlet conditions and the coolant/oil temperature were kept constant. As a result, the model inputs were the key engine operation variables that affect engine performance. The trained model was shown to predict the combustion-related feedback information with good accuracy (R² ≈ 0.9 and MSE ≈ 0). In addition, the model accurately reproduced the effect of control variables on IMEP, which would help narrow the choice of operating conditions for future designs of experiments. Overall, the machine learning approach presented here can provide new opportunities for cost-efficient engine analysis and diagnostics work.

Machine-learning model derived gene signature predictive of paclitaxel survival benefit in gastric cancer: results from the randomised phase III SAMIT trial

Gut ◽ 10.1136/gutjnl-2021-324060 ◽ 2021 ◽ pp. gutjnl-2021-324060

Author(s): Raghav Sundar ◽ Nesaretnam Barr Kumarakulasinghe ◽ Yiong Huak Chan ◽ Kazuhiro Yoshida ◽ Takaki Yoshikawa ◽ ...

Keyword(s): Machine Learning ◽ Gastric Cancer ◽ Random Forest ◽ Survival Benefit ◽ Validation Cohort ◽ External Validation ◽ Gene Signature ◽ Learning Model ◽ Phase III ◽ Machine Learning Model

Objective: To date, there are no predictive biomarkers to guide selection of patients with gastric cancer (GC) who benefit from paclitaxel. Stomach cancer Adjuvant Multi-Institutional group Trial (SAMIT) was a 2×2 factorial randomised phase III study in which patients with GC were randomised to Pac-S-1 (paclitaxel + S-1), Pac-UFT (paclitaxel + UFT), S-1 alone or UFT alone after curative surgery. Design: The primary objective of this study was to identify a gene signature that predicts survival benefit from paclitaxel chemotherapy in GC patients. SAMIT GC samples were profiled using a customised 476-gene NanoString panel. A random forest machine-learning model was applied on the NanoString profiles to develop a gene signature. An independent cohort of metastatic patients with GC treated with paclitaxel and ramucirumab (Pac-Ram) served as an external validation cohort. Results: From the SAMIT trial, 499 samples were analysed in this study. From the Pac-S-1 training cohort, the random forest model generated a 19-gene signature assigning patients to two groups: Pac-Sensitive and Pac-Resistant. In the Pac-UFT validation cohort, Pac-Sensitive patients exhibited a significant improvement in disease-free survival (DFS): 3-year DFS 66% vs 40% (HR 0.44, p=0.0029). There was no survival difference between Pac-Sensitive and Pac-Resistant in the UFT or S-1 alone arms, test of interaction p<0.001. In the external Pac-Ram validation cohort, the signature predicted benefit for Pac-Sensitive (median PFS 147 days vs 112 days, HR 0.48, p=0.022). Conclusion: Using machine-learning techniques on one of the largest GC trials (SAMIT), we identify a gene signature representing the first predictive biomarker for paclitaxel benefit. Trial registration number: UMIN Clinical Trials Registry: C000000082 (SAMIT); ClinicalTrials.gov identifier, 02628951 (South Korean trial)

The Hybrid Machine Learning Model Based on Random Forest Optimized by PSO and ACO for Predicting Heart Disease

Proceedings of the Third International Conference on Computing and Wireless Communication Systems, ICCWCS 2019, April 24-25, 2019, Faculty of Sciences, Ibn Tofaïl University, Kénitra, Morocco ◽ 10.4108/eai.24-4-2019.2284088 ◽ 2019

Author(s): Youness KHOURDIFI ◽ Mohamed BAHAJ

Keyword(s): Machine Learning ◽ Heart Disease ◽ Random Forest ◽ Learning Model ◽ Model Based ◽ Machine Learning Model ◽ Hybrid Machine

Forecasting INR Exchange Rate Against USD, GBP, JPY, SGD, EUR, AED Using Machine Learning

EPRA International Journal of Research & Development (IJRD) ◽ 10.36713/epra7960 ◽ 2021 ◽ pp. 684-689

Author(s): Sumith Pevekar

Keyword(s): Machine Learning ◽ Exchange Rate ◽ Random Forest ◽ Foreign Exchange ◽ Foreign Currency ◽ Foreign Exchange Rate ◽ Random Forest Algorithm ◽ Currency Exchange ◽ Predicting Performance ◽ Machine Learning Model

The price of a native currency expressed in terms of another currency is known as a foreign exchange rate. In other words, a foreign exchange rate compares the value of one currency to that of another. The value of standardized currencies varies with demand, supply, and consumer confidence around the world, so exchange rates fluctuate over time. To forecast the exchange rate of INR, I have developed a machine learning model. The model was trained on historical data to estimate six foreign currency exchange rates against the Indian Rupee. The model uses the Random Forest algorithm to train and predict the values. The predictive performance of the suggested system is assessed and compared using statistical metrics. According to the findings, the Random Forest algorithm-based model predicts well and achieves an accuracy of 93.61%. KEYWORDS: Regression, Random Forest, Exchange Rate, INR

Improving Heart Disease Prediction Using Random Forest and AdaBoost Algorithms

International Journal of Online and Biomedical Engineering (iJOE) ◽ 10.3991/ijoe.v17i11.24781 ◽ 2021 ◽ Vol 17 (11) ◽ pp. 60

Author(s): Halima EL Hamdaoui ◽ Said Boujraf ◽ Nour El Houda Chaoui ◽ Badr Alami ◽ Mustapha Maaroufi

Keyword(s): Machine Learning ◽ Heart Disease ◽ Decision Support ◽ Random Forest ◽ Clinical Decision Support ◽ Clinical Decision ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Random Forest Algorithm ◽ AdaBoost Algorithm

Heart disease is a major cause of death worldwide. Thus, diagnosis and prediction of heart disease remain essential. Clinical decision support systems based on machine learning techniques have become a primary tool to assist clinicians and contribute to automated diagnosis. This paper aims to predict heart disease using the Random Forest algorithm enhanced with the boosting algorithm AdaBoost. The model is trained and tested on the University of California Irvine (UCI) Cleveland and Statlog heart disease datasets using the 14 most relevant attributes. The results show that the Random Forest algorithm combined with the AdaBoost algorithm achieved higher accuracy than the Random Forest algorithm alone (96.16% vs. 95.98%, respectively). We compare our suggested model to reported machine learning classifiers, and the results support the efficiency and validity of our model. In addition, the proposed model achieved high accuracy compared to existing studies in the literature, which confirms that a clinical decision support system based on machine learning algorithms could be used to predict heart disease.
