Random Forest Machine Learning Model for Predicting Combustion Feedback Information of a Natural Gas Spark Ignition Engine

2020 ◽  
Vol 143 (1) ◽  
Author(s):  
Jinlong Liu ◽  
Christopher Ulishney ◽  
Cosmin Emil Dumitrescu

Abstract Engine calibration requires detailed feedback information that can reflect the combustion process as the optimized objective. Indicated mean effective pressure (IMEP) is such an indicator describing an engine’s capacity to do work under different combinations of control variables. In this context, it is of interest to find cost-effective solutions that will reduce the number of experimental tests. This paper proposes a random forest machine learning model as a cost-effective tool for optimizing engine performance. Specifically, the model estimated IMEP for a natural gas spark ignited engine obtained from a converted diesel engine. The goal was to develop an economical and robust tool that can help reduce the large number of experiments usually required throughout the design and development of internal combustion engines. The data used for building such a correlative model came from engine experiments that varied the spark advance, fuel-air ratio, and engine speed. The inlet conditions and the coolant/oil temperature were maintained constant. As a result, the model inputs were the key engine operation variables that affect engine performance. The trained model was shown to be able to predict the combustion-related feedback information with good accuracy (R² ≈ 0.9 and MSE ≈ 0). In addition, the model accurately reproduced the effect of control variables on IMEP, which would help narrow the choice of operating conditions for future designs of experiment. Overall, the machine learning approach presented here can provide new opportunities for cost-efficient engine analysis and diagnostics work.
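The two accuracy figures quoted in the abstract above (R² and MSE) can be computed as in the following minimal sketch. The function names and the IMEP values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical IMEP values in bar (made up for illustration, not from the paper).
measured = [6.0, 7.0, 8.0, 9.0]
predicted = [6.1, 6.9, 8.2, 8.8]

print("MSE:", mse(measured, predicted))
print("R^2:", r_squared(measured, predicted))
```

Note that MSE carries the squared units of the target, so a value near zero is only meaningful relative to the scale of the measured IMEP, whereas R² is dimensionless.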

Author(s):  
Jinlong Liu ◽  
Christopher Ulishney ◽  
Cosmin E. Dumitrescu

Abstract Converting existing compression ignition engines to spark ignition is a promising approach to increase the application of natural gas in the heavy-duty transportation sector. However, the diesel-like environment dramatically affects the engine performance and emissions. As a result, experimental tests are needed to investigate the characteristics of such converted engines. A machine learning model based on the bagged decision trees algorithm was established in this study to reduce the experimental cost and identify the operating conditions of special interest for analysis. Preliminary engine tests that changed spark timing, mixture equivalence ratio, and engine speed (three key engine operation variables) but maintained intake and boundary conditions were applied as model input to train such a correlative model. The model output was the indicated mean effective pressure, which is an engine parameter generally used to assist in locating high engine efficiency regions at constant engine speed and fuel/air ratio. After training, the correlative model provided acceptable prediction performance except for a few outliers. Subsequently, a boosting ensemble learning approach was applied in this study to help improve the model performance. Furthermore, the results showed that the boosted decision trees algorithm better described the combustion process inside the cylinder, at least for the operating conditions investigated in this study.
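The difference between the bagged and boosted tree ensembles mentioned in the abstract above can be illustrated with a small sketch. It is a deliberate simplification and not the authors' implementation: it uses one-split regression stumps on a single input variable, squared-error boosting on residuals, and made-up data. All function names and parameter values are assumptions.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on one feature by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # the largest x value cannot split the data
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x identical: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample; predictions are averaged."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def fit_boosted(xs, ys, n_rounds=25, learning_rate=0.5):
    """Boosting: fit each stump to the residuals left by the previous rounds."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * predict_stump(stump, x)
                     for x, r in zip(xs, residuals)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def training_mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical one-variable data (made up; not the engine measurements).
xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]

single = fit_stump(xs, ys)
bagged = fit_bagged(xs, ys)
boosted = fit_boosted(xs, ys)

single_mse = training_mse(lambda x: predict_stump(single, x), xs, ys)
bagged_mse = training_mse(lambda x: predict_bagged(bagged, x), xs, ys)
boosted_mse = training_mse(lambda x: predict_boosted(boosted, x), xs, ys)
print(single_mse, bagged_mse, boosted_mse)
```

Bagging averages independently trained learners, which mainly reduces variance. Boosting trains learners sequentially on the remaining error, which is why it can fit this toy training set more closely than a single stump. Training error alone says nothing about generalization, so a real comparison would use held-out data.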


Author(s):  
R. Meenal ◽  
Prawin Angel Michael ◽  
D. Pamela ◽  
E. Rajasekaran

Complex numerical climate models pose a big challenge for scientists in weather prediction, especially for tropical systems. This paper focuses on the importance of weather prediction using machine learning (ML) techniques. Recently, many researchers have suggested that machine learning models can produce sensible weather predictions despite having no precise knowledge of atmospheric physics. In this work, global solar radiation (GSR) in MJ/m²/day and wind speed in m/s are predicted for Tamil Nadu, India, using a random forest ML model. The random forest ML model is validated with measured wind and solar radiation data collected from IMD, Pune. The prediction results based on the random forest ML model are compared with statistical regression models and an SVM ML model. Overall, the random forest machine learning model has the lowest error, with an MSE of 0.750 and an R² score of 0.97. Compared to the regression models and the SVM ML model, the prediction results of the random forest ML model are more accurate. Thus, the approach removes the need for an expensive measuring instrument at every potential location to acquire solar radiation and wind speed data.


2020 ◽  
Author(s):  
Nicola Bodini ◽  
Mike Optis

Abstract. The extrapolation of wind speeds measured at a meteorological mast to wind turbine hub heights is a key component in a bankable wind farm energy assessment and a significant source of uncertainty. Industry-standard methods for extrapolation include the power law and logarithmic profile. The emergence of machine-learning applications in wind energy has led to several studies demonstrating substantial improvements in vertical extrapolation accuracy of machine-learning methods over these conventional power law and logarithmic profile methods. In all cases, these studies assess relative model performance at a measurement site where, critically, the machine-learning algorithm requires knowledge of the hub-height wind speeds in order to train the model. This prior knowledge provides fundamental advantages to the site-specific machine-learning model over the power law and log profile, which, by contrast, are not highly tuned to hub-height measurements but rather can generalize to any site. Furthermore, there is no practical benefit in applying a machine-learning model at a site where hub-height winds are known; rather, its performance at nearby locations (i.e., across a wind farm site) without hub-height measurements is of most practical interest. To more fairly and practically compare machine-learning-based extrapolation to standard approaches, we implemented a round-robin extrapolation model comparison, in which a random forest machine-learning model is trained and evaluated at different sites and then compared against the power law and logarithmic profile. We consider 20 months of lidar and sonic anemometer data collected at four sites 50–100 kilometers apart in the central United States. We find that the random forest outperforms the standard extrapolation approaches, especially when incorporating surface measurements as inputs to include the influence of atmospheric stability.
When compared at a single site (the traditional comparison approach), the machine-learning improvement in mean absolute error was 28 % and 23 % over the power law and logarithmic profile, respectively. Using the round-robin approach proposed here, this improvement drops to 19 % and 14 %, respectively. These latter values better represent practical model performance, and we conclude that round-robin validation should be the standard for machine-learning-based, wind-speed extrapolation methods.
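The two conventional baselines named in the abstract above, and the idea of pairing different training and test sites, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the shear exponent of 1/7 and the roughness length of 0.03 m are commonly used placeholder values, the logarithmic profile is the neutral form without a stability correction, and the site labels are hypothetical.

```python
import math


def power_law(u_ref, z_ref, z, alpha=1.0 / 7.0):
    """Power-law extrapolation: u(z) = u_ref * (z / z_ref) ** alpha.

    alpha is the shear exponent; 1/7 is an assumed default, not a value from the paper.
    """
    return u_ref * (z / z_ref) ** alpha


def log_profile(u_ref, z_ref, z, z0=0.03):
    """Neutral logarithmic profile: u(z) = u_ref * ln(z / z0) / ln(z_ref / z0).

    z0 is the surface roughness length in meters (assumed value).
    """
    return u_ref * math.log(z / z0) / math.log(z_ref / z0)


def round_robin_pairs(sites):
    """All (train_site, test_site) pairs in which the two sites differ."""
    return [(train, test) for train in sites for test in sites if train != test]


# Hypothetical example: 8 m/s measured at 10 m, extrapolated to an 80 m hub height.
print(power_law(8.0, 10.0, 80.0))
print(log_profile(8.0, 10.0, 80.0))

# Hypothetical labels for four measurement sites.
print(round_robin_pairs(["A", "B", "C", "D"]))
```

In a round-robin comparison of this kind, a learned model would be trained on data from the first site of each pair and scored on the second, so that it never sees hub-height winds from the site where it is evaluated.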


Author(s):  
Samuel M. Hipple ◽  
Zachary T. Reinhart ◽  
Harry Bonilla-Alvarado ◽  
Paolo Pezzini ◽  
Kenneth Mark Bryden

Abstract With increasing regulation and the push for clean energy, the operation of power plants is becoming increasingly complex. This complexity, combined with the need to optimize performance at base load and off-design conditions, means that predicting power plant performance with computational modeling is more important than ever. However, traditional modeling approaches such as physics-based models do not capture the true performance of power plant critical components. The complexity of factors such as coupling, noise, and off-design operating conditions makes the performance of critical components such as turbomachinery difficult to model. In a complex system, such as a gas turbine power plant, this creates significant disparities between models and actual system performance that limit the detection of abnormal operations. This study compares machine learning tools for predicting gas turbine performance against traditional physics-based models. A long short-term memory (LSTM) model, a form of recurrent neural network, was trained using operational datasets from a 100 kW recuperated gas turbine power system designed for hybrid configuration. The LSTM turbine model was trained to predict shaft speed, outlet pressure, and outlet temperature. Both the machine learning model and a physics-based model were compared against experimental data from the gas turbine system. Results show that the machine learning model has significant advantages in prediction accuracy and precision compared to a traditional physics-based model when fed facility data as an input. This advantage of machine learning models in predicting performance can be used to detect abnormal operations.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6543 ◽  
Author(s):  
Diptesh Das ◽  
Junichi Ito ◽  
Tadashi Kadowaki ◽  
Koji Tsuda

We present an interpretable machine learning model for medical diagnosis called sparse high-order interaction model with rejection option (SHIMR). A decision tree explains to a patient the diagnosis with a long rule (i.e., conjunction of many intervals), while SHIMR employs a weighted sum of short rules. Using proteomics data of 151 subjects in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, SHIMR is shown to be as accurate as other non-interpretable methods (Sensitivity, SN = 0.84 ± 0.1, Specificity, SP = 0.69 ± 0.15 and Area Under the Curve, AUC = 0.86 ± 0.09). For clinical usage, SHIMR has a function to abstain from making any diagnosis when it is not confident enough, so that a medical doctor can choose more accurate but invasive and/or more costly pathologies. The incorporation of a rejection option complements SHIMR in designing a multistage cost-effective diagnosis framework. Using a baseline concentration of cerebrospinal fluid (CSF) and plasma proteins from a common cohort of 141 subjects, SHIMR is shown to be effective in designing a patient-specific cost-effective Alzheimer’s disease (AD) pathology. Thus, interpretability, reliability and having the potential to design a patient-specific multistage cost-effective diagnosis framework can make SHIMR serve as an indispensable tool in the era of precision medicine that can cater to the demand of both doctors and patients, and reduce the overwhelming financial burden of medical diagnosis.
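The idea of scoring a sample with a weighted sum of short interval rules and abstaining when confidence is low, as described in the abstract above, can be sketched as follows. This is not the SHIMR algorithm itself: the rule format, the logistic mapping from score to probability, the confidence threshold, and the feature names are all assumptions made for illustration.

```python
import math


def rule_score(sample, rules, bias=0.0):
    """Weighted sum of short rules.

    Each rule is (weight, {feature_name: (low, high)}) and fires only when
    every listed feature lies inside its interval.
    """
    score = bias
    for weight, intervals in rules:
        if all(low <= sample[name] <= high for name, (low, high) in intervals.items()):
            score += weight
    return score


def diagnose(sample, rules, bias=0.0, min_confidence=0.75):
    """Return (label, probability); the label is 'abstain' when confidence is low."""
    probability = 1.0 / (1.0 + math.exp(-rule_score(sample, rules, bias)))
    confidence = max(probability, 1.0 - probability)
    if confidence < min_confidence:
        return "abstain", probability
    return ("positive" if probability >= 0.5 else "negative"), probability


# Hypothetical rules over made-up protein measurements.
rules = [
    (2.0, {"protein_a": (0.0, 1.0)}),
    (-2.0, {"protein_b": (5.0, 9.0)}),
]

print(diagnose({"protein_a": 0.5, "protein_b": 1.0}, rules))
print(diagnose({"protein_a": 0.5, "protein_b": 6.0}, rules))
print(diagnose({"protein_a": 2.0, "protein_b": 6.0}, rules))
```

An abstained case would be the one referred on to the more accurate but more costly stage of a multistage workflow.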


Gut ◽  
2021 ◽  
pp. gutjnl-2021-324060
Author(s):  
Raghav Sundar ◽  
Nesaretnam Barr Kumarakulasinghe ◽  
Yiong Huak Chan ◽  
Kazuhiro Yoshida ◽  
Takaki Yoshikawa ◽  
...  

Objective To date, there are no predictive biomarkers to guide selection of patients with gastric cancer (GC) who benefit from paclitaxel. Stomach cancer Adjuvant Multi-Institutional group Trial (SAMIT) was a 2×2 factorial randomised phase III study in which patients with GC were randomised to Pac-S-1 (paclitaxel + S-1), Pac-UFT (paclitaxel + UFT), S-1 alone or UFT alone after curative surgery. Design The primary objective of this study was to identify a gene signature that predicts survival benefit from paclitaxel chemotherapy in GC patients. SAMIT GC samples were profiled using a customised 476 gene NanoString panel. A random forest machine-learning model was applied on the NanoString profiles to develop a gene signature. An independent cohort of metastatic patients with GC treated with paclitaxel and ramucirumab (Pac-Ram) served as an external validation cohort. Results From the SAMIT trial 499 samples were analysed in this study. From the Pac-S-1 training cohort, the random forest model generated a 19-gene signature assigning patients to two groups: Pac-Sensitive and Pac-Resistant. In the Pac-UFT validation cohort, Pac-Sensitive patients exhibited a significant improvement in disease free survival (DFS): 3-year DFS 66% vs 40% (HR 0.44, p=0.0029). There was no survival difference between Pac-Sensitive and Pac-Resistant in the UFT or S-1 alone arms, test of interaction p<0.001. In the external Pac-Ram validation cohort, the signature predicted benefit for Pac-Sensitive (median PFS 147 days vs 112 days, HR 0.48, p=0.022). Conclusion Using machine-learning techniques on one of the largest GC trials (SAMIT), we identify a gene signature representing the first predictive biomarker for paclitaxel benefit. Trial registration number UMIN Clinical Trials Registry: C000000082 (SAMIT); ClinicalTrials.gov identifier, 02628951 (South Korean trial)


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuwei Yin ◽  
Xiao Tian ◽  
Jingjing Zhang ◽  
Peisen Sun ◽  
Guanglin Li

Abstract Background Circular RNA (circRNA) is a novel type of RNA with a closed-loop structure. Increasing numbers of circRNAs are being identified in plants and animals, and recent studies have shown that circRNAs play an important role in gene regulation. Therefore, identifying circRNAs from increasing amounts of RNA-seq data is very important. However, traditional circRNA recognition methods have limitations. In recent years, emerging machine learning techniques have provided a good approach for the identification of circRNAs in animals. However, using these features to identify plant circRNAs is infeasible because the characteristics of plant circRNA sequences are different from those of animal circRNAs. For example, plants are extremely rich in splicing signals and transposable elements, and their sequence conservation, in rice for example, is far less than that in mammals. To solve these problems and better identify circRNAs in plants, it is urgent to develop circRNA recognition software using machine learning based on the characteristics of plant circRNAs. Results In this study, we built a software program named PCirc using a machine learning method to predict plant circRNAs from RNA-seq data. First, we extracted different features, including open reading frames, numbers of k-mers, and splicing junction sequence coding, from rice circRNA and lncRNA data. Second, we trained a machine learning model by the random forest algorithm with tenfold cross-validation in the training set. Third, we evaluated our classification according to accuracy, precision, and F1 score, and all scores on the model test data were above 0.99. Fourth, we tested our model on data from other plants and obtained good results, with accuracy scores above 0.8. Finally, we packaged the machine learning model built and the programming script used into a locally run circular RNA prediction software, Pcirc (https://github.com/Lilab-SNNU/Pcirc).
Conclusion Based on rice circRNA and lncRNA data, a machine learning model for plant circRNA recognition was constructed in this study using the random forest algorithm, and the model can also be applied to circRNA recognition in other plants such as Arabidopsis thaliana and maize. In addition, the machine learning model constructed and the programming scripts used in this study are packaged into a locally run circRNA prediction software, Pcirc, which is convenient for plant circRNA researchers to use.
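One of the feature types listed in the abstract above, k-mer counts, can be illustrated with a short sketch. It is not taken from the Pcirc code base: the function names, the normalization by the number of windows, and the example sequence are assumptions.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_feature_vector(sequence, k, alphabet="ACGT"):
    """Fixed-length vector of k-mer frequencies, ordered over all alphabet**k k-mers."""
    counts = count_kmers(sequence, k)
    total = max(len(sequence) - k + 1, 1)  # number of windows; avoids division by zero
    return [counts["".join(kmer)] / total for kmer in product(alphabet, repeat=k)]


# Hypothetical short sequence, for illustration only.
example = "ACGAC"
print(count_kmers(example, 2))
print(kmer_feature_vector(example, 2))
```

A vector like this has the same length for every sequence (4**k entries for a DNA alphabet), which is what allows it to be passed to a classifier such as a random forest alongside other features.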


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Hikmet Can Çubukçu

Abstract Objectives The present study set out to build a machine learning model that incorporates conventional quality control (QC) rules, exponentially weighted moving average (EWMA), and cumulative sum (CUSUM) with the random forest (RF) algorithm to achieve better performance, and to evaluate the performance of the models using computer simulation to aid laboratory professionals in QC procedure planning. Methods Conventional QC rules, EWMA, CUSUM, and RF models were implemented on the simulation data using an in-house algorithm. The models’ performances were evaluated on 170,000 simulated QC results using outcome metrics, including the probability of error detection (Ped), probability of false rejection (Pfr), average run length (ARL), and power graph. Results The highest Pfr (0.0404) belonged to the 1–2s rule. The 1–3s rule could not detect errors with a 0.9 Ped up to 4 SD of systematic error. The random forest model had the highest Ped for systematic errors lower than 1 SD. However, the ARLs of the model require either that the RF model be combined with conventional QC rules having lower ARLs or that more than one QC measurement be used. Conclusions The RF model presented in this study showed acceptable Ped for most degrees of systematic error. The outcome metrics established in this study will help laboratory professionals plan internal QC.
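The EWMA and CUSUM statistics named in the abstract above can be sketched in their textbook forms as follows. This is not the study's in-house algorithm: the smoothing weight, the reference value k, the control limit, and the simulated QC values are assumptions, and the inputs are taken to be QC results already standardized to SD units.

```python
def ewma(values, lam=0.2, start=0.0):
    """Exponentially weighted moving average: z_t = lam * x_t + (1 - lam) * z_(t-1)."""
    z = start
    out = []
    for x in values:
        z = lam * x + (1.0 - lam) * z
        out.append(z)
    return out


def cusum_upper(values, k=0.5):
    """One-sided upper CUSUM: c_t = max(0, c_(t-1) + x_t - k)."""
    c = 0.0
    out = []
    for x in values:
        c = max(0.0, c + x - k)
        out.append(c)
    return out


def first_alarm(statistic, limit):
    """Index of the first value above the control limit, or None if there is none."""
    for i, s in enumerate(statistic):
        if s > limit:
            return i
    return None


# Hypothetical standardized QC results: in control, then a +1 SD systematic shift.
qc = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

print(ewma(qc))
print(cusum_upper(qc))
print(first_alarm(cusum_upper(qc), limit=1.2))
```

The number of results observed before `first_alarm` fires is a single run length. Averaging it over many simulated runs gives the ARL metric mentioned in the abstract.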

