Application of Random Forest Machine Learning Models to Forecast Combustion Profile Parameters of a Natural Gas Spark Ignition Engine

Volume 6: Design, Systems, and Complexity ◽

10.1115/imece2020-23973 ◽

2020 ◽

Author(s):

Jinlong Liu ◽

Christopher Ulishney ◽

Cosmin E. Dumitrescu

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Spark Ignition ◽

Spark Ignition Engine ◽

Combustion Model ◽

Learning Models ◽

IC Engine ◽

Spark Timing ◽

Machine Learning Models

Predicting internal combustion (IC) engine variables such as combustion phasing and duration is essential to zero-dimensional (0D) single-zone engine simulations (e.g., for the Wiebe function combustion model). This paper investigated the use of random forest machine learning models to predict these combustion parameters as a way to reduce expensive engine dynamometer tests. A single-cylinder four-stroke heavy-duty spark-ignition engine fueled with methane was operated at different engine speeds and loads to provide the data for training, validating, and testing the proposed correlated model. Key engine operating variables such as spark timing, mixture equivalence ratio, and engine speed were the model inputs. The models were validated by comparing their predictions with the experimentally measured results. The prediction error of the random forest algorithm was acceptable, suggesting that it can be used to predict the combustion parameters of interest.
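The abstract above refers to the Wiebe function combustion model, which takes combustion phasing and duration as inputs. As a rough illustration only, the sketch below evaluates the commonly used form x_b = 1 - exp(-a * ((theta - theta_start) / duration)^(m + 1)). The function name, the shape parameters a = 5 and m = 2, and the operating point are assumptions chosen for the example and are not taken from the paper.

```python
import math


def wiebe_burn_fraction(theta, theta_start, duration, a=5.0, m=2.0):
    """Mass fraction burned at crank angle theta (all angles in degrees).

    a and m are shape parameters; the defaults are illustrative assumptions.
    """
    if theta <= theta_start:
        return 0.0  # combustion has not started yet
    progress = (theta - theta_start) / duration
    return 1.0 - math.exp(-a * progress ** (m + 1.0))


# Hypothetical operating point: combustion starts at -10 deg and lasts 40 deg.
for theta in (-10, 0, 10, 30):
    print(theta, round(wiebe_burn_fraction(theta, -10.0, 40.0), 4))
```

In a data-driven workflow such as the one described, a trained model would supply `theta_start` and `duration` for a given spark timing, equivalence ratio, and engine speed.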

CFD Optimization of the Pre-Chamber Geometry for a Gasoline Spark Ignition Engine

Frontiers in Mechanical Engineering ◽

10.3389/fmech.2020.599752 ◽

2021 ◽

Vol 6 ◽

Author(s):

Haiwen Ge ◽

Ahmad Hadi Bakir ◽

Suren Yadav ◽

Yunseon Kang ◽

Siva Parameswaran ◽

...

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Support Vector Machine Model ◽

Spark Ignition ◽

Spark Ignition Engine ◽

Support Vector ◽

Learning Models ◽

Machine Model ◽

Machine Learning Model ◽

Machine Learning Models

In the present paper, an efficient optimization method based on a Bayesian updating strategy is developed for the design of a spark-ignition engine equipped with a pre-chamber. 3D computational fluid dynamics (CFD) simulation, coupled with design of experiment, genetic algorithm, and machine learning methods, is used to optimize the pre-chamber for a desired combustion phasing. The optimization process starts from a design of experiment matrix of 11 design parameters, which are used to analytically characterize the pre-chamber geometry and set up the 3D combustion CFD. Taking CA50 as the single objective, the CFD results are then used to train the machine learning models. Different machine learning models are evaluated based on their root mean square error. Five machine learning models from five different categories are selected for a second round of evaluation. The trained machine learning model is used in the genetic algorithm optimization, which yields an optimized configuration that is then checked again by CFD. The new CFD results based on the optimized design are added to the database to further refine the machine learning model. After 24 iterations for each of the selected machine learning models, the medium Gaussian support vector machine model is identified as the best method for the present application. Iterations using the medium Gaussian support vector machine model continue until a satisfactory result is achieved. Detailed combustion analysis is conducted to investigate the physical mechanism by which the pre-chamber design influences the engine's performance. It is found that a larger volume of the upper part of the pre-chamber results in stronger jet flow and turbulent intensity, which further accelerates the flame propagation inside the pre-chamber and dominates the contrary effects of reduced pressure and temperature. Regression analysis shows that the radius of the pre-chamber is the most influential design parameter. The current work not only sheds light on the optimization of engine design, but also demonstrates a general strategy applicable to arbitrary engine optimization and mechanical system design.

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: knowing the polarity tells people whether a given document carries positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually, so an automatic classification process is needed. However, few studies discuss which feature extraction methods and learning models suit unstructured sentiment analysis for the Amazon food review case. This research explores several feature extraction methods (bag-of-words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector) with several machine learning models (Random Forest, SVM, KNN, and Naïve Bayes) to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation that handles HTML tags, punctuation, and special characters, together with Snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
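To make the TF-IDF weighting mentioned above concrete, here is a minimal sketch using only the Python standard library. It uses the plain textbook form (term frequency times log(N / document frequency)); library implementations often apply smoothing or normalization, so their numbers will differ. The function name and the toy reviews are invented for illustration and are not the study's data or pipeline.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy reviews, made up for this example.
reviews = ["great taste great price", "bad taste", "great snack"]
scores = tf_idf(reviews)
print(scores[1])
```

Terms that occur in few documents (here "bad") receive a higher weight than terms shared across documents (here "taste"), which is what makes the resulting vectors useful as classifier inputs.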

Random forest and long short-term memory based machine learning models for classification of ion mobility spectrometry spectra

Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XXII ◽

10.1117/12.2585829 ◽

2021 ◽

Author(s):

Patrick C. Riley ◽

Samir V. Deshpande ◽

Brian S. Ince ◽

Brian C. Hauck ◽

Kyle P. O'Donnell ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ion Mobility ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Machine Learning Models

Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models to Predict Foodborne Pathogen Presence in Agricultural Water

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.701288 ◽

2021 ◽

Vol 9 ◽

Author(s):

Daniel Lowell Weller ◽

Tanzy M. T. Love ◽

Martin Wiedmann

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Training Data ◽

Agricultural Water ◽

Learning Models ◽

Safety Hazards ◽

E. coli ◽

Resampling Method ◽

Machine Learning Models

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli-testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.
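The abstract above compares plain oversampling with SMOTE. The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified sketch of that idea in standard-library Python; the function name, the choice of k = 2, the fixed seed, and the toy points are assumptions for illustration and do not reproduce the study's setup.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Toy minority class with two features, made up for this example.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
print(new_points)
```

Unlike random oversampling, which duplicates existing rows, each synthetic point here is a new location between two real minority samples.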

Comparison of Random Forest and Neural Network in Modelling the Performance and Emissions of a Natural Gas Spark Ignition Engine

Journal of Energy Resources Technology ◽

10.1115/1.4053301 ◽

2021 ◽

pp. 1-20

Author(s):

Jinlong Liu ◽

Qiao Huang ◽

Christopher Ulishney ◽

Cosmin E. Dumitrescu

Keyword(s):

Neural Network ◽

Random Forest ◽

Natural Gas ◽

Internal Combustion Engines ◽

Engine Performance ◽

Spark Ignition ◽

Spark Ignition Engine ◽

Data Driven ◽

ANN Model ◽

Mean Square Errors

Machine learning (ML) models can accelerate the development of efficient internal combustion engines. This study assessed the feasibility of data-driven methods for predicting the performance of a diesel engine modified to natural gas spark ignition, based on a limited number of experiments. As the best ML technique cannot be chosen a priori, the applicability of different ML algorithms for such an engine application was evaluated. Specifically, the performance of two widely used ML algorithms, the random forest (RF) and the artificial neural network (ANN), in forecasting engine responses related to in-cylinder combustion phenomena was compared. The results indicated that both algorithms, with spark timing, mixture equivalence ratio, and engine speed as model inputs, produced acceptable results with respect to predicting engine performance, combustion phasing, and engine-out emissions. Despite requiring more effort in hyperparameter optimization, the ANN model performed better than the RF model, especially for engine emissions, as evidenced by the larger R-squared, smaller root-mean-square errors, and more realistic predictions of the effects of key engine control variables on the engine performance. However, in applications where the combustion behavior knowledge is limited, it is recommended to use an RF model to quickly determine the appropriate number of model inputs. Consequently, using the RF model to define the model structure and then employing the ANN model to improve the model's predictive capability can help to rapidly build data-driven engine combustion models.
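The comparison above rests on R-squared and root-mean-square error. As a small illustration of how these two metrics rank competing models, the sketch below computes both with the standard library. The measured values and the two sets of predictions are invented numbers, not results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measurements and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measurements and predictions from two candidate models.
measured = [10.0, 12.0, 14.0, 16.0]
model_a = [10.5, 11.5, 14.5, 15.5]
model_b = [11.0, 11.0, 15.0, 15.0]

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, rmse(measured, pred), r_squared(measured, pred))
```

A model with a smaller RMSE and a larger R-squared on held-out data, such as `model_a` here, would be preferred under these criteria.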

A comparison of regularized logistic regression and random forest machine learning models for daytime diagnosis of obstructive sleep apnea

Medical & Biological Engineering & Computing ◽

10.1007/s11517-020-02206-9 ◽

2020 ◽

Vol 58 (10) ◽

pp. 2517-2529

Author(s):

Farahnaz Hajipour ◽

Mohammad Jafari Jozani ◽

Zahra Moussavi

Keyword(s):

Machine Learning ◽

Obstructive Sleep Apnea ◽

Logistic Regression ◽

Sleep Apnea ◽

Random Forest ◽

Learning Models ◽

Obstructive Sleep ◽

Machine Learning Models

Three machine learning models for the 2019 Solubility Challenge

ADMET & DMPK ◽

10.5599/admet.835 ◽

2020 ◽

Cited By ~ 1

Author(s):

John Mitchell

Keyword(s):

Machine Learning ◽

Random Forest ◽

Gold Standard ◽

Challenge Test ◽

Wisdom Of Crowds ◽

Learning Models ◽

The Third ◽

Aqueous Solubilities ◽

Machine Learning Models ◽

Better Than

We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. On the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform a little better than Extra Trees, and almost equally to one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.

CPT Data Interpretation Employing Different Machine Learning Techniques

Geosciences ◽

10.3390/geosciences11070265 ◽

2021 ◽

Vol 11 (7) ◽

pp. 265

Author(s):

Stefan Rauter ◽

Franz Tschuchnigg

Keyword(s):

Machine Learning ◽

Grain Size ◽

Random Forest ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Models ◽

Cone Penetration ◽

Tip Resistance ◽

Machine Learning Models

The classification of soils into categories with a similar range of properties is a fundamental geotechnical engineering procedure. At present, this classification is based on various types of cost- and time-intensive laboratory and/or in situ tests. These soil investigations are essential for each individual construction site and have to be performed prior to the design of a project. Since Machine Learning could play a key role in reducing the costs and time needed for a suitable site investigation program, the basic ability of Machine Learning models to classify soils from Cone Penetration Tests (CPT) is evaluated. To find an appropriate classification model, 24 different Machine Learning models, based on three different algorithms, are built and trained on a dataset consisting of 1339 CPT. The applied algorithms are a Support Vector Machine, an Artificial Neural Network and a Random Forest. As input features, different combinations of direct cone penetration test data (tip resistance qc, sleeve friction fs, friction ratio Rf, depth d), combined with “defined”, thus, not directly measured data (total vertical stresses σv, effective vertical stresses σ’v and hydrostatic pore pressure u0), are used. Standard soil classes based on grain size distributions and soil classes based on soil behavior types according to Robertson are applied as targets. The different models are compared with respect to their prediction performance and the required learning time. The best results for all targets were obtained with models using a Random Forest classifier. For the soil classes based on grain size distribution, an accuracy of about 75%, and for soil classes according to Robertson, an accuracy of about 97–99%, was reached.

Mode Choice Prediction using Machine Learning Technique for A Door-to-Door Journey in Kuantan City

Mekatronika ◽

10.15282/mekatronika.v2i1.6745 ◽

2020 ◽

Vol 2 (1) ◽

pp. 73-78

Author(s):

Nur Fahriza Mohd Ali ◽

Ahmad Farhan Mohd Sadullah ◽

Anwar P.P. Abdul Majeed ◽

Mohd Azraai Mohd Razman ◽

Rabiu Muazu Musa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mode Choice ◽

Learning Models ◽

Machine Learning Technique ◽

Travel Mode Choice ◽

Testing Data ◽

Learning Technique ◽

The City ◽

Machine Learning Models

A door-to-door journey in a public transportation system is a concept being promoted to encourage users to consider public transport as an important alternative. The door-to-door journey integrates the travel segments from home to destination, including all visible amenities. Users’ preferences regarding the travel time of these key segments need to be understood. Machine learning has been seen as a robust computational approach to forecasting travel mode choice. However, which model is the best predictor remains an open question. To address this issue, we employed several prominent machine learning models, specifically Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), and compared their performance in predicting users’ travel mode choice in the city of Kuantan. The data were collected in Kuantan City via a Revealed/Stated Preferences (RPSP) Survey between 8:00 AM and 5:00 PM on weekdays. The collected data were split 80:20 into training and testing sets before the models were evaluated. The results showed that Random Forest provided satisfactory classification accuracies of up to 68.3% and 61.3% for the training and testing data, respectively, compared to the other evaluated machine learning models. In summary, Random Forest gives good results on both the training and testing data and is considered the best predictor in this research for forecasting users’ mode choice in the city of Kuantan.
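The 80:20 split and the accuracy figures reported above can be illustrated with a short standard-library sketch. The helper names, the fixed seed, and the placeholder records and labels are assumptions for the example; the survey data and the authors' tooling are not reproduced here.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split off a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Placeholder records standing in for survey responses.
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))

# Hypothetical mode-choice labels and predictions.
print(accuracy(["bus", "car", "bus", "walk"], ["bus", "car", "car", "walk"]))
```

Reporting accuracy separately on the training and testing portions, as the abstract does, shows how much performance drops on data the model has not seen.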
