Using Machine Learning to Analyze Factors Determining Cycle-to-Cycle Variation in a Spark-Ignited Gasoline Engine

In this work, we have applied a machine learning (ML) technique to provide insights into the causes of cycle-to-cycle variation (CCV) in a gasoline spark-ignited (SI) engine. The analysis was performed on a set of large eddy simulation (LES) calculations of a single cylinder of a four-cylinder port-fueled SI engine. The operating condition was stoichiometric, without significant knock, at a load of 16 bar brake mean effective pressure (BMEP), at an engine speed of 2500 rpm. A total of 123 cycles was simulated. Of these, 49 were run in sequence, while 74 were run in parallel. For the parallel approach, each cycle is initialized with its own synthetic turbulent field to generate CCV, as a part of another work performed by us. In this work, we used 3D information from all 123 cycles to compute flame topology and pre-ignition flow-field metrics. We then evaluated correlations between these metrics and peak cylinder pressure (PCP) employing an ML technique called random forest. The computed metrics form the inputs to the random forest model, and PCP is the output. This model captures the effect of all inputs, as well as interactions between them owing to its decision-tree structure. The goal of this work is to demonstrate (as a first step) that ML models can implicitly learn complex relationships between the pre-ignition flow-fields, the flame shapes, and the eventual outcome of the cycle (whether a cycle will be a high or a low cycle).

Machine Learning Analysis of Factors Impacting Cycle-to-Cycle Variation in a Gasoline Spark-Ignited Engine

Volume 2: Emissions Control Systems; Instrumentation, Controls, and Hybrids; Numerical Simulation; Engine Design and Mechanical Development ◽

10.1115/icef2017-3604 ◽

2017 ◽

Cited By ~ 3

Author(s):

Janardhan Kodavasal ◽

Ahmed Abdul Moiz ◽

Muhsin Ameen ◽

Sibendu Som

Keyword(s):

Machine Learning ◽

Random Forest ◽

Flow Field ◽

Three Dimensional ◽

Random Forest Model ◽

SI Engine ◽

Forest Model ◽

Eddy Simulation ◽

Cycle Variation ◽

Brake Mean Effective Pressure

In this work, we have applied a machine learning (ML) technique to provide insights into the underlying causes of cycle-to-cycle variation (CCV) in a gasoline spark-ignited (SI) engine. The analysis was performed on a set of large eddy simulation (LES) calculations of a single cylinder of a four-cylinder port-fueled SI engine. The operating condition studied was stoichiometric, without significant knock, and represents a load of 16 bar brake mean effective pressure (BMEP) at an engine speed of 2500 revolutions per minute. A total of 123 cycles was simulated. Of these, 49 were run in sequence, while 74 were run in parallel. For the parallel approach, each cycle is initialized with its own synthetic turbulent field (through perturbation of the base field) to generate CCV, as part of another work performed by us. In the current work, we post-processed three-dimensional information from all 123 cycles to compute various flame topology and pre-ignition flow-field metrics. We then used an ML technique called random forest to learn the correlation between these computed metrics and peak cylinder pressure (PCP). The computed metrics form the inputs to the random forest model, and PCP is the predicted output. Owing to its decision-tree structure, the random forest model inherently captures the effect of all inputs as well as the interactions between them. The goal of this work is to demonstrate (as a first step) that ML models can implicitly learn complex relationships between pre-ignition flow-fields, flame shapes, and the eventual outcome of the cycle (whether a cycle will be a high or a low cycle).
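
The idea of feeding per-cycle metrics into a random forest and reading off which inputs matter can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three metric names are hypothetical, the data are synthetic (only the count of 123 cycles echoes the abstract), and the small pure-Python forest (bootstrap samples, random feature subsets at each split, averaged tree predictions, impurity-decrease importances) is a simplified stand-in, not the model used by the authors.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, depth, n_try, rng, importance):
    """Grow one regression tree; every split looks at n_try random features."""
    if depth == 0 or len(targets) < 4:
        return sum(targets) / len(targets)  # leaf: mean PCP of the node
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        # Dropping the largest value keeps both sides of the split non-empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return sum(targets) / len(targets)
    score, f, thr = best
    importance[f] += sse(targets) - score  # impurity decrease credited to f
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    children = [
        build_tree([r for r, _ in part], [t for _, t in part],
                   depth - 1, n_try, rng, importance)
        for part in (lo, hi)
    ]
    return (f, thr, children[0], children[1])

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(rows, targets, n_trees=30, depth=3, n_try=2, seed=0):
    """Fit trees on bootstrap samples; return trees and normalized importances."""
    rng = random.Random(seed)
    importance = [0.0] * len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        trees.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                depth, n_try, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)

# Synthetic stand-in data: 123 "cycles", three hypothetical metrics in [0, 1].
data_rng = random.Random(42)
names = ["flame_metric", "flow_metric", "unrelated_metric"]  # hypothetical names
rows = [[data_rng.random() for _ in names] for _ in range(123)]
# Made-up PCP (bar): strong effect of metric 0, an interaction with metric 1, noise.
pcp = [60 + 12 * r[0] + 4 * r[0] * r[1] + data_rng.gauss(0, 0.5) for r in rows]

trees, importances = fit_forest(rows, pcp)
for name, value in zip(names, importances):
    print(f"{name}: {value:.2f}")
```

Because the synthetic PCP is dominated by the first metric, that metric should receive most of the importance. In practice a library implementation of random forest would normally be preferred over a hand-written one.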

Predicting Cycle-to-Cycle Variation With Concurrent Cycles in a Gasoline Direct Injected Engine With Large Eddy Simulations

Journal of Energy Resources Technology ◽

10.1115/1.4044766 ◽

2019 ◽

Vol 142 (4) ◽

Author(s):

Daniel M. Probst ◽

Sameera Wijeyakulasuriya ◽

Eric Pomraning ◽

Janardhan Kodavasal ◽

Riccardo Scarcelli ◽

...

Keyword(s):

Gasoline Engine ◽

Direct Injection ◽

Engine Performance ◽

Operating Conditions ◽

Gasoline Direct Injection ◽

Effective Pressure ◽

Cycle Variation ◽

Brake Mean Effective Pressure ◽

Large Eddy ◽

Operating Points

High cycle-to-cycle variation (CCV) is detrimental to engine performance, as it leads to poor combustion and high noise and vibration. In this work, CCV in a gasoline engine is studied using large eddy simulation (LES). The engine chosen as the basis of this work is a single-cylinder gasoline direct injection (GDI) research engine. Two stoichiometric part-load engine operating points (6 bar brake mean effective pressure (BMEP) and 2000 revolutions per minute) were evaluated: a nondilute (0% exhaust gas recirculation (EGR)) case and a dilute (18% EGR) case. The experimental data for both operating conditions had 500 cycles. The measured CCV in indicated mean effective pressure (IMEP) was 1.40% for the nondilute case and 7.78% for the dilute case. To estimate CCV from simulation, perturbed concurrent cycles of engine simulations were compared with consecutively obtained engine cycles. The motivation is that running consecutive cycles to estimate CCV is time consuming. For example, running 100 consecutive cycles requires 2–3 months on a typical cluster; by running concurrently, one can potentially run all 100 cycles at the same time and reduce the overall turnaround time for 100 cycles to the time taken for a single cycle (2 days). The goal of this paper is to statistically determine whether concurrent cycles, with a perturbation applied to each individual cycle at the start, can be representative of consecutively obtained cycles and accurately estimate CCV. For each case, 100 cycles were run to obtain statistically valid results. The concurrent cycles began at different timings before the combustion event, with the aim of identifying the latest start time before spark that still minimizes the run time. Only a single combustion cycle was run for each concurrent case. The calculated standard deviation of peak pressure and the coefficient of variation (COV) of IMEP were compared between the consecutive and concurrent methods to quantify CCV.
It was found that the concurrent method could be used to predict CCV with either a velocity or a numerical perturbation. Large and small velocity perturbations were compared, and both produced correct predictions, implying that the type of perturbation is not important for obtaining a valid realization. Starting the simulation too close to the combustion event, at intake valve close (IVC) or at spark timing, underpredicted the CCV. When concurrent simulations were initiated during or before the intake event, at start of injection (SOI) or earlier, distinct and valid realizations were obtained that accurately predicted CCV for both operating points. By simulating CCV with concurrent cycles, the required wall clock time can be reduced from 2–3 months to 1–2 days. In addition, the required core-hours can be reduced by up to 41%, since only a portion of each cycle needs to be simulated.
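
The COV of IMEP used above to quantify CCV is the standard deviation of IMEP divided by its mean, expressed in percent. The sketch below shows that calculation for two sets of cycles; the IMEP values are hypothetical and chosen only for illustration (they are not data from the paper), and the use of the sample standard deviation is an assumption.

```python
import statistics

def cov_percent(values):
    """Coefficient of variation in percent: 100 * standard deviation / mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return 100.0 * std / mean

# Hypothetical IMEP values in bar for a handful of cycles (not from the paper).
imep_consecutive = [6.02, 5.95, 6.10, 5.88, 6.05, 5.99]
imep_concurrent = [6.00, 5.91, 6.12, 5.85, 6.07, 6.01]

cov_consecutive = cov_percent(imep_consecutive)
cov_concurrent = cov_percent(imep_concurrent)
print(f"COV consecutive: {cov_consecutive:.2f}%")
print(f"COV concurrent:  {cov_concurrent:.2f}%")
```

Comparing the two printed values mirrors, in miniature, the comparison between consecutive and concurrent cycles described in the abstract.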

Multi-Cycle Large Eddy Simulation (LES) of the Cycle-to-Cycle Variation (CCV) of Spark Ignition (SI) - Controlled Auto-Ignition (CAI) Hybrid Combustion in a Gasoline Engine

10.4271/2017-01-2261 ◽

2017 ◽

Author(s):

Xinyan Wang ◽

Hua Zhao

Keyword(s):

Large Eddy Simulation ◽

Gasoline Engine ◽

Spark Ignition ◽

Eddy Simulation ◽

Cycle Variation ◽

Auto Ignition ◽

Large Eddy

Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.

A Study on Host Tropism Determinants of Influenza Virus Using Machine Learning

Current Bioinformatics ◽

10.2174/1574893614666191104160927 ◽

2020 ◽

Vol 15 (2) ◽

pp. 121-134 ◽

Cited By ~ 2

Author(s):

Eunmi Kwon ◽

Myeongji Cho ◽

Hayeon Kim ◽

Hyeon S. Son

Keyword(s):

Machine Learning ◽

Amino Acids ◽

Influenza Virus ◽

Random Forest ◽

Physicochemical Properties ◽

Protein Sequences ◽

Influenza Viruses ◽

Host Tropism ◽

Post Hoc ◽

HA Protein

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.

Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

10.2196/preprints.11728 ◽

2018 ◽

Author(s):

Liyan Pan ◽

Guangjian Liu ◽

Xiaojian Mao ◽

Huixian Li ◽

Jiexin Zhang ◽

...

Keyword(s):

Machine Learning ◽

Retrospective Study ◽

Random Forest ◽

Precocious Puberty ◽

Prediction Models ◽

Central Precocious Puberty ◽

Machine Learning Algorithms ◽

Stimulation Test ◽

GnRH Analogue ◽

Prediction Probability

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH) stimulation test or the GnRH analogue (GnRHa) stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
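
For readers unfamiliar with the three reported measures, the following minimal sketch shows how sensitivity, specificity, and AUC can be computed for a binary classifier. The labels, scores, and the 0.5 decision threshold are hypothetical and chosen only for illustration; they are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test outcomes (1 = positive response) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds.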

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: knowing whether a given document carries positive or negative sentiment helps people choose and make decisions. Sentiment analysis is usually done manually, so an automatic classification process is needed. However, few studies discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis in the Amazon food review case. This research explores feature extraction methods such as bag of words, TF-IDF, and Word2Vector, as well as a combination of TF-IDF and Word2Vector, with several machine learning models (Random Forest, SVM, KNN, and Naïve Bayes) to find a combination of feature extraction and learning model that can add variety to the analysis of polarity sentiment. With document preparation that handles HTML tags, punctuation, and special characters and applies Snowball stemming, TF-IDF combined with SVM proved suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, with a performance of 87.3 percent.
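
To make the TF-IDF step concrete, here is a minimal sketch of one common TF-IDF variant (term frequency times the natural log of the inverse document frequency), preceded by simple cleaning of HTML tags and punctuation. The example reviews are invented, stemming is omitted, and the exact weighting formula used by the authors is not stated in the abstract, so this is an illustration rather than a reproduction.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Strip HTML tags, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    return re.findall(r"[a-z]+", text.lower())  # drops punctuation and digits

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / tokens in doc) * ln(N / docs containing term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented example reviews (not from the Amazon food review data set).
reviews = [
    "<b>Great</b> taste, great price!",
    "Terrible taste.",
    "Great coffee",
]
weights = tf_idf(reviews)
for review, w in zip(reviews, weights):
    print(review, "->", {t: round(v, 3) for t, v in w.items()})
```

Terms that appear in fewer documents (such as "price" here) receive a larger inverse-document-frequency factor than terms shared across reviews. The resulting weight vectors would then be passed to a classifier such as an SVM.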

Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies ◽

10.3390/en14071809 ◽

2021 ◽

Vol 14 (7) ◽

pp. 1809

Author(s):

Mohammed El Amine Senoussaoui ◽

Mostefa Brahami ◽

Issouf Fofana

Keyword(s):

Machine Learning ◽

Random Forest ◽

Oil Quality ◽

Principal Component ◽

Condition Assessment ◽

Classification Performance ◽

Transformer Oil ◽

Classification Model ◽

Insulation Degradation ◽

Transformer Oils

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach to data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then transformed again by principal component analysis and passed through the CFsSubset filter. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.

Knee Muscle Force Estimating Model Using Machine Learning Approach

The Computer Journal ◽

10.1093/comjnl/bxaa160 ◽

2020 ◽

Author(s):

Anurag Sohane ◽

Ravinder Agarwal

Keyword(s):

Machine Learning ◽

Random Forest ◽

Muscle Force ◽

Vastus Lateralis ◽

Input Parameter ◽

Research Work ◽

Cost Effective ◽

Coefficient Of Determination ◽

Muscle Forces ◽

Knee Muscle

Various simulation tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may be adequate for clinical use but have drawbacks, such as long computation times, muscle redundancy, and poor cost-effectiveness. Recently, there has been interest in developing supervised-learning-based prediction models for this computationally demanding process. The present work develops cost-effective and efficient machine learning (ML) models to predict knee muscle force for clinical interventions from input parameters such as height, mass, and angle. A dataset of 500 human musculoskeletal models was used to train and test four different ML models to predict knee muscle force. The dataset was obtained from the AnyBody modeling software using AnyPyTools, with a human musculoskeletal model performing a squatting movement during inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model, and decision tree) in terms of mean square error (MSE), coefficient of determination (R²), and correlation (r). For the test dataset, the MSE values of predicted versus actual muscle forces obtained from the random forest model for the biceps femoris, rectus femoris, vastus medialis, and vastus lateralis are 19.92, 9.06, 5.97, and 5.46; the correlations are 0.94, 0.92, 0.92, and 0.94; and the R² values are 0.88, 0.84, 0.84, and 0.89, respectively.
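
The three evaluation measures named above can be computed as in the following minimal sketch. The force values are hypothetical placeholders chosen for illustration (not data from the study), and the formulas are the standard textbook definitions, which the abstract does not spell out.

```python
import math

def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical actual vs. predicted muscle forces (arbitrary units).
actual = [110.0, 125.0, 140.0, 160.0, 175.0]
predicted = [112.0, 121.0, 143.0, 158.0, 171.0]

print(f"MSE = {mse(actual, predicted):.2f}")
print(f"R2  = {r_squared(actual, predicted):.3f}")
print(f"r   = {pearson_r(actual, predicted):.3f}")
```

MSE is in squared units of the force, so it is not directly comparable across muscles with different force magnitudes, whereas R² and r are dimensionless.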

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
