Application of Machine Learning Algorithms in Predicting Pyrolytic Analysis Result

2021 ◽  
Vol 931 (1) ◽  
pp. 012013
Author(s):  
Le Thi Nhut Suong ◽  
A V Bondarev ◽  
E V Kozlova

Abstract Geochemical studies of organic matter in source rocks play an important role in predicting the oil and gas accumulation of a territory, especially in oil and gas shale. For a deeper understanding, pyrolytic analyses are often carried out on samples both before and after extraction of hydrocarbons with chloroform. However, extraction is laborious and time-consuming, and it doubles the workload of laboratory equipment and the time required. In this work, machine learning regression algorithms are applied to forecast S2ex from the pyrolytic analysis results of non-extracted samples. The study uses more than 300 samples from 3 different wells in the Bazhenov formation, Western Siberia. To develop a prediction model, 5 machine learning regression algorithms were tested and compared: Multiple Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree and Random Forest. Their performance is measured by the R-squared coefficient (R2). The data from well X2 were used to build the model, split into 2 parts: 80% for training and 20% for checking. The model was also used to predict wells X1 and X3, and these predictions were compared with the real results obtained from standard experiments. Despite the limited amount of data, the result exceeded all expectations. The predictions also show that the relationship between the parameters before and after extraction is complex and non-linear: the R2 values of Multiple Linear Regression and Polynomial Regression are negative, which means these models fit worse than simply predicting the mean. In contrast, Random Forest and Decision Tree perform well. The same algorithms can be applied to predict all geochemical parameters by depth or used with well-logging data.
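To illustrate how an R-squared value can turn negative, the following is a minimal sketch using only the Python standard library. The function name `r_squared` and the sample numbers are hypothetical and are not taken from the paper; the sketch only shows the standard definition, 1 - SS_res / SS_tot.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]  # close to the measurements
bad_pred = [4.0, 3.0, 2.0, 1.0]   # trend reversed

r2_good = r_squared(measured, good_pred)
r2_bad = r_squared(measured, bad_pred)
print(r2_good, r2_bad)
```

Whenever the residual error exceeds the error of predicting the mean, the ratio is greater than 1 and the score falls below zero, as it does for `bad_pred` here.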

2019 ◽  
Vol 8 (9) ◽  
pp. 382 ◽  
Author(s):  
Marcos Ruiz-Álvarez ◽  
Francisco Alonso-Sarria ◽  
Francisco Gomariz-Castillo

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machines and Random Forest, are compared with Multiple Linear Regression and Ordinary Kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using two different approaches, a leave-one-out cross-validation in the spatial domain and a spatio-temporal k-block cross-validation, and four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest produces the best results (R² = 0.888 ± 0.026, root mean square error = 3.01 ± 0.325 using k-block cross-validation). Regression methods (Support Vector Machine, Random Forest and Multiple Linear Regression) are calibrated with MODIS data and several predictors easily calculated from a Digital Elevation Model. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
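The abstract does not give the formula behind cdayt, so the following is only one plausible sketch of a cosine transformation of the day of year. The function name `cosine_day` and the 365-day period are assumptions made for illustration, not the authors' definition.

```python
import math


def cosine_day(day_of_year, period=365.0):
    """Map a day of year onto [-1, 1] so that the end of December and the
    start of January receive nearly the same value (assumed formula)."""
    return math.cos(2.0 * math.pi * day_of_year / period)


start_of_year = cosine_day(0)      # peak of the cosine
mid_year = cosine_day(182.5)       # trough, half a period later
end_of_year = cosine_day(365)      # back at the peak
print(start_of_year, mid_year, end_of_year)
```

A periodic encoding of this kind lets a regression model treat the calendar as a cycle instead of a linear count of days.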


This paper studies pricing factors in the short-term rental market. Airbnb was chosen as the object of the study; it is a platform for listing, finding, and renting accommodation around the world. At the beginning of 2021, the company offered 7 million homes in more than 220 countries. Data science methods play a significant role in the company's success, and one of its key algorithms is the pricing algorithm. Using the "Price Recommendations" feature, a homeowner can see which dates are likely to be booked at the current price and which are not, which helps in forming a favorable offer. The system calculates the recommended price of housing based on hundreds of parameters, some of which are easy to recognize, but there are also less obvious factors that can affect demand. The paper proposes an algorithm for identifying implicit pricing factors in the short-term rental market using machine learning methods, which includes: 1) data mining and data preparation; 2) building and analysis of linear regression models; 3) building and analysis of nonlinear regression models. The study was based on listings from the Airbnb site in Washington and New York, processed with scripts developed in Python. The following models were built and analyzed: simple linear regression, multiple linear regression, polynomial regression, decision trees, random forest, and boosting. The results showed that the most important factors are accommodates, cleaning_fee, room_type, and bedrooms. However, based on the model evaluation criteria, the models cannot be used for implementation: the linear models are of low quality, while the random forest, boosting, and decision tree models are overfitted. Still, the results can be used in business analysis.


2020 ◽  
Vol 17 (9) ◽  
pp. 4280-4286
Author(s):  
G. L. Anoop ◽  
C. Nandini

Agriculture and allied production contribute to the Indian economy and to India's food security. A crop yield prediction model will help farmers, agriculture departments and organizations make better decisions. In this paper we propose multi-level machine learning algorithms to predict rice crop yield. Publicly available data were collected from an Indian Government website for 4 districts of Karnataka: Mysore, Mandya, Raichur and Koppal. In the proposed method, we first pre-process the collected data using z-score, normalization and standardized residuals; multilevel decision tree and multilevel multiple linear regression methods are then used to predict the rice crop yield, and the performance of both is evaluated. The experimental results show that multiple linear regression is more accurate than the decision tree technique. This prediction will guide farmers toward better decisions for improving yield and livelihood under a particular temperature or climatic scenario.
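The z-score pre-processing step mentioned above can be sketched as follows. This is a minimal illustration using the Python standard library; the function name `z_scores` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which variant was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined for constant data")
    return [(v - mu) / sigma for v in values]


# Hypothetical yield figures for illustration only (not data from the paper).
yields = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = z_scores(yields)
print(scaled)
```

After this rescaling, predictors measured in different units become comparable, which matters for methods such as linear regression whose coefficients depend on scale.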


Author(s):  
A Lakshmanarao ◽  
M Raja Babu ◽  
T Srinivasa Ravi Kiran

Since 2019, the whole world has been experiencing a novel infection caused by a coronavirus. The main concern about this disease is the absence of an effective, proven medicine. The World Health Organization (WHO) proposed a few precautionary measures to manage the spread of the illness and to reduce infection, thereby decreasing cases. In this paper, we analyzed the Coronavirus dataset available on Kaggle. Previous comparable work by other researchers covered a limited number of days; our paper used the covid19 data up to May 2021. The numbers of confirmed cases, recovered cases, and deaths are considered for analysis. The cases are analyzed on a daily and weekly basis to gain insight into the dataset. After extensive analysis, we proposed machine learning regressors for covid19 prediction. We applied linear regression, polynomial regression, Decision Tree Regressor, and Random Forest Regressor. Decision Tree and Random Forest gave an R-squared value of 0.99. We also predicted future cases with these four algorithms; future cases were predicted better with the polynomial regression technique. This prediction can help in taking preventive measures to control covid19 in the near future. All the experiments were conducted in Python.


Longevity depends on various factors, such as the economic growth of a country and the health innovations of its region. Along with predicting life expectancy, we also assess how susceptible a particular continent is to certain long-standing diseases. These factors have a strong impact on the potential life span of the population. We study the biological and economic aspects of continents and their countries to predict the life expectancy of the population and to estimate the probability of a continent having long-standing diseases like measles, HIV/AIDS, etc. Our research is based on the theory that life expectancy depends on, or correlates with, various factors, including health factors as well as economic factors. Two machine learning algorithms, simple linear regression and multiple linear regression, are used to predict life expectancy across different continents, whereas the decision tree and random forest algorithms were applied to classify the likelihood of occurrence of a disease. Comparing the algorithms, we infer that multiple linear regression produces the most accurate estimates of the average life expectancy of the population given the current features of the continent, such as the adult mortality rate, alcohol consumption rate, infant deaths, the GDP of the country, average percentage expenditure of the population on health care and treatments, schooling rate, and other such features. We also study five diseases, namely HIV/AIDS, measles, diphtheria, hepatitis B and polio. The experiment concluded that, in the majority of cases, random forest produces better classification results based on the economic factors of various countries across different continents.


2020 ◽  
Vol 214 ◽  
pp. 02050
Author(s):  
Zhen Sun ◽  
Shangmei Zhao

This paper analyzed and compared the forecasting performance of three machine learning algorithms (multiple linear regression, random forest and LSTM network) in stock price forecasting, using the closing price data of the NASDAQ ETF and data on statistical factors. The test results show that predictions based on the closing price data are better than those based on statistical factors, but the difference is not significant. Multiple linear regression is the most suitable for stock price forecasting. Random forest ranks second and is prone to overfitting. The LSTM network performs worst, with the highest RMSE and MAPE values. Forecasts of the future stock price based on the closing price of the NASDAQ ETF are likewise better than those based on statistical factors, but the difference is not significant.
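For reference, the two error metrics named above, RMSE and MAPE, can be computed as in the following minimal sketch. It uses only the Python standard library; the function names and the sample prices are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical closing prices and forecasts for illustration only.
closing = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

error_rmse = rmse(closing, forecast)
error_mape = mape(closing, forecast)
print(error_rmse, error_mape)
```

Lower values are better for both metrics. RMSE penalizes large individual errors more heavily, while MAPE expresses the error relative to the price level.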


Author(s):  
Nebojša M. Jurišević ◽  
Dušan R. Gordić ◽  
Vladimir Vukašinović ◽  
Arso M. Vukicevic ◽  
...  

Preschool buildings are among the biggest water consumers in the public buildings sector, so efficient management of their water consumption could yield considerable savings in city budgets. The aim of this study was twofold: 1) to assess the prognostic performance of 21 parameters that influence water consumption and 2) to assess the performance of two different approaches (statistical and machine learning-based), with 6 predictive models, for estimating water consumption from the observed parameters. The data set was collected from all public preschool buildings in the city of Kragujevac, Serbia, over a three-year period. The top-performing statistical model was Multiple Linear Regression, while the best machine learning method was Random Forest. In particular, Random Forest achieved the best overall performance, while Multiple Linear Regression showed the same precision as Random Forest for buildings that consume more than 200 m3/month. Both methods provide satisfactory estimates, leaving potential users to choose between better performance (Random Forest) and usability (Multiple Linear Regression).


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
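To make the reported evaluation metric concrete, the following is a minimal sketch of the area under the ROC curve, computed as the fraction of positive-negative pairs that a model ranks correctly. It uses only the Python standard library; the function name `roc_auc` and the sample labels and scores are hypothetical and are not LIFE study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive case receives a higher score than a
    negative case; ties count as half a correct ranking.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = event) and predicted risks, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]

auc = roc_auc(labels, scores)
uninformative = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])
print(auc, uninformative)
```

Under this definition a value of 0.5 corresponds to ranking no better than chance, which gives context for the 0.54 and 0.66 reported in the abstract.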


Author(s):  
Sara LIFSHITS

ABSTRACT The mechanism of hydrocarbon migration into a reservoir is one of the most controversial questions in oil and gas geology. The research aimed to study the effect of supercritical carbon dioxide (CO2) on the permeability of sedimentary rocks (carbonates, argillite, oil shale), which was assessed by the yield of chloroform extracts and by gas permeability (carbonate, argillite) before and after the treatment of rocks with supercritical CO2. An increase in the permeability of dense, potentially oil-source rocks has been noted, which is explained by the dissolution of carbonates to bicarbonates due to the high chemical activity of supercritical CO2 and the water dissolved in it. Similarly, in geological processes, the introduction of deep supercritical fluid into sedimentary rocks can increase the permeability and, possibly, the porosity of rocks, which will facilitate the primary migration of hydrocarbons and improve the reservoir properties of the rocks. The considered mechanism of hydrocarbon migration in the flow of deep supercritical fluid makes it possible to revise downward the estimated time and duration of the formation of gas–oil deposits, as well as to explain features in the formation of various sources of hydrocarbons and the observed inflow of oil into operating and exhausted wells.

