Application of Machine Learning Algorithms in Predicting Pyrolytic Analysis Result

Geochemical studies of organic matter in source rocks play an important role in predicting the oil and gas accumulation of any territory, especially in oil and gas shale. For a deeper understanding, pyrolytic analyses are often carried out on samples before and after extraction of hydrocarbons with chloroform. However, extraction is laborious and time-consuming, and it doubles both the workload of laboratory equipment and the time required. In this work, machine learning regression algorithms are applied to forecast S2ex from the pyrolytic analysis results of non-extracted samples. The study uses more than 300 samples from 3 different wells in the Bazhenov formation, Western Siberia. To develop a prediction model, 5 machine learning regression algorithms were tested and compared: Multiple Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree and Random Forest. The performance of these algorithms is assessed by the R-squared coefficient. The data from well X2 were used to build the model and were split into 2 parts: 80% for training and 20% for testing. The model was then used to predict wells X1 and X3, and the predictions were compared with the real results obtained from standard experiments. Despite the limited amount of data, the result exceeded all expectations. The predictions also show that the relationship between the parameters before and after extraction is complex and non-linear. The evidence is that the R² values of Multiple Linear Regression and Polynomial Regression are negative, which means these models fit worse than simply predicting the mean of the observed values. Random Forest and Decision Tree, however, perform well. The same algorithms can be applied to predict all geochemical parameters by depth or can be used with well-logging data.
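The remark about negative R-squared values can be illustrated with a minimal sketch. It assumes the usual definition R² = 1 − SS_res / SS_tot; the function name `r_squared` and all numbers are made up for illustration and are not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values (illustrative only, not measurements from the wells).
y_true = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]   # close to the truth
mean_pred = [2.5, 2.5, 2.5, 2.5]   # always predicts the mean of y_true
bad_pred = [4.0, 3.0, 2.0, 1.0]    # trend reversed

r2_good = r_squared(y_true, good_pred)   # close to 1
r2_mean = r_squared(y_true, mean_pred)   # exactly 0
r2_bad = r_squared(y_true, bad_pred)     # negative: worse than the mean predictor
```

Under this definition, a score of 0 corresponds to a constant prediction at the mean, so any negative score indicates a model that does worse than that baseline on the evaluated data.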

Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8090382 ◽

2019 ◽

Vol 8 (9) ◽

pp. 382 ◽

Cited By ~ 2

Author(s):

Marcos Ruiz-Álvarez ◽

Francisco Alonso-Sarria ◽

Francisco Gomariz-Castillo

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Air Temperature ◽

Cross Validation ◽

Daily Basis ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machines and Random Forest, are compared with Multiple Linear Regression and Ordinary Kriging. Several geographic, remote sensing and time variables are used as predictors. Validation is carried out using two different approaches, a leave-one-out cross-validation in the spatial domain and a spatio-temporal k-block cross-validation, and four different statistics computed on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest produces the best results (R² = 0.888 ± 0.026, root mean square error = 3.01 ± 0.325 using k-block cross-validation). The regression methods (Support Vector Machine, Random Forest and Multiple Linear Regression) are calibrated with MODIS data and several predictors easily calculated from a Digital Elevation Model. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
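The abstract does not give the formula behind cdayt. The sketch below shows one common way to encode the day of the year with a cosine so that 31 December and 1 January get nearly the same value. The function name `cosine_day`, the `peak_doy` offset and the 365.25-day period are assumptions for illustration, not the authors' definition.

```python
import math
from datetime import date


def cosine_day(d, peak_doy=1, period=365.25):
    """Map a calendar date to [-1, 1] with an annual cosine (hypothetical form).

    peak_doy is the day of year at which the feature equals +1.
    """
    doy = d.timetuple().tm_yday  # day of year, 1..366
    return math.cos(2.0 * math.pi * (doy - peak_doy) / period)


new_year = cosine_day(date(2019, 1, 1))      # at the assumed peak day
mid_year = cosine_day(date(2019, 7, 2))      # roughly half a year later
year_end = cosine_day(date(2019, 12, 31))    # wraps around close to the peak again
```

A cyclic encoding of this kind keeps the end of one year adjacent to the start of the next, which a raw day number does not.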

FORECASTING PRICES IN THE RENTAL HOUSING MARKET WITH MACHINE LEARNING METHODS

Bulletin of V. N. Karazin Kharkiv National University Economic Series ◽

10.26565/2311-2379-2020-99-12 ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Regression Models ◽

Data Science ◽

Polynomial Regression ◽

Short Term ◽

Learning Methods ◽

Machine Learning Methods ◽

Pricing Factors

This study examines pricing factors in the short-term rental market. Airbnb, a platform for listing, finding and renting accommodation around the world, was chosen as the object of the study. At the beginning of 2021, the company offered 7 million homes in more than 220 countries. Data Science methods play a significant role in the company's success, and the pricing algorithm is one of its key algorithms. Using the "Price Recommendations" feature, a homeowner can see which dates are most likely to be booked at the current price and which are not, which helps to form a favorable offer. The system calculates the recommended cost of housing from hundreds of parameters, some of which are easy to recognize, while other, less obvious factors can also affect demand. The paper proposes an algorithm for identifying implicit pricing factors in the short-term rental market using machine learning methods, which includes: 1) data mining and data preparation; 2) building and analysis of linear regression models; 3) building and analysis of nonlinear regression models. The study was based on Airbnb listings in Washington and New York and used scripts developed in Python. The following models were built and analyzed: simple linear regression, multiple linear regression, polynomial regression, decision trees, random forest, and boosting. The results showed that the most important factors are accommodates, cleaning_fee, room_type and bedrooms. However, based on the model evaluation criteria, the models cannot be used for implementation: the linear models are of low quality, while the random forest, boosting and tree models are overfitted. Still, the results can be used in business analysis.

Rice Crop Yield Prediction Using Multi-Level Machine Learning Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9062 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4280-4286

Author(s):

G. L. Anoop ◽

C. Nandini

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Decision Tree ◽

Multiple Linear Regression ◽

Crop Yield ◽

Machine Learning Algorithms ◽

Rice Crop ◽

Machine Learning Techniques ◽

Indian Government ◽

Regression Methods

Agriculture and allied production contribute to the Indian economy and to India's food security. A crop yield prediction model will help farmers, agriculture departments and organizations make better decisions. In this paper we propose multi-level machine learning algorithms to predict rice crop yield. The data were collected from a publicly available Indian Government website for 4 districts of Karnataka: Mysore, Mandya, Raichur and Koppal. In the proposed method, we first pre-process the collected data using z-score, normalization and standardization of residuals; multilevel decision tree and multilevel multiple linear regression methods are then presented to predict the rice crop yield, and the performance of both is evaluated. The experimental results show that multiple linear regression is more accurate than the decision tree technique. This prediction will guide farmers to make better decisions, for better yield and for their livelihood, under a particular temperature or climatic scenario.
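The z-score step mentioned in the pre-processing can be sketched as follows. This is a generic illustration that assumes the population standard deviation is used; the function name `z_scores` and the sample values are hypothetical and do not come from the paper's data set.

```python
import statistics


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed choice)
    if std == 0:
        raise ValueError("z-score is undefined for constant data")
    return [(v - mean) / std for v in values]


# Made-up feature values, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_scores(sample)
```

After this transformation every feature is expressed in units of its own standard deviation, so features recorded on different scales become comparable.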

An Efficient Covid19 Epidemic Analysis and Prediction Model Using Machine Learning Algorithms

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v17i11.25209 ◽

2021 ◽

Vol 17 (11) ◽

pp. 176

Author(s):

A Lakshmanarao ◽

M Raja Babu ◽

T Srinivasa Ravi Kiran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Polynomial Regression ◽

Machine Learning Algorithms ◽

World Health ◽

Main Concern ◽

Python Language ◽

Precautionary Measures ◽

Health Organization

Since 2019, the whole world has been experiencing a novel infection, Coronavirus disease (COVID-19). The main concern about this disease is the absence of an effective, proven medicine. The World Health Organization (WHO) proposed a few precautionary measures to manage the spread of the illness and reduce infection, thereby decreasing cases. In this paper, we analyzed the Coronavirus dataset available on Kaggle. Comparable earlier work by other researchers covered a limited number of days, whereas our paper used COVID-19 data up to May 2021. The numbers of confirmed cases, recovered cases and deaths are considered for analysis. The cases are analyzed on a daily and weekly basis to gain insight into the dataset. After extensive analysis, we proposed machine learning regressors for COVID-19 prediction. We applied linear regression, polynomial regression, a Decision Tree Regressor and a Random Forest Regressor. Decision Tree and Random Forest gave an R-squared value of 0.99. We also predicted future cases with these four algorithms and were able to predict them better with the polynomial regression technique. This prediction can help in taking preventive measures to control COVID-19 in the near future. All the experiments were conducted in Python.

Machine Learning For Prognosis of Life Expectancy and Diseases

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9156.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1765-1771

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Life Expectancy ◽

Multiple Linear Regression ◽

Machine Learning Algorithms ◽

Economic Factors ◽

Average Life Expectancy ◽

Average Percentage ◽

Hiv Aids

Longevity depends on various factors, such as the economic growth of a country and the health innovations of its region. Along with predicting life expectancy, we also determine how susceptible a particular continent is to a few chronic diseases. These factors have a strong impact on the potential life span of the population. We study the biological and economic aspects of continents and their countries to predict the life expectancy of the population and to estimate the probability of a continent having long-standing diseases such as measles, HIV/AIDS, etc. Our research is based on the theory that life expectancy depends on, or correlates with, various factors, including health factors as well as economic factors. Two machine learning algorithms, simple linear regression and multiple linear regression, are used to predict life expectancy across different continents, whereas decision tree and random forest algorithms were applied to classify the likelihood of occurrence of disease. Comparing the algorithms, we infer that multiple linear regression produces the most accurate estimate of the average life expectancy of the population given current features of the continent such as the adult mortality rate, alcohol consumption rate, infant deaths, the GDP of the country, the average percentage expenditure of the population on health care and treatments, schooling rate, and other such features. We also study five diseases, namely HIV/AIDS, measles, diphtheria, hepatitis B and polio. The experiment concluded that, in the majority of cases, random forest produces better classification results based on the economic factors of countries from different continents.

PREDICTIVE MAINTENANCE OF SINGLE PHASE AC MOTOR USING IOT SENSOR DATA AND MACHINE LEARNING (SIMPLE LINEAR REGRESSION AND MULTIPLE LINEAR REGRESSION ALGORITHMS)

International Journal of Engineering Applied Sciences and Technology ◽

10.33564/ijeast.2019.v04i04.022 ◽

2019 ◽

Vol 04 (04) ◽

pp. 128-135

Author(s):

Jakra A. Husain ◽

Ashish Manusmare

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Multiple Linear Regression ◽

Single Phase ◽

Predictive Maintenance ◽

Sensor Data ◽

Simple Linear Regression ◽

Ac Motor ◽

Regression Algorithms

Machine Learning in Stock Price Forecast

E3S Web of Conferences ◽

10.1051/e3sconf/202021402050 ◽

2020 ◽

Vol 214 ◽

pp. 02050

Author(s):

Zhen Sun ◽

Shangmei Zhao

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Stock Price ◽

Price Forecast ◽

Closing Price ◽

The Difference ◽

Lstm Network ◽

Better Than

This paper analyzes and compares the forecasting performance of three machine learning algorithms (multiple linear regression, random forest and LSTM network) in stock price forecasting, using the closing price data of the NASDAQ ETF and data of statistical factors. The test results show that forecasts of the future stock price based on the closing price data are better than those based on statistical factors, but the difference is not significant. Multiple linear regression is the most suitable for stock price forecasting. Random forest comes second but is prone to overfitting. The LSTM network performs worst, with the highest RMSE and MAPE values.
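The two error measures named in the abstract, RMSE and MAPE, can be sketched with their standard definitions. The function names and the price values below are made up for illustration; they are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical closing prices and forecasts (illustrative only).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

error_rmse = rmse(actual, predicted)
error_mape = mape(actual, predicted)
```

Lower values of both measures indicate better forecasts. RMSE penalizes large misses more heavily, while MAPE expresses the error relative to the size of each actual value.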

Assessment of predictive models for estimation of water consumption in public preschool buildings

Journal of Engineering Research ◽

10.36909/jer.10941 ◽

2021 ◽

Author(s):

Nebojša M. Jurišević ◽

Dušan R. Gordić ◽

Vladimir Vukašinović ◽

Arso M. Vukicevic ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Water Consumption ◽

Predictive Models ◽

Data Set ◽

The Public ◽

Public Preschool ◽

Efficient Management

Preschool buildings are among the biggest water consumers in the public buildings sector, where efficient management of water consumption could yield considerable savings in city budgets. The aim of this study was twofold: 1) to assess the prognostic performance of 21 parameters that influence water consumption and 2) to assess the performance of two different approaches (statistical and machine learning-based) with 6 predictive models for estimating water consumption from the observed parameters. The data set was collected from the full set of public preschool buildings in the city of Kragujevac, Serbia, over a three-year period. The top-performing statistical model was Multiple Linear Regression, while the best machine learning method was Random Forest. In particular, Random Forest achieved the best overall performance, while Multiple Linear Regression showed the same precision as Random Forest for buildings that consume more than 200 m³/month. Both methods provide satisfactory estimates, leaving potential users to choose between better performance (Random Forest) and usability (Multiple Linear Regression).

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging ◽

10.1093/geroni/igaa057.859 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 268-269

Author(s):

Jaime Speiser ◽

Kathryn Callahan ◽

Jason Fanning ◽

Thomas Gill ◽

Anne Newman ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Receiver Operating Curve ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods

Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
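The area under the receiver operating curve reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the labels and scores are hypothetical and are not taken from the LIFE study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the event (e.g. an injury), 0 otherwise.
    scores: model outputs where higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted risks, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)

# A model that gives every case the same score cannot rank cases at all.
auc_constant = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])
```

On this scale 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is why values of 0.54 and 0.66 are described as moderate at best.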

Deep fluids and their role in hydrocarbon migration and oil deposit formation exemplified by supercritical CO2

Earth and Environmental Science Transactions of the Royal Society of Edinburgh ◽

10.1017/s1755691021000013 ◽

2021 ◽

pp. 1-11

Author(s):

Sara LIFSHITS

Keyword(s):

Supercritical Fluid ◽

Oil And Gas ◽

Gas Permeability ◽

Sedimentary Rocks ◽

Source Rocks ◽

Reservoir Properties ◽

Hydrocarbon Migration ◽

Geological Processes ◽

Before And After ◽

Oil Source

The mechanism of hydrocarbon migration into a reservoir is one of the most controversial topics in oil and gas geology. The research aimed to study the effect of supercritical carbon dioxide (CO2) on the permeability of sedimentary rocks (carbonates, argillite, oil shale), which was assessed by the yield of chloroform extracts and by gas permeability (carbonate, argillite) before and after treatment of the rocks with supercritical CO2. An increase in the permeability of dense, potentially oil-source rocks was noted, which is explained by the dissolution of carbonates to bicarbonates due to the high chemical activity of supercritical CO2 and the water dissolved in it. Similarly, in geological processes, the introduction of deep supercritical fluid into sedimentary rocks can increase the permeability and, possibly, the porosity of the rocks, which will facilitate the primary migration of hydrocarbons and improve the reservoir properties of the rocks. The considered mechanism of hydrocarbon migration in the flow of deep supercritical fluid makes it possible to revise downward the estimated time and duration of the formation of gas–oil deposits, as well as to explain features in the formation of various sources of hydrocarbons and the observed inflow of oil into operating and exhausted wells.
