Machine Learning in Stock Price Forecast

2020 ◽  
Vol 214 ◽  
pp. 02050
Author(s):  
Zhen Sun ◽  
Shangmei Zhao

This paper analyzes and compares the forecasting performance of three machine learning algorithms (multiple linear regression, random forest, and an LSTM network) for stock price forecasting, using NASDAQ ETF closing price data and data on statistical factors. The test results show that forecasts based on the closing price data are better than those based on statistical factors, but the difference is not significant. Multiple linear regression is the most suitable for stock price forecasting. Random forest ranks second and is prone to overfitting. The LSTM network performs worst, with the highest RMSE and MAPE values.
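The abstract ranks the models by RMSE and MAPE. As a minimal sketch of how these two error measures are computed, the following uses made-up prices rather than the paper's data; the function names `rmse` and `mape` and all values are illustrative assumptions.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical closing prices and forecasts, for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]

print(f"RMSE: {rmse(actual, predicted):.4f}")
print(f"MAPE: {mape(actual, predicted):.4f}%")
```

Lower values of both measures indicate a closer fit. RMSE is expressed in the units of the price, while MAPE is relative to the actual price.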

2021 ◽  
Vol 931 (1) ◽  
pp. 012013
Author(s):  
Le Thi Nhut Suong ◽  
A V Bondarev ◽  
E V Kozlova

Geochemical studies of organic matter in source rocks play an important role in predicting the oil and gas accumulation of any territory, especially in oil and gas shale. For a deeper understanding, pyrolytic analyses are often carried out on samples before and after extraction of hydrocarbons with chloroform. However, extraction is a laborious and time-consuming process that doubles the workload of laboratory equipment and the time required. In this work, machine learning regression algorithms are applied to forecast S2ex based on the pyrolytic analysis results of non-extracted samples. The study uses more than 300 samples from 3 different wells in the Bazhenov formation, Western Siberia. To develop a prediction model, 5 machine learning regression algorithms were tested and compared: Multiple Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree, and Random Forest. The performance of these algorithms is evaluated by the R-squared coefficient. The data from well X2 were used to build the model, divided into 2 parts: 80% for training and 20% for checking. The model was then used to predict wells X1 and X3, and the predictions were compared with the real results obtained from standard experiments. Despite the limited amount of data, the result exceeded all expectations. The prediction results also show that the relationship between the parameters before and after extraction is complex and non-linear. The evidence is that the R2 values of Multiple Linear Regression and Polynomial Regression are negative, which means these models fail. Random Forest and Decision Tree, however, perform well. The same algorithms can be applied to predict all geochemical parameters by depth, or used with well-logging data.
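A negative R-squared on held-out data, as reported above for the linear and polynomial models, means the predictions fit worse than simply predicting the mean of the observed values. The following sketch uses made-up numbers, not the study's data, to illustrate how that can happen; the function name `r_squared` and all values are illustrative assumptions.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured values and two sets of predictions.
actual = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.1, 1.9, 3.2, 3.8]   # tracks the measurements closely
poor_fit = [4.0, 3.0, 2.0, 1.0]   # trend is reversed

print(r_squared(actual, good_fit))  # close to 1
print(r_squared(actual, poor_fit))  # negative: worse than predicting the mean
```

R-squared has no lower bound when a model is scored on data it was not fitted to, so a negative value signals that the model does not generalize to the other wells.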


2019 ◽  
Vol 8 (9) ◽  
pp. 382 ◽  
Author(s):  
Marcos Ruiz-Álvarez ◽  
Francisco Alonso-Sarria ◽  
Francisco Gomariz-Castillo

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machines and Random Forest, are compared with Multiple Linear Regression and Ordinary Kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using two different approaches, a leave-one-out cross-validation in the spatial domain and a spatio-temporal k-block cross-validation, and four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest produces the best results (R² = 0.888 ± 0.026, root mean square error = 3.01 ± 0.325 using k-block cross-validation). Regression methods (Support Vector Machine, Random Forest and Multiple Linear Regression) are calibrated with MODIS data and several predictors easily calculated from a Digital Elevation Model. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.


Longevity depends on various factors, such as the economic growth of a country and the health innovations of its region. Along with predicting life expectancy, we also examine how susceptible a particular continent is to certain long-standing diseases. These factors have a strong impact on the potential life span of the population. We study the biological and economic aspects of continents and their countries to predict the life expectancy of the population and to estimate the probability of a continent having long-standing diseases such as measles and HIV/AIDS. Our research is based on the theory that life expectancy depends on, or correlates with, various health and economic factors. Two machine learning algorithms, simple linear regression and multiple linear regression, are used to predict life expectancy across different continents, whereas decision tree and random forest algorithms were applied to classify the likelihood of occurrence of each disease. Comparing the algorithms, we infer that multiple linear regression produces the most accurate estimates of the average life expectancy of the population given features such as the adult mortality rate, alcohol consumption rate, infant deaths, the GDP of the country, the average percentage expenditure of the population on health care and treatments, the schooling rate, and other such features. We also study five diseases: HIV/AIDS, measles, diphtheria, hepatitis B and polio. The experiment concluded that, in the majority of cases, random forest produces better classification results based on the economic factors of the combined countries of different continents.


Author(s):  
Nebojša M. Jurišević ◽  
Dušan R. Gordić ◽  
Vladimir Vukašinović ◽  
Arso M. Vukicevic ◽  
...  

Preschool buildings are among the biggest water consumers in the public buildings sector, where efficient management of water consumption could yield considerable savings in city budgets. The aim of this study was twofold: 1) to assess the prognostic performance of 21 parameters that influence water consumption, and 2) to assess the performance of two different approaches (statistical and machine learning-based), with 6 predictive models, for estimating water consumption from the observed parameters. The data set was collected from the total share of public preschool buildings in the city of Kragujevac, Serbia, over a three-year period. The top-performing statistical model was Multiple Linear Regression, while the best machine learning method was Random Forest. In particular, Random Forest achieved the best overall performance, while Multiple Linear Regression showed the same precision as Random Forest for buildings that consume more than 200 m³/month. Both methods provide satisfactory estimates, leaving potential users to choose between better performance (Random Forest) and usability (Multiple Linear Regression).


Author(s):  
Muhammad Rois Rois ◽  
Manarotul Fatati Fatati ◽  
Winda Ihda Magfiroh

This study aims to determine the effect of inflation, the exchange rate and the Composite Stock Price Index (IHSG) on the return of the PT Nikko Securities Indonesia equity fund for the period 2014-2017. The study used secondary data obtained through documentation, in the form of the PT Nikko Securities Indonesia monthly Net Asset Value (NAB) report. The data were analyzed quantitatively with multiple linear regression using EViews 9. The population and sample in this research are PT Nikko Securities Indonesia. The multiple linear regression analysis gave a coefficient of determination (R2) of 0.123819, or about 12%. This means that the inflation, exchange rate and Composite Stock Price Index (IHSG) variables explain 12% of the return of PT Nikko Securities Indonesia's equity fund, while the remaining 88% is influenced by other variables. Based on the results, inflation and the exchange rate have a negative and significant effect on the return of PT Nikko Securities Indonesia's equity fund, while the Composite Stock Price Index (IHSG) has a negative but not significant effect on that return.


Author(s):  
Giulia Seghezzo ◽  
Yvonne Van Hoecke ◽  
Laura James ◽  
Donna Davoren ◽  
Elizabeth Williamson ◽  
...  

Background: The Preclinical Alzheimer Cognitive Composite (PACC) is a composite score that can detect the first signs of cognitive impairment, which can be of importance for research and clinical practice. It is designed to be administered in person; however, in-person assessments are costly, and are difficult during the current COVID-19 pandemic. Objective: To assess the feasibility of performing the PACC assessment by videoconferencing, and to compare the validity of this remote PACC with the in-person PACC obtained previously. Methods: Participants from the HEalth and Ageing Data IN the Game of football (HEADING) Study who had already undergone an in-person assessment were re-contacted and re-assessed remotely. The correlation between the two PACC scores was estimated. The difference between the two PACC scores was calculated and used in multiple linear regression to assess which variables were associated with a difference in PACC scores. Findings: Of the 43 participants who were invited to this external study, 28 were re-assessed. The median duration between the in-person and the remote assessments was 236.5 days (7.9 months) (IQR 62.5). There was a strong positive correlation between the two assessments for the PACC score, with a Pearson correlation coefficient of 0.82 (95% CI 0.66, 0.98). The multiple linear regression found that the only predictor of the PACC difference was the time between assessments. Interpretation: This study provides evidence on the feasibility of performing cognitive tests online, with the PACC tests being successfully administered through videoconferencing. This is relevant, especially during times when face-to-face assessments cannot be performed.
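The agreement between in-person and remote scores is summarized above by a Pearson correlation coefficient. As a minimal sketch of that statistic, the following computes it for paired scores; the function name `pearson_r` and the score values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations, and the root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

# Hypothetical paired composite scores for five participants.
in_person = [0.5, -0.2, 1.1, 0.0, -0.8]
remote = [0.4, -0.1, 0.9, 0.3, -0.6]

print(f"r = {pearson_r(in_person, remote):.3f}")
```

A value near 1 indicates that participants who scored higher in person also tended to score higher remotely. It does not by itself show that the two scores are equal, which is why the study also models the difference between them.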


Author(s):  
Mert Gülçür ◽  
Ben Whiteside

This paper discusses micromanufacturing process quality proxies called “process fingerprints” in micro-injection moulding, for establishing in-line quality assurance and machine learning models for Industry 4.0 applications. The process fingerprints presented in this study are purely physical proxies of product quality and need a tangible rationale for their selection criteria, such as sensitivity, cost-effectiveness, and robustness. The proposed methods and selection reasons for process fingerprints are justified by analysing the temporally collected data with respect to the microreplication efficiency. The extracted process fingerprints were also used in a multiple linear regression scenario, where they bring actionable insights for creating traceable and cost-effective supervised machine learning models in challenging micro-injection moulding environments. The multiple linear regression model demonstrated 84% accuracy in predicting the quality of the process, which is significant given the extreme process conditions and product features.


2021 ◽  
Author(s):  
Yijun Liu ◽  
Daopin Chen ◽  
Muxin Diao ◽  
Guangyu Xiao ◽  
Jing Yan ◽  
...  

2020 ◽  
Vol 9 (11) ◽  
pp. 654
Author(s):  
Guanwei Zhao ◽  
Muzhuang Yang

Mapping population distribution at fine resolutions with high accuracy is crucial to urban planning and management. This paper takes Guangzhou city as the study area and produces a gridded population distribution map using machine learning methods based on a zoning strategy, with multisource geospatial data such as night light remote sensing data, point of interest data, and land use data. The street-level accuracy evaluation shows that the proposed approach achieved good overall accuracy, with a coefficient of determination (R2) of 0.713 and a root mean square error (RMSE) of 5512.9. Meanwhile, the goodness of fit for the single linear regression (LR) model and the random forest (RF) regression model is 0.0039 and 0.605, respectively. For dense areas, the random forest model is more accurate than the linear regression model, while for sparse areas, the linear regression model is more accurate than the random forest model. The results indicate that the proposed method has great potential in fine-scale population mapping. It is therefore advised that the zonal modeling strategy be the primary choice for handling regional differences in population distribution mapping research.

