random forest regression
Recently Published Documents


TOTAL DOCUMENTS

383
(FIVE YEARS 119)

H-INDEX

27
(FIVE YEARS 0)

2022 ◽  
Vol 14 (2) ◽  
pp. 394
Author(s):  
Dan Li ◽  
Yuxin Miao ◽  
Curtis J. Ransom ◽  
G. Mac Bean ◽  
Newell R. Kitchen ◽  
...  

Accurate nitrogen (N) diagnosis early in the growing season across diverse soil, weather, and management conditions is challenging. Strategies using multi-source data are hypothesized to perform significantly better than approaches using crop sensing information alone. The objective of this study was to evaluate, across diverse environments, the potential for integrating genetic (e.g., comparative relative maturity and growing degree units to key developmental growth stages), environmental (e.g., soil and weather), and management (e.g., seeding rate, irrigation, previous crop, and preplant N rate) information with active canopy sensor data for improved corn N nutrition index (NNI) prediction using machine learning methods. Thirteen site-year corn (Zea mays L.) N rate experiments involving eight N treatments conducted in four US Midwest states in 2015 and 2016 were used for this study. A proximal RapidSCAN CS-45 active canopy sensor was used to collect corn canopy reflectance data around the V9 developmental growth stage. The utility of vegetation indices and ancillary data for predicting corn aboveground biomass, plant N concentration, plant N uptake, and NNI was evaluated using single-variable regression and machine learning methods. The results indicated that when the genetic, environmental, and management data were used together with the active canopy sensor data, corn N status indicators could be more reliably predicted using either support vector regression (R² = 0.74–0.90 for prediction) or random forest regression models (R² = 0.84–0.93 for prediction), as compared with using the best-performing single vegetation index or using a normalized difference vegetation index (NDVI) and normalized difference red edge (NDRE) together (R² < 0.30).
The N diagnostic accuracy based on the NNI was 87% using the data fusion approach with random forest regression (kappa statistic = 0.75), which was better than the result of a support vector regression model using the same inputs. The NDRE index was consistently ranked as the most important variable for predicting all four corn N status indicators, followed by the preplant N rate. It is concluded that incorporating genetic, environmental, and management information with canopy sensing data can significantly improve in-season corn N status prediction and diagnosis across diverse soil and weather conditions.
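The abstract above reports diagnostic accuracy together with a kappa statistic. As a minimal sketch of how those two numbers relate, the following computes plain accuracy and Cohen's kappa for two label sequences. The class names and the toy labels are hypothetical and are not data from the study; the function name `cohens_kappa` is likewise an illustrative choice.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    # Observed agreement: fraction of samples where both labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: what independent labelling with the same class
    # frequencies would produce.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both sequences use one identical class
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical N status classes (illustration only, not the study's data).
observed = ["deficient", "deficient", "optimal", "optimal", "surplus", "surplus"]
predicted = ["deficient", "optimal", "optimal", "optimal", "surplus", "optimal"]

accuracy = sum(o == p for o, p in zip(observed, predicted)) / len(observed)
kappa = cohens_kappa(observed, predicted)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

Kappa is lower than accuracy here because some of the matches would be expected by chance alone, which is why the two figures are usually reported side by side.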



2022 ◽  
Vol 2022 ◽  
pp. 1-8
Author(s):  
Xiaoying Lv ◽  
Ruonan Zhao ◽  
Tongsheng Su ◽  
Liyun He ◽  
Rui Song ◽  
...  

Objective. To explore the optimal fitting path for missing data in the Scale so that the fitted data are close to the real patient data. Methods. Based on the complete SDS data set of 507 patients with stroke, simulated data sets under the Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR) mechanisms were constructed in R, with missing rates of 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40% under each mechanism. Mean substitution (MS), random forest regression (RFR), and predictive mean matching (PMM) were used to fit the data. Root mean square error (RMSE), the width of the 95% confidence interval (95% CI), and the Spearman correlation coefficient (SCC) were used to evaluate the fitting effect and determine the optimal fitting path. Results. When dealing with missing data in scales, the optimal fitting path is as follows: ① Under the MCAR mechanism, when the missing proportion is less than 20%, the MS method is the most convenient; when it is greater than 20%, the RFR algorithm is the best fitting method. ② Under the MAR mechanism, when the missing proportion is less than 35%, the MS method is the most convenient; when it is greater than 35%, RFR has a better correlation. ③ Under the MNAR mechanism, RFR is the best data fitting method, especially when the missing proportion is greater than 30%. In practice, when the missing proportion is small, complete case deletion is the most commonly used method, but the RFR algorithm can greatly expand the usable sample and save clinical research costs when the missing proportion is less than 30%. The best way to handle missing data should be chosen according to the missing mechanism and proportion of the actual data, balancing the statistical analysis ability of the research team, the effectiveness of the method, and the understanding of readers.
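The evaluation loop described above (mask values, impute, score against the known truth) can be sketched for the simplest case: MCAR masking with mean substitution, scored by RMSE on the masked entries. The study used R; this is an illustrative Python sketch only. The score range, the seed, and the helper names `make_mcar`, `mean_substitution`, and `rmse_on_missing` are assumptions, and the generated scores are synthetic rather than the study's data.

```python
import math
import random


def make_mcar(values, missing_rate, rng):
    """Copy `values`, setting a fixed fraction to None uniformly at random (MCAR)."""
    n_missing = round(len(values) * missing_rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_substitution(values):
    """Replace every None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def rmse_on_missing(true_values, masked, imputed):
    """RMSE computed only over the entries that were masked out."""
    errors = [(t - f) ** 2
              for t, m, f in zip(true_values, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


rng = random.Random(0)
# Synthetic item scores on an assumed 1-4 scale (not the study's data).
complete = [rng.randint(1, 4) for _ in range(507)]

for rate in (0.05, 0.20, 0.40):
    masked = make_mcar(complete, rate, rng)
    imputed = mean_substitution(masked)
    score = rmse_on_missing(complete, masked, imputed)
    print(f"missing rate {rate:.0%}: RMSE on masked entries = {score:.3f}")
```

MAR and MNAR masking would replace the uniform sampling in `make_mcar` with a probability that depends on other observed variables or on the masked value itself.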



Complexity ◽  
2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Marium Mehmood ◽  
Nasser Alshammari ◽  
Saad Awadh Alanazi ◽  
Fahad Ahmad

The liver is a vital organ of the human body, but detecting liver disease at an early stage is very difficult because its symptoms are hidden. Liver diseases may cause loss of energy or weakness once irregularities in liver function become visible. Cancer is one of the most common liver diseases and also the most fatal: harmful cells grow uncontrollably inside the liver, and if diagnosed late, it may cause death. Treating liver disease at an early stage is therefore an important issue, as is designing a model to diagnose the disease early. First, appropriate features that play a significant part in detecting liver cancer at an early stage should be identified, so it is essential to extract the relevant features from thousands of unwanted ones. These features are mined using data mining and soft computing techniques, which give optimized results that are helpful for early diagnosis. Within these techniques, we use feature selection methods, namely Filter, Wrapper, and Embedded methods, to reduce the number of features in the dataset. Different regression algorithms are then applied to each of these methods individually to evaluate the result: Linear Regression, Ridge Regression, LASSO Regression, Support Vector Regression, Decision Tree Regression, Multilayer Perceptron Regression, and Random Forest Regression. We evaluated our results based on the accuracy and error rates produced by these regression algorithms. The results show that, of all the deployed regression techniques, Random Forest Regression with the Wrapper method performs best, giving the highest R² score of 0.8923 and the lowest MSE of 0.0618.
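Of the three feature selection families named above, the filter approach is the simplest to illustrate: each feature is scored against the target independently of any model, and the top-scoring features are kept. The sketch below ranks features by absolute Pearson correlation. This scoring rule, the feature names, and the toy values are assumptions for illustration; the abstract does not state which filter criterion was used, and it reports the Wrapper method as performing best.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def filter_select(features, target, k):
    """Return the names of the k features with the highest |correlation| to target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature columns and target (illustration only).
target = [1, 2, 3, 4, 5]
features = {
    "marker_a": [2, 4, 6, 8, 10],   # perfectly correlated with the target
    "marker_b": [5, 3, 4, 1, 2],    # strongly negatively correlated
    "noise":    [3, 1, 3, 1, 3],    # uncorrelated with the target
}

selected = filter_select(features, target, k=2)
print(selected)
```

A wrapper method would instead refit a regression model on candidate feature subsets and keep the subset with the best validation score, which costs more compute but accounts for interactions between features.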



The article aims to develop a model for forecasting the characteristics of traffic flows in real-time based on the classification of applications using machine learning methods to ensure the quality of service. It is shown that the model can forecast the mean rate and frequency of packet arrival for the entire flow of each class separately. The prediction is based on information about the previous flows of this class and the first 15 packets of the active flow. Thus, the Random Forest Regression method reduces the prediction error by approximately 1.5 times compared to the standard mean estimate for transmitted packets issued at the switch interface.



2022 ◽  
Vol 2161 (1) ◽  
pp. 012053
Author(s):  
B P Ashwini ◽  
R Sumathi ◽  
H S Sudhira

Congested roads are a global problem, and increased usage of private vehicles is one of the main reasons for congestion. Public transit is a sustainable and eco-friendly alternative to private vehicle usage, but attracting commuters towards public transit is a mammoth task. Commuters expect the public transit service to be reliable, and to provide a reliable service it is necessary to fine-tune the transit operations and provide well-timed necessary information to commuters. In this context, the public transit travel time is predicted in Tumakuru, a tier-2 city of Karnataka, India. As this is one of the initial studies in the city, a performance comparison of eight Machine Learning models is conducted to identify a suitable model for travel time predictions: four linear models, namely Linear Regression, Ridge Regression, Least Absolute Shrinkage and Selection Operator Regression, and Support Vector Regression; and four non-linear models, namely k-Nearest Neighbors, Regression Trees, Random Forest Regression, and Gradient Boosting Regression Trees. The data logs of one month (November 2020) of the Tumakuru city service, provided by Tumakuru Smart City Limited, are used for the study. The time-of-the-day (trip start time), day-of-the-week, and direction of travel are used for the prediction. Travel times for both upstream and downstream directions are predicted, and the results are evaluated based on the performance metrics. The results suggest that non-linear models outperform linear models for predicting travel times, and Random Forest Regression was found to be a better model than the others.
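To make the technique concrete, the following is a small from-scratch sketch of random forest regression using the same three kinds of input (hour of day, day of week, direction). It is not the authors' implementation: the data are synthetic, the peak-hour effect and all hyperparameters are invented for illustration, and the function names are hypothetical. Each tree is grown on a bootstrap sample, each split considers a random subset of features, and the forest prediction is the average over trees.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(rows, targets, depth, min_leaf, n_try, rng):
    """Grow one regression tree; each split examines n_try random features."""
    mean = sum(targets) / len(targets)
    if depth == 0 or len(rows) < 2 * min_leaf:
        return mean  # leaf: predict the mean target of its samples
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), n_try):
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return mean
    _, f, thr = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return (f, thr,
            fit_tree([r for r, _ in left], [t for _, t in left],
                     depth - 1, min_leaf, n_try, rng),
            fit_tree([r for r, _ in right], [t for _, t in right],
                     depth - 1, min_leaf, n_try, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(rows, targets, n_trees=25, depth=6, min_leaf=3, n_try=2, seed=0):
    """Fit n_trees trees, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_tree([rows[i] for i in idx], [targets[i] for i in idx],
                              depth, min_leaf, n_try, rng))
    return trees


def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


# Synthetic trips: (hour of day, day of week, direction). Assumed, not real data.
data_rng = random.Random(42)
rows, targets = [], []
for _ in range(300):
    hour, day, direction = data_rng.randrange(24), data_rng.randrange(7), data_rng.randrange(2)
    peak = hour in (8, 9, 10, 17, 18, 19)  # invented peak-hour effect
    rows.append((hour, day, direction))
    targets.append(30 + 10 * peak + 3 * direction + data_rng.gauss(0, 1))

forest = fit_forest(rows, targets)
peak_eta = predict_forest(forest, (9, 2, 0))
offpeak_eta = predict_forest(forest, (13, 2, 0))
print(f"peak: {peak_eta:.1f} min, off-peak: {offpeak_eta:.1f} min")
```

The step-shaped peak-hour effect is the kind of relationship a linear model on the raw hour cannot capture, which is one plausible reading of why tree ensembles did well in the comparison above.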



2021 ◽  
Vol 68 (3) ◽  
pp. 1-15
Author(s):  
Sylwester Bejger ◽  
Piotr Fiszeder

We combine machine learning tree-based algorithms with the usage of low and high prices and suggest a new approach to forecasting currency covariances. We apply three algorithms: Random Forest Regression, Gradient Boosting Regression Trees (GBRT) and Extreme Gradient Boosting with a tree learner. We conduct an empirical evaluation of this procedure on the three most heavily traded currency pairs in the Forex market: EUR/USD, USD/JPY and GBP/USD. The covariance forecasts based on the three applied algorithms are predominantly more accurate than those of the Dynamic Conditional Correlation model based on closing prices. The results of the analyses indicate that the GBRT algorithm is the best-performing method.



2021 ◽  
Vol 12 (1) ◽  
pp. 282
Author(s):  
Andrew Rodger ◽  
Carsten Laukamp

The efficacy of predicting geochemical parameters with a 2-chain workflow using spectral data as the initial input is evaluated. Spectral measurements spanning the approximate 400–25000 nm spectral range are used to train a workflow consisting of a non-negative matrix factorization (NMF) step, for data reduction, and a random forest regression (RFR) to predict eight geochemical parameters. Approximately 175,000 spectra with their corresponding chemical analysis were available for training, testing and validation purposes. The samples and their spectral and chemical parameters represent 9399 drillcore. Of those, approximately 20,000 spectra and their accompanying analysis were used for training and 5000 for model validation. The remaining pairwise data (150,000 samples) were used for testing of the method. The data are distributed over two large spatial extents (980 km² and 3025 km², respectively) and allowed the proposed method to be tested against samples that are spatially distant from the initial training points. Global R² scores and wt.% RMSE on the 150,000 validation samples are Fe (0.95/3.01), SiO2 (0.96/3.77), Al2O3 (0.92/1.27), TiO (0.68/0.13), CaO (0.89/0.41), MgO (0.87/0.35), K2O (0.65/0.21) and LOI (0.90/1.14), given as Parameter (R²/RMSE), and demonstrate that the proposed method is capable of predicting the eight parameters and is stable enough, in the environment tested, to extend beyond the training set's initial spatial location.
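The two scores reported per parameter above follow standard definitions: R² is one minus the ratio of the residual sum of squares to the total sum of squares, and RMSE is the square root of the mean squared residual. A minimal sketch of both follows; the function names and the toy numbers are illustrative and are not values from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy measured vs. predicted values (illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

print(f"R2={r2_score(measured, predicted):.2f} RMSE={rmse(measured, predicted):.2f}")
```

Because RMSE carries the units of the target (wt.% in the abstract) while R² is unitless, a parameter with a narrow value range can show a small RMSE alongside a modest R², as the lower-scoring parameters in the list above illustrate.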



Author(s):  
Abhishek Sharma ◽  
Shubham Sharma

Hand gestures are a language through which hearing people can communicate with deaf and speech-impaired people. Hand gesture recognition detects the hand pose and converts it to the corresponding letter or sentence. In recent years it has received great attention because of its applications. It relies on machine learning algorithms and is a notable application of human-computer interaction. Human-centered computing, an emerging research field, aims to understand human gestures and integrate users and their social context with computer systems. One of the unique and challenging applications in this framework is collecting information about dynamic human gestures. Keywords: Covid-19, SIRD model, Linear Regression, XGBoost, Random Forest Regression, SVR, LightGBM, Machine learning, Intervention.


