target variable
Recently Published Documents


TOTAL DOCUMENTS: 116 (FIVE YEARS: 64)

H-INDEX: 8 (FIVE YEARS: 2)

Author(s):  
Colin Daly

Abstract: An algorithm for non-stationary spatial modelling using multiple secondary variables is developed herein. It combines geostatistics with quantile random forests to provide a new interpolation and stochastic simulation method. This paper introduces the method and shows that its results are consistent with, and similar in nature to, those of geostatistical modelling and of quantile random forests. The method allows simpler interpolation techniques, such as kriging, to be embedded in order to further condition the model. The algorithm works by estimating a conditional distribution for the target variable at each target location. The family of such distributions is called the envelope of the target variable. From this, it is possible to obtain spatial estimates, quantiles and uncertainty. An algorithm is also developed to produce conditional simulations from the envelope. Because they sample from the envelope, realizations are locally influenced by relative changes in the importance of secondary variables, trends and variability.
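As a rough illustration of the envelope idea, and not the authors' algorithm, the sketch below fits one quantile regressor per quantile level on coordinates plus a secondary variable to approximate a conditional distribution at each target location. The model choice (gradient boosting with quantile loss), feature layout and quantile grid are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_envelope(X_train, y_train, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Fit one quantile regressor per quantile level.

    X_train: predictors (coordinates plus secondary variables).
    y_train: observed values of the target variable.
    Returns a dict mapping quantile level -> fitted model.
    """
    models = {}
    for q in quantiles:
        m = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200)
        models[q] = m.fit(X_train, y_train)
    return models

def predict_envelope(models, X_target):
    """Approximate the conditional distribution (the 'envelope') at target locations."""
    return {q: m.predict(X_target) for q, m in models.items()}

# Synthetic example: two coordinates plus one secondary variable driving the target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 2.0 * X[:, 2] + 0.3 * rng.normal(size=500)
envelope = predict_envelope(fit_envelope(X[:400], y[:400]), X[400:])
print({q: v[:3].round(2) for q, v in envelope.items()})
```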


Author(s):  
Arpit Saxena

Abstract: Whenever we want to visit a new place in Delhi-NCR, we often search for the best restaurant, or the cheapest restaurant of decent quality. To find the best restaurants, we typically turn to various websites and apps to get an overall idea of each restaurant's service. The most important criteria here are the ratings and reviews of people who already have experience with these restaurants: people look at the ratings, compare restaurants with one another and choose the best for themselves. We restrict our data to Delhi-NCR. The Zomato dataset provides enough information to decide which restaurants are suitable in which locations and what kind of food they should serve to maximize profit. The dataset has 9552 rows and 22 columns. We would like to find the most affordable restaurants in Delhi-NCR, and we can examine relationships between various columns of the dataset, such as rating versus cuisine type, or locality versus cuisine. Since this is real-world data, we start with data cleaning (removing extra spaces, garbage text, etc.), then exploratory steps such as handling None and null values, dropping duplicates and other transformations, followed by randomization of the dataset and analysis. Our target variable is the "Aggregate Rating" column. We explore the relationship of the other features in the dataset with respect to the rating, visualize the relation of all the other dependent features with respect to the target variable, and thereby find the most correlated features that affect the target variable. Keywords: Online food delivery, Marketing mix strategies, Competitive analysis, Pre-processing, Data Cleaning, Data Mining, Exploratory data analysis, Classification, Pandas, MatPlotLib.
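A minimal pandas sketch of the cleaning and exploration steps described above; the file name, city list and exact column labels (e.g. "Aggregate rating", "City", "Cuisines") are assumptions about the Zomato export and may differ from the actual dataset.

```python
import pandas as pd

# Load the Zomato export (file name assumed for illustration).
df = pd.read_csv("zomato.csv", encoding="latin-1")

# Basic cleaning: strip whitespace from text columns, drop duplicates and empty rows.
for col in df.select_dtypes(include="object"):
    df[col] = df[col].str.strip()
df = df.drop_duplicates().dropna(how="all")

# Restrict to Delhi-NCR cities (city list assumed).
ncr = ["New Delhi", "Gurgaon", "Noida", "Faridabad", "Ghaziabad"]
df = df[df["City"].isin(ncr)]

# Randomize the row order before analysis.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Correlation of numeric features with the target variable.
target = "Aggregate rating"
print(df.corr(numeric_only=True)[target].sort_values(ascending=False))

# Example relationship: average rating per cuisine (top 10).
print(df.groupby("Cuisines")[target].mean().sort_values(ascending=False).head(10))
```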


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261131
Author(s):  
Umme Marzia Haque ◽  
Enamul Kabir ◽  
Rasheda Khanam

Background: Mental health problems such as depression in children have far-reaching negative effects on the child, the family and society as a whole. It is necessary to identify the factors that contribute to this mental illness. Detecting the appropriate signs to anticipate mental illness such as depression in children and adolescents is vital for making an early and accurate diagnosis and avoiding severe consequences later. There has been no research employing machine learning (ML) approaches for depression detection among children and adolescents aged 4–17 years in a precisely constructed, high-quality dataset such as Young Minds Matter (YMM). Our objectives are therefore to 1) create a model that can predict depression in children and adolescents aged 4–17 years, 2) evaluate the results of ML algorithms to determine which one outperforms the others, and 3) identify the related family-activity and socioeconomic difficulties that contribute to depression. Methods: The YMM, the second Australian Child and Adolescent Survey of Mental Health and Wellbeing 2013–14, has been used as the data source in this research. Yes/no variables with low correlation to the target variable (depression status) have been eliminated. The Boruta algorithm has been utilized in association with a Random Forest (RF) classifier to extract the most important features for depression detection from among the variables highly correlated with the target variable. The Tree-based Pipeline Optimization Tool (TPOTclassifier) has been used to choose suitable supervised learning models. In the depression detection step, RF, XGBoost (XGB), Decision Tree (DT), and Gaussian Naive Bayes (GaussianNB) have been used. Results: Unhappy, nothing fun, irritable mood, diminished interest, weight loss/gain, insomnia or hypersomnia, psychomotor agitation or retardation, fatigue, thinking or concentration problems or indecisiveness, suicide attempt or plan, and the presence of any of these five symptoms have been identified as the 11 important features for detecting depression among children and adolescents. Although model performance varied somewhat, RF outperformed all other algorithms in predicting the depressed class (99%), with a 95% accuracy rate and a 99% precision rate, in 315 milliseconds (ms). Conclusion: This RF-based prediction model is more accurate and informative in predicting child and adolescent depression, outperforming the other algorithms in all four confusion-matrix performance measures as well as in execution time.
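A hedged sketch of the Boruta-plus-Random-Forest feature-selection step described above, assuming the `boruta` (BorutaPy) package; the file name and the target column name "depression_status" are placeholders, not the actual YMM survey items.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy  # assumes the boruta_py package is installed

# df is assumed to hold the survey items; "depression_status" is a placeholder
# name for the target variable.
df = pd.read_csv("ymm_survey.csv")  # hypothetical file
X = df.drop(columns=["depression_status"]).values
y = df["depression_status"].values

# Random Forest used both inside Boruta and as the final classifier.
rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)

# Boruta compares each feature against shadow (shuffled) copies and keeps
# only features that are consistently more important than the shadows.
selector = BorutaPy(rf, n_estimators="auto", random_state=1)
selector.fit(X, y)

selected = df.drop(columns=["depression_status"]).columns[selector.support_]
print("Selected features:", list(selected))

# Train the classifier on the selected features only.
rf.fit(X[:, selector.support_], y)
```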


2021 ◽  
Author(s):  
Miguel Abambres ◽  
Cabello A

Artificial Intelligence is a cutting-edge technology expanding very quickly into every industry. It has made its way into structural engineering, where it has shown its benefits in predicting structural performance as well as in saving modelling and experimentation time. This paper is the first (out of three) in a broader research effort in which artificial intelligence was applied to the stability and dynamic analyses of steel grid-shells. In that study, three Artificial Neural Networks (ANN) with 8 inputs were independently designed for the prediction of a single target variable, namely: (i) the critical buckling factor for uniform loading (i.e. over the entire roof), (ii) the critical buckling factor for uniform loading over half of the roof, and (iii) the fundamental frequency of the structure. This paper addresses target variable (i). The ANN simulations were based on 1098-point datasets obtained via thorough finite element analyses.

The proposed ANN for the prediction of the critical buckling factor in steel grid-shells under uniform loading yields mean and maximum errors of 1.1% and 16.3%, respectively, over all 1098 data points. Only in 10.6% of those examples (points) does the prediction error exceed 3%.
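The abstract does not give the network architecture, so the following is only a generic sketch of an 8-input neural-network regressor for a single target (the critical buckling factor), using scikit-learn's MLPRegressor; the layer sizes, input scaling and synthetic data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: 1098 samples, 8 geometric/loading inputs,
# one target (critical buckling factor for uniform loading).
rng = np.random.default_rng(0)
X = rng.uniform(size=(1098, 8))
y = 1.0 + X @ rng.uniform(size=8)  # synthetic stand-in for finite element results

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale inputs, then fit a small feed-forward network.
ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0),
)
ann.fit(X_tr, y_tr)

# Report mean and maximum relative errors, the metrics quoted in the abstract.
pred = ann.predict(X_te)
rel_err = np.abs(pred - y_te) / np.abs(y_te)
print(f"mean error {rel_err.mean():.1%}, max error {rel_err.max():.1%}")
```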


2021 ◽  
Vol 7 (1) ◽  
pp. 1-12
Author(s):  
Borislava Vrigazova

The established approach to choosing the number of principal components for prediction models is to explore the contribution of each principal component to the total variance of the target variable. A combination of potentially important principal components can then be chosen to explain a large part of the variance in the target. Sometimes several combinations of principal components must be explored to achieve the highest classification accuracy. This research proposes a novel automatic way of deciding how many principal components should be retained to improve classification accuracy. We do this by combining principal components with ANOVA selection. To further improve the accuracy of our automatic approach, we use the bootstrap procedure for model selection. We call this procedure the Bootstrapped-ANOVA PCA selection. Our results suggest that this procedure can automate principal component selection and improve the accuracy of classification models, in our example, logistic regression.
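A rough sketch of the general idea, PCA followed by ANOVA-based selection of components and a logistic regression classifier, not the authors' exact Bootstrapped-ANOVA procedure; the dataset and the number of retained components are placeholders.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# PCA produces the components; SelectKBest with the ANOVA F-test keeps only
# components that discriminate between classes; logistic regression classifies.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    SelectKBest(f_classif, k=5),  # k is a placeholder; the paper selects it automatically
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

The paper additionally wraps this choice in a bootstrap model-selection loop (repeatedly resampling the training data) so that the number of retained components is decided automatically rather than fixed as above.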


2021 ◽  
Author(s):  
Quangang Yang

Background: In mechanical ventilation, there are still challenges to turning a modern ventilator into a fully reactive device, such as the lack of a comprehensive target variable and the unbridged gap between input parameters and output results. This paper aims to present a state ventilation that provides a measure of two primary, but heterogeneous, ventilation support goals. The paper also develops a method to compute, rather than estimate, respiratory parameters in order to obtain the underlying causal information. Methods: This paper presents a state ventilation, calculated from minute ventilation and blood gas partial pressures, to evaluate the efficacy of ventilation support and indicate disease progression. Through mathematical analysis, formulae are derived to compute dead space volume/ventilation, alveolar ventilation, and CO2 production. Results: Measurements from a reported clinical study are used to verify the analysis and demonstrate the application of the derived formulae. The state ventilation shows the expected trend in patient status, and the calculated mean values of dead space volume, alveolar ventilation, and CO2 production are 158 mL, 8.8 L/min, and 0.45 L/min, respectively, for a group of patients. Discussion and Conclusions: State ventilation can be used as a target variable since it reflects patient respiratory effort and gas exchange. The derived formulae provide a means to accurately and continuously compute respiratory parameters from routinely available measurements and to characterize the impact of different contributing factors.
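The manuscript's own formulae are not reproduced in this abstract; the sketch below instead computes dead-space fraction, alveolar ventilation and CO2 output from the standard Bohr-Enghoff relations, as a plausible illustration of the kind of calculation described. The input numbers and constants are textbook values, not the study's data.

```python
def dead_space_fraction(paco2_mmHg: float, peco2_mmHg: float) -> float:
    """Bohr-Enghoff dead-space fraction VD/VT from arterial and mixed-expired PCO2."""
    return (paco2_mmHg - peco2_mmHg) / paco2_mmHg

def alveolar_ventilation(minute_ventilation_L_min: float, vd_vt: float) -> float:
    """Alveolar ventilation VA = VE * (1 - VD/VT), in L/min."""
    return minute_ventilation_L_min * (1.0 - vd_vt)

def co2_output(alveolar_ventilation_L_min: float, paco2_mmHg: float) -> float:
    """CO2 output (L/min, STPD) from VA and PaCO2, using VA = 863 * VCO2 / PaCO2."""
    return alveolar_ventilation_L_min * paco2_mmHg / 863.0

# Example with textbook-style numbers (not the patients in the study).
vd_vt = dead_space_fraction(paco2_mmHg=45.0, peco2_mmHg=28.0)
va = alveolar_ventilation(minute_ventilation_L_min=12.0, vd_vt=vd_vt)
vco2 = co2_output(va, paco2_mmHg=45.0)
print(f"VD/VT={vd_vt:.2f}, VA={va:.1f} L/min, VCO2={vco2:.2f} L/min")
```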


2021 ◽  
Author(s):  
Umme Marzia Haque

The study has used data from YMM. The yes/no variables that had a low correlation with the target variable have been removed. To extract the most relevant features from among the variables highly correlated with the target variable, the Boruta method was used in conjunction with a Random Forest (RF) classifier. To select suitable supervised learning models, the Tree-based Pipeline Optimization Tool (TPOTclassifier) has been employed. RF, XGBoost (XGB), Decision Tree (DT), and Gaussian Naive Bayes (GaussianNB) were employed in the depression identification step.
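The model-selection step names TPOT; a minimal sketch using the `tpot` package's TPOTClassifier is shown below, with placeholder data and search settings that are assumptions rather than the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier  # assumes the tpot package is installed

# Placeholder data standing in for the selected YMM features.
X, y = make_classification(n_samples=500, n_features=11, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# TPOT searches over scikit-learn pipelines using genetic programming.
tpot = TPOTClassifier(generations=5, population_size=20, random_state=0, verbosity=2)
tpot.fit(X_tr, y_tr)

print("held-out accuracy:", tpot.score(X_te, y_te))
tpot.export("best_pipeline.py")  # writes the winning pipeline as Python code
```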


Author(s):  
Bohui Xia ◽  
Xueting Wang ◽  
Toshihiko Yamasaki

Given the promising results obtained by deep-learning techniques in multimedia analysis, the explainability of predictions made by networks has become important in practical applications. We present a method to generate semantic and quantitative explanations that are easily interpretable by humans. Previous work on obtaining such explanations has focused on the contributions of each feature, taking their sum to be the prediction result for a target variable; the lack of discriminative power in this simple additive formulation led to low explanatory performance. Our method considers not only individual features but also their interactions, for a more detailed interpretation of the decisions made by networks. The algorithm is based on the factorization machine, a prediction method that calculates factor vectors for each feature. We conducted experiments on multiple datasets with different models to validate our method, achieving higher performance than previous work. We show that including interactions not only generates explanations but also makes them richer, conveying more information. We show examples of the produced explanations in a simple visual format and verify that they are easily interpretable and plausible.
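For readers unfamiliar with the factorization machine the method builds on, the sketch below computes its prediction: a global bias plus linear terms plus pairwise interactions modelled as inner products of per-feature factor vectors. It is a generic FM forward pass with toy weights, not the authors' explanation algorithm.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorization machine prediction for one sample.

    x  : feature vector, shape (n,)
    w0 : global bias
    w  : linear weights, shape (n,)
    V  : factor vectors, shape (n, k); the interaction weight of features i, j
         is the inner product <V[i], V[j]>.
    """
    linear = w0 + w @ x
    # Pairwise interaction term via the standard O(n*k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2]
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + interactions

# Toy example: 4 features, factor dimension 2, random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w0, w, V = 0.1, rng.normal(size=4), rng.normal(size=(4, 2))
print(fm_predict(x, w0, w, V))
```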


2021 ◽  
Author(s):  
Anastasia Dmitrievna Musorina ◽  
Grigory Sergeyevich Ishimbayev

Abstract: Under the present conditions of oil and gas production, characterized by maturing fields and a shift in focus towards the digitalization of production processes and the use of machine learning (ML) models, improving the accuracy and consistency of well operation control data is becoming increasingly important. SPD has successfully implemented a project that uses annular pressure sensors in combination with machine learning models to control well annular pressure as part of field development program compliance. Under the field development program, echosounder and telemetry system readings are typically used to control the annular pressure and the dynamic flowing level. Echosounders, however, are not designed as measuring instruments: the accuracy of their readings is low, which makes it impossible to reliably evaluate the well's dynamic flowing level and annular pressure or to achieve the well's maximum potential, and the telemetry systems used to measure pump intake pressure may fail. This manuscript describes an approach to assessing the producer well annular pressure based on machine learning model data. The machine learning (ML) model is a function of the target variable (bottom-hole pressure), which is predicted on the basis of the actual data: static parameters (well schematic, pump design) and dynamic parameters (annular and line pressures, flowrate). Interpreting the input parameters yields the most probable value of the target variable based on the historic data.
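The abstract does not name the model family; the sketch below shows one plausible setup, a gradient-boosting regressor mapping static and dynamic well parameters to bottom-hole pressure. The feature names and data file are entirely hypothetical placeholders.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical historical well data; column names are placeholders.
df = pd.read_csv("well_history.csv")
features = [
    "pump_depth_m", "pump_type_code",                               # static parameters
    "annular_pressure_bar", "line_pressure_bar", "flowrate_m3_d",   # dynamic parameters
]
X, y = df[features], df["bottomhole_pressure_bar"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit a regression model on historical data, then predict the target variable.
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, random_state=0)
model.fit(X_tr, y_tr)

print("MAE on held-out data:", mean_absolute_error(y_te, model.predict(X_te)))
```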

