COMPARING ECONOMETRIC ANALYSES WITH MACHINE LEARNING APPROACHES: A STUDY ON SINGAPORE PRIVATE PROPERTY MARKET

2020 ◽  
pp. 1-24
Author(s):  
TINGBIN BIAN ◽  
JIN CHEN ◽  
QU FENG ◽  
JINGYI LI

We compare econometric analyses with machine learning approaches in the context of the Singapore private property market, using transaction data covering the period 1995–2018. A hedonic model is employed to quantify the premiums of important attributes and amenities, with a focus on the premium of distance to the nearest Mass Rapid Transit (MRT) station. In parallel, an investigation using machine learning algorithms in three categories (LASSO, random forest and artificial neural networks) is conducted in the same context, giving deeper insight into the importance of the determinants of property prices. The results suggest that the MRT distance premium is significant: moving 100 m closer to the nearest MRT station, starting from the mean distance, would increase the overall transacted price by about 15,000 Singapore dollars (SGD). Machine learning approaches generally achieve higher prediction accuracy, and LASSO suggests a heterogeneous property age premium. Using the random forest algorithm, we find that property prices are mostly affected by key macroeconomic factors, such as the time of sale, as well as the size and floor level of the property. Finally, an appraisal of the different approaches is provided for researchers to utilize additional data sources and data-driven approaches to exploit potential causal effects in economic studies.

2021 ◽  
Author(s):  
Martin Seeliger ◽  
Marina Altmeyer ◽  
Andreas Ginau ◽  
Robert Schiestl ◽  
Jürgen Wunderlich

This paper presents the application of machine-learning techniques to pXRF data to establish a chronology for sediment cores around Tell Buto (Tell el-Fara'in) in the northwestern Nile Delta. As modern laboratories for dating techniques like OSL or 14C are rare in Egypt and sample export is restricted, we face a lack of opportunities to create a robust chronology, which is indispensable in modern geoarchaeology. Therefore, we present a new approach to transfer archaeological age information gained at the excavation at Buto to corings of the wider Buto area. Sediments of archaeological outcrops and pits of known age are measured using pXRF to create a geochemical “fingerprint” for several historic eras. Afterwards, these “fingerprints” are transferred to corings of the surrounding areas using machine-learning algorithms. This paper presents 1) the application of three different machine-learning approaches (neural network, random forest, and C5.0 decision tree) to check whether archaeological age information can be transferred to sediments far from the settlement mounds using pXRF data, 2) a comparison of all approaches and an evaluation of whether the easily interpretable decision tree and random forest show results similar to those of the “black-box” neural network, and finally, 3) a case study that provides the results of Altmeyer et al. (in review) for Kom el-Gir, a further settlement mound a little north of Buto, with a chronostratigraphic framework based on this approach.

Reference: Altmeyer, M., Seeliger, M., Ginau, A., Schiestl, R. & J. Wunderlich (in review): Reconstruction of former channel systems in the northwestern Nile Delta (Egypt) based on corings and electrical resistivity tomography (ERT). (Submitted to E & G Quaternary Science Journal).


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Tahia Tazin ◽  
Md Nur Alam ◽  
Nahian Nakiba Dola ◽  
Mohammad Sajibul Bari ◽  
Sami Bourouis ◽  
...  

Stroke is a medical disorder in which blood vessels in the brain rupture, causing damage to the brain. Symptoms can develop when the supply of blood and other nutrients to the brain is interrupted. According to the World Health Organization (WHO), stroke is the greatest cause of death and disability globally. Early recognition of the various warning signs of a stroke can help reduce its severity. Different machine learning (ML) models have been developed to predict the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine learning algorithms, namely Logistic Regression (LR), Decision Tree (DT) Classification, Random Forest (RF) Classification, and Voting Classifier, to train four different models for reliable prediction. Random Forest was the best-performing algorithm for this task, with an accuracy of approximately 96 percent. The dataset used in the development of the method was the open-access Stroke Prediction dataset. The accuracy of the models used in this investigation is significantly higher than that reported in previous studies, indicating that these models are more reliable. Numerous model comparisons have established their robustness, and the scheme can be deduced from the study analysis.
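
The abstract does not say how the Voting Classifier combines its base models. As a minimal sketch, assuming simple majority ("hard") voting over class labels, the combination step could look like the following. The function name `hard_vote` and the toy predictions are illustrative assumptions, not taken from the study.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the majority label for each sample.

    predictions: list of per-model prediction lists, all of equal length.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        # Count the label each model assigned to sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted

# Hypothetical outputs of three base models (1 = stroke, 0 = no stroke).
lr_preds = [1, 0, 0, 1]
dt_preds = [1, 1, 0, 0]
rf_preds = [0, 1, 0, 1]
ensemble = hard_vote([lr_preds, dt_preds, rf_preds])
print(ensemble)
```

With an odd number of models and two classes, no tie can occur; other setups would need an explicit tie-breaking rule.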


2020 ◽  
Author(s):  
Albert Morera ◽  
Juan Martínez de Aragón ◽  
José Antonio Bonet ◽  
Jingjing Liang ◽  
Sergio de-Miguel

Background: The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting fungal productivity biogeographical patterns as a case study for the thorough assessment of the performance of alternative modelling approaches to provide accurate and ecologically-consistent predictions. Methods: We evaluated and compared the performance of two statistical modelling techniques, namely, generalized linear mixed models and geographically weighted regression, and four machine learning models, namely, random forest, extreme gradient boosting, support vector machine and deep learning, to predict fungal productivity. We used a systematic methodology based on substitution, random, spatial and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially-explicit model predictions. Results: Fungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistency at the landscape level. Conclusions: Whereas a large number of predictors are often used in machine learning algorithms, in this study we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatial-temporal data in the analysis of biogeographical patterns, climatic blocking is postulated as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modelling data.
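
Blocked cross-validation, as mentioned in the Methods, keeps all samples from one spatial or climatic block in the same fold, so that the test fold resembles extrapolation to unseen conditions. The sketch below is one possible way to build such folds; the round-robin assignment, the name `blocked_folds` and the zone labels are assumptions made for illustration, not the authors' procedure.

```python
from collections import defaultdict

def blocked_folds(block_ids, n_folds):
    """Yield (train_idx, test_idx) pairs; each block lies wholly in one fold."""
    by_block = defaultdict(list)
    for idx, block in enumerate(block_ids):
        by_block[block].append(idx)
    # Assign whole blocks to folds round-robin, in sorted order for reproducibility.
    folds = [[] for _ in range(n_folds)]
    for k, block in enumerate(sorted(by_block)):
        folds[k % n_folds].extend(by_block[block])
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx

# Hypothetical climate-zone label for each of eight sampling plots.
zones = ["dry", "dry", "wet", "wet", "cold", "cold", "dry", "wet"]
splits = list(blocked_folds(zones, n_folds=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because no zone appears on both sides of a split, the error measured on each test fold reflects prediction outside the climatic range seen in training.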


Recent advancements in remote sensing platforms, from satellites to close-range Remotely Piloted Aircraft Systems (RPAS), are driving a growing demand for innovative image processing and classification tools. Machine learning approaches are a prevalent group of data-driven inference tools that provide a broader scope when applied to remotely sensed data. In this paper, we apply different machine learning approaches to remote sensing images, using open-source packages in R, to find out which algorithm is most efficient at obtaining high accuracy. We carried out a rigorous comparison of four machine learning algorithms: support vector machine, random forest, classification and regression tree, and naive Bayes. These algorithms are evaluated using classification accuracy, the Kappa index and the area under the curve as accuracy metrics. Ten runs are performed to obtain the variance of the results on the training set, and validation is carried out using k-fold cross-validation. This study identifies random forest as the best method based on the accuracy measures under different conditions. Random forest is efficient to train, highly stable with respect to variations in classification parameter values, and significantly more accurate than the other machine learning approaches trialed.
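
The Kappa index used as a metric above corrects raw accuracy for the agreement expected by chance. A minimal sketch of Cohen's kappa follows, written in Python for consistency with the other examples in this listing even though the study itself used R. The land-cover labels are made up for illustration.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical reference and predicted land-cover classes for six pixels.
reference = ["water", "water", "forest", "forest", "urban", "urban"]
predicted = ["water", "water", "forest", "urban", "urban", "urban"]
kappa = cohen_kappa(reference, predicted)
print(round(kappa, 3))
```

The function is undefined when expected agreement equals 1, which happens only if both label lists contain a single identical class.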


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4605
Author(s):  
Ladislav Polak ◽  
Stanislav Rozum ◽  
Martin Slanina ◽  
Tomas Bravenec ◽  
Tomas Fryza ◽  
...  

The fingerprinting technique is a popular approach to determining the location of persons, instruments or devices in an indoor environment. Typically based on signal strength measurement, a power level map is first created in the learning phase, to be aligned with measured values in the inference phase. Second, the location is determined by taking the point for which the recorded received power level is closest to the power level actually measured. The biggest limitation of this technique is the reliability of power measurements, which may lack accuracy in many wireless systems. To this end, this work extends the power level measurement by using multiple anchors and multiple radio channels and, consequently, considers different approaches to aligning the actual measurements with the recorded values. The dataset is available online. This article focuses on the very popular radio technology Bluetooth Low Energy to explore the possible improvement of the system accuracy through different machine learning approaches. It shows how the accuracy–complexity trade-off influences the possible candidate algorithms, using the example of three-channel Bluetooth received-signal-strength-based fingerprinting in a one-dimensional environment with four static anchors and in a two-dimensional environment with the same set of anchors. We provide a literature survey to identify the machine learning algorithms applied in the literature and show that the available studies cannot be compared directly. Then, we implement and analyze the performance of the four most popular supervised learning techniques, namely k Nearest Neighbors, Support Vector Machines, Random Forest, and Artificial Neural Network. In our scenario, the most promising machine learning technique is Random Forest, with classification accuracy over 99%.
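
The basic fingerprinting step described above, picking the recorded point whose power levels are closest to the measurement, can be sketched as a nearest-neighbour lookup. This sketch assumes Euclidean distance and a single radio channel with four anchors. The radio map values and the name `locate` are hypothetical and not drawn from the published dataset.

```python
import math

def locate(radio_map, measured):
    """Return the reference point whose recorded RSSI vector is closest
    (in Euclidean distance) to the measured RSSI vector."""
    return min(radio_map, key=lambda point: math.dist(radio_map[point], measured))

# Hypothetical radio map for a one-dimensional corridor:
# position in metres -> RSSI in dBm from four static anchors.
radio_map = {
    0.0: (-50, -70, -80, -90),
    2.0: (-60, -60, -75, -85),
    4.0: (-70, -55, -65, -80),
    6.0: (-80, -65, -55, -70),
}
estimate = locate(radio_map, measured=(-68, -57, -66, -79))
print(estimate)
```

Extending the vectors to three channels per anchor would only lengthen each tuple; the lookup itself stays the same.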


Author(s):  
Josh Roll

Monitoring nonmotorized traffic is becoming increasingly common practice at local and state departments of transportation. These travel activity data are necessary to monitor the system and track progress toward active transportation policy and program goals. A common problem is that permanent count site data are often missing, making those sites less useful. Being able to accurately estimate those missing records effectively increases the amount of data available for use as metrics for monitoring traffic, and also makes more data available for factoring short-term sites. Using nonmotorized traffic counts from several cities in Oregon, this research compared the ability of day-of-year (DOY) factors, a statistical model, and machine learning algorithms to accurately impute daily traffic records for annual traffic estimation. Based on exhaustive cross-validation experiments using data-not-missing-at-random scenarios, this research concluded that random forest and DOY factor approaches could be used to impute daily counts for nonmotorized traffic, but each approach comes with tradeoffs. Though random forest performed best for many missing data scenarios, this method is complicated to estimate and apply. DOY factor-based methods are simpler to create and apply, and though more accurate in scenarios with significant amounts of missing data, they were less flexible given the need for data from neighboring count sites. Negative binomial regression was also found to work well in scenarios with moderate to low amounts of missing data. This work can inform nonmotorized traffic count programs needing vetted solutions for traffic data imputation.
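
The abstract does not give the DOY factor formula. The sketch below assumes one common form, in which a day's factor is the ratio of a neighboring site's count on that day to that site's mean daily count, and missing days at the target site are filled by scaling the factor to the target site's estimated level. All names and counts are hypothetical.

```python
def doy_factors(reference_counts):
    """Day-of-year factor: each day's count relative to the reference site's mean."""
    mean_count = sum(reference_counts.values()) / len(reference_counts)
    return {day: count / mean_count for day, count in reference_counts.items()}

def impute(target_counts, factors):
    """Fill None entries in target_counts using the reference site's factors."""
    observed = {d: c for d, c in target_counts.items() if c is not None}
    # Estimate the target site's average daily level from its observed days.
    level = sum(c / factors[d] for d, c in observed.items()) / len(observed)
    return {
        d: (c if c is not None else level * factors[d])
        for d, c in target_counts.items()
    }

# Hypothetical daily counts keyed by day of year; None marks a missing record.
reference = {1: 100, 2: 200, 3: 300}
target = {1: 50, 2: None, 3: 150}
filled = impute(target, doy_factors(reference))
print(filled)
```

The dependence on `reference` illustrates the limitation noted in the abstract: the method needs data from a neighboring count site covering the missing days.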


2020 ◽  
Vol 25 (40) ◽  
pp. 4296-4302 ◽  
Author(s):  
Yuan Zhang ◽  
Zhenyan Han ◽  
Qian Gao ◽  
Xiaoyi Bai ◽  
Chi Zhang ◽  
...  

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. Inclusion bodies then form and are deposited on the cell membrane, reducing the ability of red blood cells to deform and leading to their massive destruction in the spleen; the result is a group of hereditary haemolytic diseases. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost was 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH) stimulation test or GnRH analogue (GnRHa) stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
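
The three metrics reported above can be computed from labels and classifier scores as in the following sketch. The AUC here uses the rank interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The labels, scores and 0.5 decision threshold are illustrative assumptions, not values from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test responses (1 = positive) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.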


2020 ◽  
Vol 13 (1) ◽  
pp. 10
Author(s):  
Andrea Sulova ◽  
Jamal Jokar Arsanjani

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and will continue to grow. The recent massive wildfires that hit Australia during the 2019–2020 summer season raised questions about the extent to which the risk of wildfires can be linked to various climate, environmental, topographical, and social factors, and about how to predict fire occurrences in order to take preventive measures. Hence, the main objective of this study was to develop an automated, cloud-based workflow for generating a training dataset of fire events at a continental level, using freely available remote sensing data at a reasonable computational expense, for feeding into machine learning models. As a result, a data-driven model was set up on the Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia and identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodological approach, the results achieved and the conclusions drawn can be of great importance to policymakers, environmentalists, and climate change researchers, among others.

