Modelling of a Stochastic Spatiotemporal Variable in Transport Domain

In this article, building on our previous work, we model transport demand spatiotemporally in the Montreal metropolitan area over a six-year period. We employ classical machine learning and regression models that predict bike-sharing demand as the daily cumulative number of bike trips at each docking station considered. Hourly demand estimates are then derived from the statistical distribution of demand across the individual hours of an average day. To capture seasonal and other regular variation in demand, longer-term characteristics of the trip distribution, such as the average number of trips on each day of the week and in each month of the year, were also used as input attributes. We initially conjectured that weather would be an important source of irregular variation in bike-sharing demand and subsequently included several available meteorological variables in our models. We validated our models using hold-out and 10-fold cross-validation, with encouraging results.
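The hourly step described above can be sketched as follows. This is a minimal illustration, not the authors' method. It assumes that the "average day" distribution is the pooled share of a station's historical trips falling in each hour. The helper names `hourly_profile` and `disaggregate` are hypothetical.

```python
def hourly_profile(hourly_trips):
    """Share of a station's historical trips that fall in each hour of the day.

    hourly_trips: iterable of (hour, trips) pairs, hour in 0..23.
    Hypothetical helper; the pooled-share definition is an assumption.
    """
    totals = [0.0] * 24
    for hour, trips in hourly_trips:
        totals[hour] += trips
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no trips observed for this station")
    return [t / grand_total for t in totals]


def disaggregate(daily_prediction, profile):
    """Split a predicted daily trip count into 24 hourly estimates."""
    return [daily_prediction * share for share in profile]


# Toy history for one station: (hour, trips) observations pooled over several days.
history = [(8, 30), (17, 50), (8, 10), (12, 10)]
profile = hourly_profile(history)
hourly = disaggregate(200, profile)
print(hourly[8], hourly[17])
```

Because the profile sums to one, the hourly estimates always add back up to the predicted daily total. A per-weekday or per-month profile would be a natural refinement, but that is not stated in the abstract.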


Global observation-based climatology of precipitation occurrence and peak intensity

DOI: 10.5194/egusphere-egu2020-7837 (2020)

Author(s): Hylke Beck, Seth Westra, Eric Wood

Keyword(s): Land Surface, Regression Models, Cross Validation, Climate Models, Daily Precipitation, State Of The Art, Coefficient Of Determination, Peak Intensity, Uncertainty Estimates, Fold Cross Validation

We introduce a unique set of global observation-based climatologies of daily precipitation (P) occurrence (related to the lower tail of the P distribution) and peak intensity (related to the upper tail of the P distribution). The climatologies were produced using Random Forest (RF) regression models trained with an unprecedented collection of daily P observations from 93,138 stations worldwide. Five-fold cross-validation was used to evaluate the generalizability of the approach and to quantify uncertainty globally. The RF models provided highly satisfactory performance, yielding cross-validation coefficient of determination (R²) values ranging from 0.74 for the 15-year return-period daily P intensity to 0.86 for >0.5 mm d⁻¹ daily P occurrence. The RF models consistently outperformed state-of-the-art reanalysis (ERA5) and satellite (IMERG) products. The highest P intensities over land were found along the western equatorial coast of Africa, in India, and along coastal areas of Southeast Asia. Using a 0.5 mm d⁻¹ threshold, P was estimated to occur on 23.2% of days on average over the global land surface (excluding Antarctica). The climatologies, including uncertainty estimates, will be released as the Precipitation DISTribution (PDIST) dataset via www.gloh2o.org/pdist. We expect the dataset to be useful for numerous purposes, such as evaluating climate models, bias-correcting gridded P datasets, and designing hydraulic structures in poorly gauged regions.
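The two quantities in the abstract can be sketched for a single station. The first is occurrence, the fraction of days exceeding 0.5 mm d⁻¹. The second is the coefficient of determination used to score cross-validated predictions. This is a sketch under assumptions. R² is written here as 1 − SS_res/SS_tot, whereas the paper may instead use a squared correlation. The function names are hypothetical.

```python
def occurrence_fraction(daily_p_mm, threshold_mm=0.5):
    """Fraction of days with daily precipitation strictly above threshold_mm (mm/day)."""
    if not daily_p_mm:
        raise ValueError("empty precipitation series")
    wet_days = sum(1 for p in daily_p_mm if p > threshold_mm)
    return wet_days / len(daily_p_mm)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# One toy week of station data (mm/day); only 1.4, 7.9 and 12.3 exceed 0.5.
week = [0.0, 0.2, 1.4, 0.0, 7.9, 0.5, 12.3]
print(occurrence_fraction(week))
```

In a cross-validation setting, `observed` would hold the station climatology values of the held-out stations. `predicted` would hold the out-of-fold model estimates for those same stations.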


Estimating Daily PM2.5 Concentrations in Beijing Using 750-M VIIRS IP AOD Retrievals and a Nested Spatiotemporal Statistical Model

Remote Sensing, Vol 11 (7), pp. 841 (2019)

DOI: 10.3390/rs11070841

Cited By ~ 7

Author(s): Fei Yao, Jiansheng Wu, Weifeng Li, Jian Peng

Keyword(s): Statistical Model, Fixed Effects, Regression Models, Goodness Of Fit, Cross Validation, Infrared Imaging, Second Stage, Independent Variable, Fixed Effects Regression, Fold Cross Validation

Satellite-retrieved aerosol optical depth (AOD) data have been widely used to predict PM2.5 concentrations. However, most such products have spatial resolutions of ~1 km or coarser. This is too coarse to support PM2.5-related studies at fine scales (e.g., urban-scale PM2.5 exposure assessments). Space-time regression models have been widely developed and applied to predict PM2.5 concentrations from satellite-retrieved AOD. Their accuracy, however, is unsatisfactory, particularly on days that lack a model dataset. The present study aimed to evaluate the effectiveness of recent high-resolution (~750 m at nadir) AOD from the Visible Infrared Imaging Radiometer Suite (VIIRS) Intermediate Product (IP) for estimating PM2.5 concentrations with a newly developed nested spatiotemporal statistical model. The model consisted of two parts: a nested time fixed effects regression (TFER) model and a series of geographically weighted regression (GWR) models. The TFER model, containing daily, weekly, or monthly intercepts, used VIIRS IP AOD as the main predictor, alongside several auxiliary variables, to predict daily PM2.5 concentrations. The GWR models then used VIIRS IP AOD as the independent variable to correct the residuals of the first-stage nested TFER model. The average spatiotemporal coverage of VIIRS IP AOD was approximately 16.12%. The sample-based ten-fold cross-validation goodness of fit (R²) of the first-stage TFER models with daily, weekly, and monthly intercepts was 0.81, 0.66, and 0.45, respectively. The second-stage GWR models further captured the spatial heterogeneity of the PM2.5-AOD relationships. The nested spatiotemporal statistical model produced more daily PM2.5 estimates and improved the accuracy of summer, autumn, and annual PM2.5 estimates.
This study contributes to the knowledge of how well VIIRS IP AOD can predict PM2.5 concentrations at urban scales. It also offers strategies for improving the coverage and accuracy of daily PM2.5 estimates on days that lack a model dataset.
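The first stage can be sketched as a time fixed effects regression PM2.5 = α_t + β·AOD + ε. The model has one intercept α_t per day and a shared slope β, estimated with the standard within (demeaning) estimator. This is a simplified sketch, not the authors' implementation. It assumes a single predictor, leaving out the auxiliary variables, and it does not implement the second-stage GWR correction. The function names are hypothetical.

```python
from collections import defaultdict


def fit_time_fixed_effects(records):
    """Within estimator for pm = alpha[day] + beta * aod.

    records: list of (day, aod, pm25). Returns (beta, {day: alpha}).
    Weekly or monthly intercepts only change the grouping key `day`.
    """
    by_day = defaultdict(list)
    for day, aod, pm in records:
        by_day[day].append((aod, pm))
    means = {d: (sum(a for a, _ in v) / len(v), sum(p for _, p in v) / len(v))
             for d, v in by_day.items()}
    num = den = 0.0
    for day, aod, pm in records:
        mean_aod, mean_pm = means[day]
        num += (aod - mean_aod) * (pm - mean_pm)
        den += (aod - mean_aod) ** 2
    if den == 0:
        raise ValueError("AOD has no within-day variation")
    beta = num / den
    intercepts = {d: m_pm - beta * m_aod for d, (m_aod, m_pm) in means.items()}
    return beta, intercepts


def first_stage_residuals(records, beta, intercepts):
    """Residuals a second-stage model (GWR in the paper) would try to explain."""
    return [pm - (intercepts[day] + beta * aod) for day, aod, pm in records]


# Toy records: two days whose PM2.5 shares a slope but has different daily levels.
records = [("d1", 1.0, 30.0), ("d1", 3.0, 50.0), ("d2", 2.0, 15.0), ("d2", 4.0, 35.0)]
beta, alphas = fit_time_fixed_effects(records)
print(beta, alphas)
```

Demeaning within each day removes α_t, so β is estimated only from within-day variation in AOD. This is why days with too few matched AOD-PM2.5 pairs cannot receive their own intercept.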


Analysis and Prediction of Instagram Users Popularity using Regression Techniques based on Metadata, Media and Hashtags Analysis

DOI: 10.31219/osf.io/uezyk (2020)

Author(s): Kristo Radion Purba, David Asirvatham, Raja Kumar Murugesan

Keyword(s): Machine Learning, Social Media, Statistical Analysis, Random Forest, Regression Models, Cross Validation, The Past, Proposed Model, Regression Techniques, Fold Cross Validation

In recent years, social media has grown at an unprecedented rate, and more people have become influencers. Understanding popularity helps ordinary users boost their popularity and helps business users choose better influencers. Previous studies have predicted the popularity of images posted on social media, but none has predicted a user's popularity as a whole. Furthermore, existing studies have not considered hashtag analysis, even though hashtags are one of the most useful social media features. This research aims to create a model that predicts a user's popularity, defined as a combination of engagement rate and follower growth. Six machine learning regression models were tested. The proposed model successfully predicted users' popularity, with R² up to 0.852, using Random Forest with 10-fold cross-validation. Additional statistical and feature analyses revealed factors that can boost popularity, such as actively posting and following users, completing the user's metadata, and using 11 hashtags. In contrast, having a large number of past posts and followed accounts does not help grow popularity, and neither does using popular hashtags.
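The 10-fold cross-validation used to report R² can be sketched as below. Every user is predicted by a model trained on the other nine folds. This is a sketch under stated assumptions. The paper's Random Forest is swapped for a one-feature least-squares line so the example stays self-contained. The popularity score and feature are hypothetical. All function names are illustrative. Libraries such as scikit-learn provide `KFold` and `cross_val_score` for the same purpose.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the paper's Random Forest)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def cross_val_r2(xs, ys, k=10, seed=0):
    """Pooled out-of-fold R^2: each point is predicted by a model that never saw it."""
    preds = [0.0] * len(xs)
    for test in k_fold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: posting activity (x) vs. a popularity score (y).
xs = list(range(30))
ys = [2 * x + 1 for x in xs]
print(cross_val_r2(xs, ys))
```

This version pools all out-of-fold predictions before computing R². Averaging a separate R² per fold is an equally common convention, and the abstract does not say which one was used.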


Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series, Vol 1 (1), pp. 120-130 (2018)

DOI: 10.29167/a1i1p120-130

Cited By ~ 1

Author(s): Chunxiang Qian, Wence Kang, Hao Ling, Hua Dong, Chengyao Liang, ...

Keyword(s): Support Vector Machine, Environmental Factors, Cross Validation, Concrete Strength, Simulation Method, Support Vector, SVM Model, Artificial Neural Network (ANN), Influence Degree, Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complex marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which made the most accurate predictions. Both material and environmental factors influencing the results were considered. The material factors were mainly the original concrete strength and the amounts of cement replaced by fly ash and by slag. The environmental factors were the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results indicated that the optimized SVM model appeared to perform better than the other models in predicting concrete strength. Based on the SVM model, a variable-limitation simulation method was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
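"Optimized by K-fold cross-validation" usually means that each candidate hyperparameter value is scored by its out-of-fold error, and the best value is kept. A minimal sketch of that selection loop follows. The abstract does not give the SVM settings, so k-nearest-neighbour regression is swapped in as a stand-in model. Its `n_neighbors` plays the role that SVM hyperparameters such as C or the kernel width would play. The data and function names are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, n_neighbors):
    """Average target of the n_neighbors closest training points (Euclidean)."""
    ranked = sorted(zip((math.dist(x, xi) for xi in train_X), train_y))
    nearest = ranked[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def kfold_mse(X, y, n_neighbors, k=5, seed=0):
    """Mean squared error of out-of-fold predictions for one hyperparameter value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        train_X = [X[j] for j in train]
        train_y = [y[j] for j in train]
        for i in test:
            sq_err += (y[i] - knn_predict(train_X, train_y, X[i], n_neighbors)) ** 2
    return sq_err / len(X)


def select_by_kfold(X, y, candidates, k=5):
    """Return the candidate with the lowest K-fold MSE, plus all scores."""
    scores = {c: kfold_mse(X, y, c, k) for c in candidates}
    return min(scores, key=scores.get), scores


# Hypothetical data: two well-separated groups of specimens with different strengths.
X = [(float(i),) for i in range(10)] + [(100.0 + i,) for i in range(10)]
y = [0.0] * 10 + [100.0] * 10
best, scores = select_by_kfold(X, y, [1, 15])
print(best, scores)
```

The same seed is used for every candidate, so all hyperparameter values are compared on identical folds.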


Design and Development of an Information System to Determine Consumer Capability in Taking Out KPR (Home Ownership Credit) Loans

Jurnal ULTIMA InfoSys, Vol 7 (2), pp. 75-80 (2016)

DOI: 10.31937/si.v7i2.543

Author(s): Adhi Kusnadi, Risyad Ananda Putra

Keyword(s): Data Mining, Low Income, Cross Validation, Classification Tree, Large Population, Housing Development, Good Precision, Index Terms, The Government, Fold Cross Validation

Indonesia is a country with a relatively large population. Over a five-year period, the government runs an annual program to procure 1 million FLPP housing units, with the aim of providing decent homes for low-income people. FLPP housing development requires precision and speed on the developer's part. It is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing in advance whether consumers can obtain subsidized credit has many advantages. Among others, developers can plan their cash flow better and can replace consumers who would be rejected before those consumers enter the bank process. For this reason, a system was built to help developers. Many methods could be used to create such an application; this one uses data mining with a classification tree. Evaluated with 10-fold cross-validation, the application achieves an accuracy of 92%.

Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross-Validation, Consumer Capability
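A classification tree is grown by repeatedly choosing the split that most reduces class impurity. A minimal sketch of the root split using Gini impurity (one common criterion; the paper does not name its criterion) is shown below. The applicant feature and the "approve"/"reject" labels are hypothetical. A full tree would apply `best_split` recursively to each side.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split x[f] <= t."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send at least one row each way
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical applicants: (monthly income in millions of rupiah,) and bank decision.
rows = [(2,), (3,), (8,), (9,)]
labels = ["reject", "reject", "approve", "approve"]
print(best_split(rows, labels))
```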


Classification of Crime News Using the Naïve Bayes Classifier (NBC) with K-Fold Cross-Validation Testing

Jurnal Sains dan Informatika, Vol 5 (2), pp. 108-117 (2019)

DOI: 10.34128/jsi.v5i2.177

Author(s): Herfia Rhomadhona, Jaka Permadi

Keyword(s): Cross Validation, Online Media, Bayes Classifier, Naïve Bayes, Fold Cross Validation

Crime news is always a trending topic in the mass media, especially online mass media. Online media outlets provide several facilities that make it easier for the public to search for news by topic, and they label each news item by category. However, they do not assign subcategories to these items. For example, when a user opens the crime category, all kinds of crime news are displayed without specific information about the type of crime. This problem can be addressed by classifying crime news into subcategories. This study uses the Naïve Bayes Classifier (NBC) to classify news by subcategory. There are five subcategories: corruption, narcotics, theft, rape, and murder. The study aims to determine how well NBC classifies news by testing it with K-fold cross-validation for values of K from 3 to 10. The results show that NBC is able to classify crime news, with a precision of 98.53%, a recall of 98.44%, and an accuracy of 99.38%.
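A minimal sketch of a Naïve Bayes text classifier follows. The abstract does not specify the NBC variant or preprocessing. This version assumes multinomial Naïve Bayes with Laplace (add-one) smoothing, lowercase whitespace tokenization, and log-probabilities to avoid underflow. The English toy headlines are hypothetical stand-ins for the Indonesian news articles. Under K-fold testing, the model would be retrained on each of the K training folds.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Pick the class maximising log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    v = len(vocab)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in doc.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["suspect stole a motorbike", "thief stole phone",
        "official bribe corruption case", "corruption funds embezzled"]
labels = ["theft", "theft", "corruption", "corruption"]
model = train_nb(docs, labels)
print(predict_nb(model, "motorbike stole"), predict_nb(model, "bribe case"))
```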


Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design, Vol 25 (40), pp. 4296-4302 (2020)

DOI: 10.2174/1381612825666191107092214

Cited By ~ 2

Author(s): Yuan Zhang, Zhenyan Han, Qian Gao, Xiaoyi Bai, Chi Zhang, ...

Keyword(s): Machine Learning, Inclusion Bodies, Cross Validation, Independent Set, K562 Cells, Machine Learning Algorithms, Learning Approaches, Validation Test, Excess Number, Fold Cross Validation

Background: β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from deletion of, or defects in, the β-globin gene, which reduce synthesis of the β-globin chain and result in a relative excess of α-chains. Inclusion bodies form and deposit on the cell membrane, reducing the deformability of red blood cells. The resulting massive destruction of these cells in the spleen gives rise to a group of hereditary haemolytic diseases. Methods: In this work, machine learning algorithms were employed to build a model predicting inhibitors against K562 cells, based on 117 inhibitors and 190 non-inhibitors. Results: Using Adaboost, the overall accuracy (ACC) was 83.1% in a 10-fold cross-validation test and 78.0% in an independent set test, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN, and Bagging. Conclusion: This study indicates that Adaboost can be applied to build a learning model for predicting inhibitors against K562 cells.
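AdaBoost combines weak learners by reweighting the training samples after each round, so that later learners focus on the samples earlier ones got wrong. A minimal sketch of discrete AdaBoost with one-feature threshold stumps follows. It assumes labels +1 (inhibitor) and −1 (non-inhibitor). The abstract does not describe the molecular features, the base learner, or the number of rounds, so the toy one-dimensional data and all function names here are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Weighted-error-minimising stump: predict `polarity` if x[f] > t, else -polarity."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] > t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(stump, x):
    _, f, t, polarity = stump
    return polarity if x[f] > t else -polarity


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        if stump[0] >= 0.5:
            break  # no weak learner better than chance
        err = max(stump[0], 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # training data already separated
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data no single stump can fit: negatives sit between two groups of positives.
X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([adaboost_predict(model, x) for x in X])
```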


Environmental Sensitivity and Awareness as Differentiating Factors in the Purchase Decision-Making Process in the Smartphone Industry—Case of Polish Consumers

Sustainability, Vol 13 (1), pp. 348 (2021)

DOI: 10.3390/su13010348

Author(s): Lukasz Skowron, Monika Sak-Skowron

Keyword(s): Expectation Maximization, Cross Validation, Expectation Maximization Algorithm, Decision Making Process, Environmental Sensitivity, Significance Level, On Line, Purchase Process, The Impact, Fold Cross Validation

The first research objective discussed in this article was to analyze differences in how respondents with different levels of environmental sensitivity and awareness value particular factors influencing the purchase process in the smartphone industry. It also covered assessing their knowledge of the impact of smartphones on the natural environment. The second objective was to determine whether the level of environmental sensitivity, awareness, and knowledge about the environmental impact of smartphones has a statistically significant influence on respondents' choice of smartphone brand. The survey was conducted using an online questionnaire, distributed by a specialized research agency to a representative sample of over 1000 Polish residents. To identify customer clusters, the expectation-maximization algorithm and v-fold cross-validation were used. In addition, the nonparametric Mann-Whitney U test was carried out to assess the significance of differences between clusters. The results show unequivocally that people with different approaches to ecological issues demonstrate statistically significant differences in their purchasing behavior in the smartphone industry. Furthermore, for some pairs of smartphone brands, there is a statistically confirmed difference in the environmental sensitivity and awareness of their users. Moreover, the research has shown that Polish customers mistakenly consider smartphones to be relatively safe and environmentally friendly products.
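The Mann-Whitney U test compares two independent samples, such as scores from two clusters, through the ranks of their pooled values. A minimal sketch of the U statistic follows, with average ranks assigned to ties. The p-value step is omitted; a library routine such as SciPy's `scipy.stats.mannwhitneyu` handles it, including tie corrections. The function name and sample values are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from pooled ranks, averaging ranks over ties."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical 1-5 environmental-sensitivity scores in two clusters.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(mann_whitney_u([1, 2], [2, 3]))
```

U_a counts the pairs in which a value from the first sample exceeds a value from the second, with ties counted as one half. This is why U_a + U_b always equals n_a · n_b.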


Interclass Interference Suppression in Multi-Class Problems

Applied Sciences, Vol 11 (1), pp. 450 (2021)

DOI: 10.3390/app11010450

Author(s): Jinfu Liu, Mingliang Bai, Na Jiang, Ran Cheng, Xianling Li, ...

Keyword(s): Classification Accuracy, Cross Validation, Selection Process, Interference Suppression, Generalization Ability, Suppression Effect, Binary Classifiers, The One, Fold Cross Validation, Validation Experiments

Multi-classifiers are widely applied to many practical problems. However, features that can significantly discriminate a particular class from the others are often deleted during the feature selection process of multi-classifiers, which seriously degrades their generalization ability. This paper calls this phenomenon interclass interference in multi-class problems and analyzes its causes in detail. It then summarizes three interclass interference suppression methods, based respectively on all features, one-class classifiers, and binary classifiers. Their effects on interclass interference are compared through 10-fold cross-validation experiments on 14 UCI datasets. The experiments show that the method based on binary classifiers suppresses interclass interference efficiently and obtains the best classification accuracy of the three methods. Further experiments compared the suppression effect of two binary-classifier methods: one-versus-one and one-versus-all. The results show that the one-versus-one method achieves better suppression of interclass interference and better classification accuracy. By proposing the concept of interclass interference and studying methods to suppress it, this paper significantly improves the generalization ability of multi-classifiers.
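The one-versus-one decomposition trains one binary classifier per pair of classes and lets them vote. Because each pair can keep its own features, a feature that separates only one particular pair is not discarded by a single global selection step. A minimal sketch follows. It assumes a deliberately simple per-pair selection, which keeps the single feature whose class means differ most, and a nearest-mean binary rule. Both are hypothetical stand-ins for the paper's feature selection and classifiers.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def train_one_vs_one(X, y):
    """One binary classifier per unordered class pair, each with its own feature."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    models = {}
    for a, b in combinations(sorted(by_class), 2):
        ca, cb = centroid(by_class[a]), centroid(by_class[b])
        # Hypothetical per-pair feature selection: keep the single feature
        # whose class means differ most for this particular pair.
        f = max(range(len(ca)), key=lambda i: abs(ca[i] - cb[i]))
        models[(a, b)] = (f, ca[f], cb[f])
    return models


def predict_one_vs_one(models, x):
    """Each pairwise classifier votes for the class whose mean is nearer on its feature."""
    votes = Counter()
    for (a, b), (f, mean_a, mean_b) in models.items():
        votes[a if abs(x[f] - mean_a) <= abs(x[f] - mean_b) else b] += 1
    return votes.most_common(1)[0][0]


X = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 10), (0, 11)]
y = ["A", "A", "B", "B", "C", "C"]
models = train_one_vs_one(X, y)
print(predict_one_vs_one(models, (9, 1)), predict_one_vs_one(models, (1, 9)))
```

In this toy set the A-vs-C classifier relies on the second feature, while A-vs-B relies on the first. A single global selection that kept only one feature would lose one of these distinctions.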


Convolutional Neural Network-Based Clinical Predictors of Oral Dysplasia: Class Activation Map Analysis of Deep Learning Results

Cancers, Vol 13 (6), pp. 1291 (2021)

DOI: 10.3390/cancers13061291

Author(s): Seda Camalan, Hanya Mahmood, Hamidullah Binol, Anna Luiza Damaceno Araújo, Alan Roger Santos-Silva, ...

Keyword(s): Cross Validation, Clinical Predictors, Oral Dysplasia, Novel Technologies, Photographic Images, The UK, Map Analysis, Development And Validation, Fold Cross Validation, Activation Map

Oral cancer/oral squamous cell carcinoma is among the ten most common cancers globally, with over 500,000 new cases and 350,000 associated deaths every year worldwide. There is a critical need for objective, novel technologies that facilitate early, accurate diagnosis. For this purpose, we have developed a method that classifies images as "suspicious" or "normal" by performing transfer learning on Inception-ResNet-V2. The method also generates automated heat maps that highlight the image regions most likely to be involved in the decision. We tested the method's feasibility on two independent datasets of clinical photographic images from 30 patients in the UK and 24 patients in Brazil. The system was tested using both 10-fold cross-validation and leave-one-patient-out validation. On these two cohorts respectively, it achieved accuracies of 73.6% (±19%) and 90.9% (±12%), F1-scores of 97.9% and 87.2%, and precision values of 95.4% and 99.3% at recall values of 100.0% and 81.1%. This study presents several novel findings and approaches. We developed and validated our methods on two datasets collected in different countries, showing that using image patches instead of the whole lesion image leads to better performance. We also used class activation map analysis to identify which image regions are predictive of each class.
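When each patient contributes several image patches, leave-one-patient-out validation holds out all of a patient's patches at once. This way no patient appears in both training and test data. A minimal sketch of that split follows. The patient identifiers are hypothetical. scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
def leave_one_group_out(groups):
    """Yield (group, train_idx, test_idx), holding out every sample of one group at a time."""
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical patch-level data: the patient each image patch came from.
patch_patient = ["p01", "p01", "p01", "p02", "p02", "p03"]
for patient, train_idx, test_idx in leave_one_group_out(patch_patient):
    # No patient contributes patches to both sides of a split.
    assert not {patch_patient[i] for i in train_idx} & {patch_patient[i] for i in test_idx}
    print(patient, train_idx, test_idx)
```

Splitting at the patch level instead would let near-duplicate patches from the same lesion land on both sides of a split, which tends to inflate accuracy.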
