Practical benchmarking of statistical and machine learning models for predicting the condition of sewer pipes in Berlin, Germany

Abstract Deterioration models can be successfully deployed only if decision-makers trust the modelling outcomes and are aware of model uncertainties. Our study addresses this issue by developing a set of clearly understandable metrics to assess the performance of sewer deterioration models from an end-user perspective. The metrics are used to benchmark a statistical model, GompitZ, which is based on survival analysis and Markov chains, against a machine learning model, Random Forest, an ensemble learning method based on decision trees. Both models were trained on the extensive CCTV dataset of the sewer network of Berlin, Germany (115,258 inspections). At network level, both models give satisfactory outcomes, with deviations between predicted and inspected condition distributions below 5%. At pipe level, the statistical model performs no better than a simple random model, which randomly assigns a condition class to each inspected pipe, whereas the machine learning model performs satisfactorily. 66.7% of the pipes inspected in bad condition were predicted correctly. The machine learning approach shows strong potential for supporting operators in identifying pipes in critical condition for inspection programs, whereas the statistical approach is better suited to supporting strategic rehabilitation planning.
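The two evaluation levels described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three condition classes, and the toy labels are hypothetical, and the random baseline is assumed to draw classes from the inspected distribution, which the abstract does not specify.

```python
import random
from collections import Counter


def class_shares(labels, classes):
    """Fraction of pipes in each condition class."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def network_deviation(inspected, predicted, classes):
    """Network level: largest absolute gap between predicted and inspected shares."""
    obs = class_shares(inspected, classes)
    pred = class_shares(predicted, classes)
    return max(abs(obs[c] - pred[c]) for c in classes)


def hit_rate(inspected, predicted, target):
    """Pipe level: share of pipes inspected as `target` that were predicted as `target`."""
    hits = sum(1 for y, p in zip(inspected, predicted) if y == target and p == target)
    total = sum(1 for y in inspected if y == target)
    return hits / total if total else 0.0


def random_baseline(inspected, classes, seed=0):
    """Assumed baseline: draw each pipe's class from the inspected distribution."""
    rng = random.Random(seed)
    shares = class_shares(inspected, classes)
    return rng.choices(classes, weights=[shares[c] for c in classes], k=len(inspected))


# Hypothetical toy data, not taken from the Berlin dataset.
classes = ["good", "medium", "bad"]
inspected = ["good"] * 6 + ["medium"] * 2 + ["bad"] * 2
predicted = ["good"] * 5 + ["medium"] * 3 + ["bad"] * 2
baseline = random_baseline(inspected, classes)

print("network deviation:", network_deviation(inspected, predicted, classes))
print("model hit rate (bad):", hit_rate(inspected, predicted, "bad"))
print("baseline hit rate (bad):", hit_rate(inspected, baseline, "bad"))
```

Comparing the model's hit rate on the "bad" class with that of the baseline is one simple way to express, in end-user terms, whether pipe-level predictions add value over chance.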

Quantitative Toxicity Prediction via Ensembling of Heterogeneous Predictors

10.21203/rs.2.19338/v1 ◽ 2019

Author(s): Abdul Karim ◽ Vahid Riahi ◽ Avinash Mishra ◽ Abdollah Dehzangi ◽ M. A. Hakim Newton ◽ ...

Keyword(s): Machine Learning ◽ Deep Learning ◽ Prediction Models ◽ Individual Performance ◽ Learning Model ◽ Data Representation ◽ Toxicity Prediction ◽ Machine Learning Model ◽ Machine Learning Approach ◽ Benchmark Datasets

Abstract Representing molecules with only one type of feature and using those features to predict their activities is one of the most important approaches to machine-learning-based chemical activity prediction. For molecular activities such as quantitative toxicity, performance depends on the type of features extracted and the machine learning approach used. In such cases, using a single type of feature and a single machine learning model restricts prediction performance to the specific representation and model chosen. In this paper, we study quantitative toxicity prediction and propose a machine learning model for this task. Our model uses an ensemble of heterogeneous predictors rather than the homogeneous predictors typically used. The predictors vary either in the type of features used or in the deep learning architecture employed. Each predictor presumably has its own strengths and weaknesses in toxicity prediction. Our motivation is to build a combined model that uses different types of features and architectures to obtain collective performance beyond that of each individual predictor. We use six predictors in our model and test it on four standard quantitative toxicity benchmark datasets. Experimental results show that our model outperforms state-of-the-art toxicity prediction models in 8 out of 12 accuracy measures. Our experiments show that ensembling heterogeneous predictors improves performance over both single predictors and homogeneous ensembles of single predictors. The results also show that each data representation or deep learning based predictor has its own strengths and weaknesses, so a model that ensembles multiple heterogeneous predictors can go beyond the individual performance of each data representation or predictor type.
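The idea of combining predictors can be sketched in a few lines of Python. The abstract does not say how the six predictors are combined, so plain averaging is assumed here, and the two toy predictors and their outputs are hypothetical and deliberately constructed so that their errors cancel.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def ensemble_mean(predictions):
    """Average the per-sample outputs of several predictors (one list per predictor)."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Hypothetical toxicity targets and outputs of two different predictors.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 2.5, 3.5, 4.5]  # consistently overestimates by 0.5
pred_b = [0.5, 1.5, 2.5, 3.5]  # consistently underestimates by 0.5

ensemble = ensemble_mean([pred_a, pred_b])

print("RMSE predictor A:", rmse(y_true, pred_a))
print("RMSE predictor B:", rmse(y_true, pred_b))
print("RMSE ensemble:   ", rmse(y_true, ensemble))
```

Real predictors rarely have perfectly opposite errors, so the gain from averaging is usually smaller than in this toy case; the benefit depends on how uncorrelated the individual errors are.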

Can machine learning model with static features be fooled: an adversarial machine learning approach

Cluster Computing ◽ 10.1007/s10586-020-03083-5 ◽ 2020 ◽ Vol 23 (4) ◽ pp. 3233-3253 ◽ Cited By ~ 4

Author(s): Rahim Taheri ◽ Reza Javidan ◽ Mohammad Shojafar ◽ P. Vinod ◽ Mauro Conti

Keyword(s): Machine Learning ◽ Learning Model ◽ Learning Approach ◽ Machine Learning Model ◽ Machine Learning Approach

A Machine Learning Approach for Predicting Student Performance

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽ 10.32628/cseit1952106 ◽ 2019 ◽ pp. 462-465

Author(s): C. Selvi ◽ R. Shalini ◽ V. Navaneethan ◽ L. Santhiya

Keyword(s): Machine Learning ◽ Academic Performance ◽ Student Performance ◽ Learning Model ◽ Learning Approach ◽ Economic Prosperity ◽ Machine Learning Model ◽ Machine Learning Approach ◽ Novel Method ◽ Predicting Student Performance

A university's reputation and standard are measured by its students' performance and their contribution to the future economic prosperity of the nation. A novel method of predicting students' upcoming academic performance is therefore essential to provide advance information about that performance. A machine learning model can be developed to predict a student's upcoming scores or overall performance from previous academic performance.

Café and Restaurant under My Home: Predicting Urban Commercialization through Machine Learning

Sustainability ◽ 10.3390/su13105699 ◽ 2021 ◽ Vol 13 (10) ◽ pp. 5699

Author(s): Seung-Chul Noh ◽ Jung-Ho Park

Keyword(s): Machine Learning ◽ Warning System ◽ Large Degree ◽ Learning Model ◽ Urban Housing ◽ Residential Areas ◽ Urban Context ◽ Machine Learning Model ◽ Machine Learning Approach

The number of small commercial stores opening in housing structures in Seoul has soared since the beginning of this century. While commercialization generally increases urban vitality and achieves land use mix, cafés and restaurants in low-rise residential areas may attract large transient populations, bringing increased noise and crime to the residential area. Urban commercialization is so fast and prevalent that neither urban researchers nor policymakers can respond to it in a timely manner without a practical prediction tool. Focusing on cafés and restaurants, we propose an XGBoost machine learning model that can predict commercial store openings in urban residential areas and serve as an early warning system. Our findings highlight a large degree of difference in predictor importance among the variables used in our machine learning model. The most important predictor relates to land price, indicating that economic motivation leads to the conversion of urban housing to small cafés and restaurants. The Mapo neighborhood is predicted to be the most prone to the commercialization of urban housing, so the urgency of preparing it for the expected commercialization deserves underscoring. Overall, our results show that the machine learning approach can be applied to predict changes in land use and contribute to timely policy design in a rapidly changing urban context.

Pointer-Based Item-to-Item Collaborative Filtering Recommendation System Using a Machine Learning Model

International Journal of Information Technology & Decision Making ◽ 10.1142/s0219622021500619 ◽ 2021 ◽ pp. 1-22

Author(s): Celestine Iwendi ◽ Ebuka Ibeke ◽ Harshini Eggoni ◽ Sreerajavenkatareddy Velagala ◽ Gautam Srivastava

Keyword(s): Machine Learning ◽ Collaborative Filtering ◽ Recommendation System ◽ Absolute Error ◽ Learning Model ◽ Text Recall ◽ Machine Learning Model ◽ Similarity Algorithm ◽ Recommendation Accuracy ◽ Better Than

The creation of digital marketing has enabled companies to adopt personalized item recommendations for their customers. This process keeps them ahead of the competition. One of the techniques used in item recommendation is known as an item-based recommendation system or item–item collaborative filtering. Presently, item recommendation is based entirely on ratings such as 1–5, which are not part of the comment section. In this context, users or customers express their feelings and thoughts about products or services. This paper proposes a machine learning model in which the values 0, 2 and 4 are used to rate products: 0 is negative, 2 is neutral and 4 is positive. This operates in addition to the existing review system that handles users' reviews and comments, without disrupting it. We implemented this model using the Keras, Pandas and scikit-learn libraries. The proposed approach improved prediction with [Formula: see text] accuracy for Yelp datasets of businesses across 11 metropolitan areas in four countries, along with a mean absolute error (MAE) of [Formula: see text], precision at [Formula: see text], recall at [Formula: see text] and F1-Score at [Formula: see text]. Our model shows a scalability advantage and shows how organizations can revolutionize their recommender systems to attract potential customers and increase patronage. The proposed similarity algorithm was also compared with conventional algorithms to estimate its performance and accuracy in terms of root mean square error (RMSE), precision and recall. The results of this experiment indicate that the similarity recommendation algorithm performs better than the conventional algorithm and enhances recommendation accuracy.
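Item–item collaborative filtering on the 0/2/4 scale mentioned above can be sketched as follows. This is a generic textbook version, not the paper's proposed similarity algorithm: the user and item names are hypothetical, and ratings are centered on the neutral value 2 (an assumption made here so that a negative rating of 0 is not confused with a missing rating).

```python
import math

# Hypothetical data: user -> {item: rating}, with 0 = negative, 2 = neutral, 4 = positive.
RATINGS = {
    "u1": {"cafe": 4, "bakery": 4, "bar": 0},
    "u2": {"cafe": 4, "bakery": 2},
    "u3": {"cafe": 0, "bar": 4},
}
NEUTRAL = 2  # assumed centering value


def item_vector(item):
    """Centered ratings for one item, keyed by the users who rated it."""
    return {u: r[item] - NEUTRAL for u, r in RATINGS.items() if item in r}


def cosine(a, b):
    """Cosine similarity between two sparse item vectors."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    target = item_vector(item)
    num = den = 0.0
    for other, rating in RATINGS[user].items():
        if other == item:
            continue
        sim = cosine(target, item_vector(other))
        num += sim * (rating - NEUTRAL)
        den += abs(sim)
    return NEUTRAL + num / den if den else None


predicted_bar = predict("u2", "bar")
print("predicted rating of 'bar' for u2:", round(predicted_bar, 2))
```

In this toy data, users who like the café dislike the bar, so the prediction for u2 (who rated the café positively) falls below the neutral value.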

Building ML Based Intelligent System to Analyze Production LSI (Live Site Incidents)

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.c2178.0210321 ◽ 2021 ◽ Vol 10 (3) ◽ pp. 41-46

Author(s): Himanshu Bajpai

Keyword(s): Machine Learning ◽ Customer Satisfaction ◽ Intelligent System ◽ Learning Model ◽ Classification Model ◽ Classification Algorithms ◽ End User ◽ Machine Learning Model ◽ Basic Understanding ◽ Major Factors

Providing support for rolled-out applications and services is one of the major factors in increasing customer satisfaction, which in turn increases customer retention. In an era of automation, where most day-to-day jobs are handled or facilitated by the technologies around us, there is a need to reduce the manual effort of triaging support tickets and so help the person on call close tickets on time with proper remediation. The machine learning model produced by this paper will not only help classify tickets but also, where applicable, suggest the best possible remediation, thereby reducing the manual effort and time needed to provide a solution. The objectives of the work are as follows: a) Understand the data present in the tickets and derive a basic understanding of, for example, categories of issues and trends. b) Prepare the data so that it is ready for applying different classification algorithms. d) Identify the best machine learning model for classifying a new incident with the highest accuracy. e) Prepare a machine learning model that can suggest the best possible remediation of a ticket. f) Integrate the best classification model and the solution recommender model and wrap them as an API that can be used by the end user.

LOOK WHO’S TALKING: USING HUMAN CODING TO ESTABLISH A MACHINE LEARNING APPROACH TO TWITTER EDUCATION CHATS

AoIR Selected Papers of Internet Research ◽ 10.5210/spir.v2018i0.10512 ◽ 2020

Author(s): K. Bret Staudt Willet ◽ Brooks D. Willet

Keyword(s): Machine Learning ◽ Content Analysis ◽ Learning Model ◽ Word Count ◽ Mutual Engagement ◽ Machine Learning Model ◽ Machine Learning Approach ◽ Different Types ◽ Teacher Professional ◽ Logistic Regression Classifier

Twitter has become a hub for many different types of educational conversations, denoted by hashtags and organized by a variety of affinities. Researchers have described these educational conversations on Twitter as sites for teacher professional development. Here, we studied #Edchat—one of the oldest and busiest Twitter educational hashtags—to examine the content of contributions for evidence of professional purposes. We collected tweets containing the text “#edchat” from October 1, 2017 to June 5, 2018, resulting in a dataset of 1,228,506 unique tweets from 196,263 different contributors. Through initial human-coded content analysis, we sorted a stratified random sample of 1,000 tweets into four inductive categories: tweets demonstrating evidence of different professional purposes related to (a) self, (b) others, (c) mutual engagement, and (d) everything else. We found 65% of the tweets in our #Edchat sample demonstrated purposes related to others, 25% demonstrated purposes related to self, and 4% of tweets demonstrated purposes related to mutual engagement. Our initial method was too time intensive—it would be untenable to collect tweets from 339 known Twitter education hashtags and conduct human-coded content analysis of each. Therefore, we are developing a scalable machine-learning model—a multiclass logistic regression classifier using an input matrix of features such as tweet types, keywords, sentiment, word count, hashtags, hyperlinks, and tweet metadata. The anticipated product of this research—a successful, generalizable machine learning model—would help educators and researchers quickly evaluate Twitter educational hashtags to determine where they might want to engage.

Radiomics and Machine Learning Differentiate Soft-Tissue Lipoma and Liposarcoma Better than Musculoskeletal Radiologists

Sarcoma ◽ 10.1155/2020/7163453 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-9 ◽ Cited By ~ 3

Author(s): Ieva Malinauskaite ◽ Jeremy Hofmeister ◽ Simon Burgermeister ◽ Angeliki Neroladaki ◽ Marion Hamard ◽ ...

Keyword(s): Machine Learning ◽ Soft Tissue ◽ Histopathological Examination ◽ Characteristic Curve ◽ Learning Model ◽ Histopathological Diagnosis ◽ Test Machine ◽ Learning Classifier ◽ Machine Learning Model ◽ Better Than

Distinguishing lipoma from liposarcoma is challenging on conventional MRI examination. When the diagnosis remains uncertain after MRI, a further invasive procedure (percutaneous biopsy or surgery) is often required to allow diagnosis based on histopathological examination. Radiomics and machine learning allow several types of pathologies encountered on radiological images to be distinguished automatically and reliably. The aim of the study was to assess the contribution of radiomics and machine learning to the differentiation between soft-tissue lipoma and liposarcoma on preoperative MRI, and to assess the diagnostic accuracy of a machine-learning model compared with musculoskeletal radiologists. A total of 86 radiomics features were retrospectively extracted from volumes of interest on T1-weighted spin-echo 1.5 and 3.0 Tesla MRI of 38 soft-tissue tumors (24 lipomas and 14 liposarcomas, based on histopathological diagnosis). These radiomics features were then used to train a machine-learning classifier to distinguish lipoma from liposarcoma. The generalization performance of the machine-learning model was assessed using Monte-Carlo cross-validation and receiver operating characteristic curve analysis (ROC-AUC). Finally, the performance of the machine-learning model was compared with the accuracy of three specialized musculoskeletal radiologists using the McNemar test. The machine-learning classifier accurately distinguished lipoma from liposarcoma, with a ROC-AUC of 0.926. Notably, it performed better than the three specialized musculoskeletal radiologists reviewing the same patients, who achieved ROC-AUCs of 0.685, 0.805, and 0.785. Despite being developed on few cases, the trained machine-learning classifier accurately distinguishes lipoma from liposarcoma on preoperative MRI, with better performance than specialized musculoskeletal radiologists.
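Monte-Carlo cross-validation with ROC-AUC, as named in this abstract, can be sketched in plain Python. The abstract does not state the classifier, split ratio, or number of repetitions, so the nearest-centroid scorer, the 30% test fraction, the 100 splits, and the synthetic single-feature data below are all assumptions made for illustration; only the class sizes (24 and 14) mirror the study.

```python
import random


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(x_train, y_train):
    """Stand-in classifier: score by relative closeness to the positive class mean."""
    pos = [x for x, t in zip(x_train, y_train) if t == 1]
    neg = [x for x, t in zip(x_train, y_train) if t == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: abs(x - mu_neg) - abs(x - mu_pos)


def monte_carlo_cv(x, y, fit, n_splits=100, test_frac=0.3, seed=0):
    """Average test ROC-AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(len(idx) * test_frac)
        test, train = idx[:cut], idx[cut:]
        # Both classes are needed to fit the scorer and to compute an AUC.
        if len({y[i] for i in test}) < 2 or len({y[i] for i in train}) < 2:
            continue
        score = fit([x[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in test], [score(x[i]) for i in test]))
    return sum(aucs) / len(aucs)


# Synthetic one-feature data: 24 negatives and 14 positives.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(24)] + [data_rng.gauss(2, 1) for _ in range(14)]
y = [0] * 24 + [1] * 14

mean_auc = monte_carlo_cv(x, y, fit_centroid)
print("mean ROC-AUC over random splits:", round(mean_auc, 3))
```

Unlike k-fold cross-validation, the random splits here may overlap between repetitions; this trades exhaustive coverage for a free choice of how many repetitions to run, which can be useful with small samples.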

Development of machine learning model for predicting hospitalization in the prehospital setting

10.1101/2021.11.29.21266929 ◽ 2021

Author(s): Kenichiro Morisawa ◽ Tadahiro Goto ◽ Shigeki Fujitani

Keyword(s): Machine Learning ◽ Blood Pressure ◽ Patient Outcomes ◽ Vital Signs ◽ Learning Model ◽ Prehospital Setting ◽ Early Warning Score ◽ Prediction Ability ◽ Machine Learning Model ◽ Better Than

Background: Studies have developed models for predicting patient outcomes for successful risk stratification in the prehospital setting. However, these models generally require many predictors to achieve high prediction ability, which creates a barrier to implementing them in real clinical settings. Objective: We aimed to develop a simple and implementable machine learning model using automatically collected data (age, sex, vital signs) to predict patient outcomes during transportation, in comparison with the National Early Warning Score (NEWS). Methods: This is a retrospective cohort study using data from the EDs of three tertiary care hospitals in Japan from April 2017 to March 2020. We included adult patients (aged over 18 years) who were transported to the ED of a participating hospital. We excluded patients with trauma/injury or cardiac arrest, patients transferred from other hospitals, and patients with missing vital signs data or data with obvious outliers. The predictors were patient age, sex, mental status evaluated with the Japan Coma Scale, systolic blood pressure, diastolic blood pressure, pulse rate, respiratory rate, and oxygen saturation. The primary outcome was hospitalization. We developed a model using XGBoost. Results: During the study period, 3528 visits transported by emergency medical services were eligible. The median NEWS was 4.0, and 2081 patients were hospitalized. The discrimination ability of the newly developed model was 0.70 (95%CI 0.67-0.73), which was better than that of NEWS, 0.64 (95%CI 0.61-0.68). The newly developed model's performance measures (e.g., sensitivity, specificity) were comparable with those of NEWS. Conclusions: Our newly developed machine learning model using routinely available data has moderate prediction ability and was better than NEWS.
