A Machine Learning Framework for Predicting Bridge Defect Detection Cost

Retweet Prediction Based on User Behavior

10.32920/ryerson.14657001.v1 ◽

2021 ◽

Author(s):

Syeda Nadia Firdaus

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Network ◽

Prediction Model ◽

Matrix Factorization ◽

Information Diffusion ◽

User Behavior ◽

Past Research ◽

The Difference ◽

The Impact

Social networks have been an active research topic in computer science in recent years. Networks such as Facebook, Twitter, and Instagram play an important role in information diffusion. Social network data are created by users, and users’ online activities and behavior have been studied in past research to better understand how information diffuses on social networks. In this study, we focus on Twitter and explore the impact of user behavior on retweet activity. To represent a user’s behavior for predicting their retweet decision, we introduce 10-dimensional emotion features and 35-dimensional personality-related features. We consider how a user’s behavior differs between the roles of author and retweeter, and propose a machine learning based retweet prediction model that accounts for this difference. We also propose two approaches to a matrix factorization retweet prediction model, which learns the latent relation between users and tweets to predict a user’s retweet decision. In our experiments, models based on user behavior features improve on baseline models by 3% to 6% in F1-score. Considering only a user’s behavior as a retweeter reduces data processing time while keeping prediction accuracy comparable to the case where both retweeting and posting behaviors are considered. In the proposed matrix factorization models, we incorporate tweet features into the basic factorization model through newly defined regularization terms, improving F1-score by 3% to 4%. Finally, we compare the machine learning and matrix factorization models for retweet prediction and find that no model is superior to the others in all cases. Therefore, the choice of model should depend on how the prediction results will be used.
A machine learning model is preferable when overall performance quality is important, as in tweet re-ranking and tweet recommendation. Matrix factorization is preferable when the ability to predict positive retweets matters more, as in marketing campaigns and finding potential retweeters.
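
The basic factorization idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's model: it omits the tweet-feature regularization terms, and the function names, hyperparameters, and toy data below are all assumptions made for illustration. It learns one latent vector per user and per tweet by stochastic gradient descent on squared error with plain L2 regularization.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn latent vectors from (user, tweet, label) triples by SGD.

    label is 1 if the user retweeted the tweet, 0 otherwise (toy assumption).
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Score a (user, tweet) pair as the dot product of their latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

# Hypothetical data: users 0 and 1 retweet tweets 0 and 1; user 2 retweets tweet 2.
ratings = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
           (1, 0, 1), (1, 1, 1), (1, 2, 0),
           (2, 0, 0), (2, 1, 0), (2, 2, 1)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(predict(P, Q, 0, 0), 2), round(predict(P, Q, 0, 2), 2))
```

A score above a chosen threshold (0.5 is a natural choice for 0/1 labels) would be read as a predicted retweet.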

CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network

Scientific Reports ◽

10.1038/s41598-019-53034-3 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 5

Author(s):

Kanggeun Lee ◽

Hyoung-oh Jeong ◽

Semin Lee ◽

Won-Ki Jeong

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Genomic Data ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Somatic Alterations ◽

The Impact ◽

Type Classification

With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
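
One common way to combine two classifiers into an ensemble is soft voting, where their per-class probabilities are blended and the highest blended probability wins. The abstract does not say how CPEM combines its random forest and deep neural network, so the sketch below is only an assumed illustration; `soft_vote`, the weight, and the probability vectors are hypothetical.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Blend two per-class probability vectors; return (best class index, blended)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    blended = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    best = max(range(len(blended)), key=blended.__getitem__)
    return best, blended

# Hypothetical outputs of two classifiers over three cancer types.
rf_probs = [0.6, 0.3, 0.1]
dnn_probs = [0.2, 0.7, 0.1]
label, blended = soft_vote(rf_probs, dnn_probs)
print(label, blended)
```

With equal weights, a class that one model strongly favors can outvote the other model's top choice, which is the main reason to blend probabilities rather than hard labels.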

A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19

Big Data and Cognitive Computing ◽

10.3390/bdcc4040033 ◽

2020 ◽

Vol 4 (4) ◽

pp. 33

Author(s):

Toni Pano ◽

Rasha Kashef

Keyword(s):

Machine Learning ◽

Social Media ◽

Prediction Model ◽

Sentiment Analysis ◽

Significant Role ◽

Prediction Models ◽

Financial Sector ◽

Research Gap ◽

Text Preprocessing ◽

The Impact

During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, is a meaningful indicator for forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets to develop an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we discover that splitting sentences, removing Twitter-specific tags, or their combination generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimal preprocessing strategy should help machine learning prediction models track actual prices more accurately.
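
Two of the steps named in this abstract, removing Twitter-specific tags and correlating scores with prices, can be sketched as follows. The sketch makes several assumptions: that "tags" means mentions, hashtags, and URLs, that the correlation is Pearson's coefficient, and that sentiment scores are already available per day. No sentiment library is called, and all names and numbers are hypothetical.

```python
import math
import re

# Assumed definition of "Twitter-specific tags": mentions, hashtags, URLs.
TAG_PATTERN = re.compile(r"(@\w+|#\w+|https?://\S+)")

def strip_twitter_tags(text):
    """Remove mentions, hashtags, and URLs, then collapse whitespace."""
    return " ".join(TAG_PATTERN.sub(" ", text).split())

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

print(strip_twitter_tags("@alice BTC to the moon #bitcoin https://t.co/x"))

# Hypothetical daily mean sentiment scores and closing prices.
sentiment = [0.10, 0.25, 0.05, 0.40, 0.30]
prices = [9100.0, 9350.0, 9050.0, 9600.0, 9500.0]
print(round(pearson(sentiment, prices), 3))
```

Comparing the coefficient before and after a preprocessing step, on the same days, is one way to judge whether that step helps.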

Corrigendum to “The impact coenzyme Q10 supplementation on the inflammatory indices of women with breast cancer using A machine learning prediction model”. Informatics in Medicine Unlocked Volume 24, 2021, 100614

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100650 ◽

2021 ◽

pp. 100650

Author(s):

Amir Jamshidnezhad ◽

Zohreh Anjomshoa ◽

Sayed Ahmad Hosseini ◽

Ahmad Azizi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Prediction Model ◽

Coenzyme Q10 ◽

The Impact ◽

Coenzyme Q10 Supplementation

Applying Bayesian method to investigate determinants of non performing loans of banks in Vietnam

Science & Technology Development Journal - Economics - Law and Management ◽

10.32508/stdjelm.v5i1.704 ◽

2021 ◽

Vol 5 (1) ◽

pp. first

Author(s):

Nam Hai Pham ◽

Nguyen Ngoc Tan

Keyword(s):

Commercial Banks ◽

Policy Implications ◽

Bank Loan ◽

Gdp Growth ◽

Bank Capital ◽

Factors Affecting ◽

Gdp Growth Rate ◽

Macro Variables ◽

Mcmc Chain ◽

The Impact

This study was conducted to determine the factors affecting non-performing loans of commercial banks in Vietnam for the period 2007 to 2018. The study applies the Bayesian approach and the Random-walk Metropolis-Hastings algorithm to evaluate the impact of micro and macro factors on non-performing loans of commercial banks. The dependent variable is non-performing loans, measured as the ratio of non-performing loans to total outstanding loans; the independent variables describing bank characteristics are non-performing loans of the previous year, profitability, bank size, bank loans, and bank capital; the macro variables are inflation and GDP growth. Research data were collected from financial statements of 30 Vietnamese commercial banks and the General Statistics Office of Vietnam from 2007 to 2018. To increase the reliability and efficiency of the model and to support reasonable Bayesian inference, a convergence test of the MCMC chain was performed. The results show that non-performing loans of the previous year, bank size, bank loans, bank capital, and inflation have positive impacts on bank non-performing loans, whereas bank profitability and GDP growth rate have the opposite effect. Based on the research results, the author proposes policy implications for decision-makers to help banks reduce non-performing loans and operate more effectively and efficiently.
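
The Random-walk Metropolis-Hastings algorithm named in this abstract can be shown on a much simpler problem than the study's regression. The sketch below samples the posterior of a single mean under a normal likelihood and a normal prior; the prior, noise level, step size, and data are assumptions chosen for illustration and have nothing to do with the study's variables.

```python
import math
import random

def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior of mu: normal prior times normal likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum(((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik

def random_walk_mh(log_target, start, n_samples=5000, step_sd=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Symmetric proposal, so the acceptance probability is
        # min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)
    return samples, accepted / n_samples

# Hypothetical observations.
data = [1.8, 2.1, 2.4, 1.9, 2.3]
samples, rate = random_walk_mh(lambda m: log_posterior(m, data), start=0.0)
kept = samples[1000:]  # discard burn-in draws
print(round(sum(kept) / len(kept), 2), round(rate, 2))
```

In practice the chain would also be checked for convergence, for example by running several chains from different starting points and comparing them, as the abstract's convergence test implies.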

The impact coenzyme Q10 supplementation on the inflammatory indices of women with breast cancer using A machine learning prediction model

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100614 ◽

2021 ◽

pp. 100614

Author(s):

Amir Jamshidnezhad ◽

Zohreh Anjomshoa ◽

Sayed Ahmad Hosseini ◽

Ahmad Azizi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Prediction Model ◽

Coenzyme Q10 ◽

The Impact ◽

Coenzyme Q10 Supplementation

Analysis and Prediction Model of Resident Travel Satisfaction

Sustainability ◽

10.3390/su12187522 ◽

2020 ◽

Vol 12 (18) ◽

pp. 7522

Author(s):

Zhenzhen Xu ◽

Chunfu Shao ◽

Shengyou Wang ◽

Chunjiao Dong

Keyword(s):

Prediction Model ◽

Evaluation Model ◽

Model Fitting ◽

Urban Traffic ◽

Support Vector ◽

Safety Hazards ◽

Factors Affecting ◽

The Impact ◽

Travel Satisfaction ◽

Significant Factors

To promote the sustainable development of urban traffic and improve resident travel satisfaction, this paper analyzes the significant factors affecting resident travel satisfaction and constructs an evaluation and prediction model for travel satisfaction based on a support vector machine (SVM). First, a multinomial logit (MNL) model is constructed to reveal the impact of individual attributes, family attributes, and safety hazards on resident travel satisfaction and to identify the significant factors. Then, a travel satisfaction evaluation model based on the SVM is constructed with the significant factors as independent variables. Finally, travel optimization measures are proposed and the SVM model is used to predict their effect. Futian Street in the Futian District of Shenzhen is taken as the case study. The results show that the following factors have a significant effect on resident travel satisfaction: age, job, level of education, number of cars, income, residential area, and potential safety hazards related to people, vehicles, roads, environment, etc. The model fitting accuracy is 87.76%. The implementation of travel optimization measures may increase the travel satisfaction rate by 14.07%.
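
The multinomial logit step mentioned in this abstract rests on one formula: the probability of outcome j is exp(V_j) divided by the sum of exp(V_k) over all outcomes, where V is the systematic utility. The sketch below computes those probabilities; the utilities and satisfaction levels are hypothetical and are not estimates from the paper.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities: a numerically stable softmax of utilities."""
    if not utilities:
        raise ValueError("need at least one alternative")
    shift = max(utilities)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical utilities for three satisfaction levels:
# dissatisfied, neutral, satisfied.
probs = mnl_probabilities([0.2, 0.5, 1.1])
print([round(p, 3) for p in probs])
```

In an estimated model each utility would be a linear combination of attributes such as age or income with fitted coefficients; here the utilities are simply given.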

Factors Affecting A Municipality’s Bond Rating: An Empirical Study

Journal of Business & Economics Research (JBER) ◽

10.19030/jber.v4i11.2712 ◽

2011 ◽

Vol 4 (11) ◽

Cited By ~ 1

Author(s):

George Palumbo ◽

Richard Shick ◽

Mark Zaporowski

Keyword(s):

Economic Variables ◽

Bond Ratings ◽

Municipal Governments ◽

Bond Rating ◽

Factors Affecting ◽

Economic Health ◽

Economic Developments ◽

The Cost ◽

The Impact ◽

Do So

Creditworthiness, as reflected in bond ratings, is of great interest to municipalities since it directly affects the cost of borrowing and the ability to borrow. Municipalities experiencing a decline in their economic health will be especially concerned about how these developments will affect their future bond ratings. It is well known that municipal analysts closely monitor a community’s economic health since this has an important impact on creditworthiness. What is less well known, however, are the economic variables that influence bond ratings. The purpose of this paper is to identify these economic variables and estimate to what extent they influence the probability of a municipality’s default. We do so by developing an econometric model of the rating process. The model will allow municipal governments to gauge the impact of economic developments on their credit ratings.

Comparison of Climate Reanalysis and Remote-Sensing Data for Predicting Olive Phenology through Machine-Learning Methods

Remote Sensing ◽

10.3390/rs13061224 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1224

Author(s):

Izar Azpiroz ◽

Noelia Oses ◽

Marco Quartulli ◽

Igor G. Olaizola ◽

Diego Guidotti ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Prediction Model ◽

Olive Tree ◽

Temperature Data ◽

Machine Learning Algorithms ◽

Environmental Data ◽

Climate Data ◽

Tree Phenology ◽

The Impact

Machine-learning algorithms used for modelling olive-tree phenology generally rely largely on temperature data. In this study, we developed a prediction model on the basis of climate data and geophysical information. Remote measurements of weather conditions, terrain slope, and surface spectral reflectance were considered for this purpose. The accuracy of the temperature data worsened when weather-station measurements were replaced with remote-sensing records, though the addition of more complete environmental data resulted in an efficient prediction model of olive-tree phenology. Filtering and embedded feature-selection techniques were employed to analyze the impact of variables on olive-tree phenology prediction, facilitating the inclusion of measurable information in decision support frameworks for the sustainable management of olive-tree systems.
