LOOK WHO’S TALKING: USING HUMAN CODING TO ESTABLISH A MACHINE LEARNING APPROACH TO TWITTER EDUCATION CHATS

Author(s):  
K. Bret Staudt Willet ◽  
Brooks D. Willet

Twitter has become a hub for many different types of educational conversations, denoted by hashtags and organized by a variety of affinities. Researchers have described these educational conversations on Twitter as sites for teacher professional development. Here, we studied #Edchat—one of the oldest and busiest Twitter educational hashtags—to examine the content of contributions for evidence of professional purposes. We collected tweets containing the text “#edchat” from October 1, 2017 to June 5, 2018, resulting in a dataset of 1,228,506 unique tweets from 196,263 different contributors. Through initial human-coded content analysis, we sorted a stratified random sample of 1,000 tweets into four inductive categories: tweets demonstrating evidence of different professional purposes related to (a) self, (b) others, (c) mutual engagement, and (d) everything else. We found 65% of the tweets in our #Edchat sample demonstrated purposes related to others, 25% demonstrated purposes related to self, and 4% of tweets demonstrated purposes related to mutual engagement. Our initial method was too time intensive—it would be untenable to collect tweets from 339 known Twitter education hashtags and conduct human-coded content analysis of each. Therefore, we are developing a scalable machine-learning model—a multiclass logistic regression classifier using an input matrix of features such as tweet types, keywords, sentiment, word count, hashtags, hyperlinks, and tweet metadata. The anticipated product of this research—a successful, generalizable machine learning model—would help educators and researchers quickly evaluate Twitter educational hashtags to determine where they might want to engage.
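The abstract lists the kinds of features the planned classifier would take as input (word count, hashtags, hyperlinks, tweet type). A minimal sketch of how one tweet might be turned into such a feature row is shown here. The function name `tweet_features`, the exact feature set, and the regular expressions are illustrative assumptions, not the authors' implementation.

```python
import re

# Simple patterns for tweet entities (assumed for illustration only).
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def tweet_features(text: str) -> dict:
    """Turn one tweet into a flat numeric feature row (hypothetical feature set)."""
    # Remove URLs first so that fragments like "#section" inside a link
    # are not counted as hashtags or words.
    stripped = URL_RE.sub("", text)
    return {
        "word_count": len(stripped.split()),
        "hashtag_count": len(HASHTAG_RE.findall(stripped)),
        "url_count": len(URL_RE.findall(text)),
        "mention_count": len(MENTION_RE.findall(stripped)),
        "is_retweet": int(text.startswith("RT @")),
        "is_question": int("?" in stripped),
    }


example = tweet_features(
    "RT @teacher: Great tips for remote lessons https://example.com/x #edchat #edtech"
)
print(example)
```

Rows like this, stacked for every tweet, would form the input matrix for a multiclass classifier; sentiment and keyword features mentioned in the abstract are omitted from this sketch.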

2019 ◽  
Author(s):  
Abdul Karim ◽  
Vahid Riahi ◽  
Avinash Mishra ◽  
Abdollah Dehzangi ◽  
M. A. Hakim Newton ◽  
...  

Abstract Representing molecules with a single type of feature and using those features to predict their activities is one of the most important approaches in machine-learning-based chemical activity prediction. For molecular activities like quantitative toxicity prediction, the performance depends on the type of features extracted and the machine learning approach used. In such cases, using one type of feature and one machine learning model restricts the prediction performance to the specific representation and model used. In this paper, we study quantitative toxicity prediction and propose a machine learning model for it. Our model uses an ensemble of heterogeneous predictors instead of the typical homogeneous predictors. The predictors that we use vary either in the type of features used or in the deep learning architecture employed. Each of these predictors presumably has its own strengths and weaknesses in terms of toxicity prediction. Our motivation is to build a combined model that uses different types of features and architectures to obtain collective performance beyond that of each individual predictor. We use six predictors in our model and test the model on four standard quantitative toxicity benchmark datasets. Experimental results show that our model outperforms the state-of-the-art toxicity prediction models in 8 out of 12 accuracy measures. Our experiments show that ensembling heterogeneous predictors improves performance over both single predictors and homogeneous ensembles of single predictors. The results also show that each data representation or deep-learning-based predictor has its own strengths and weaknesses; a model that ensembles multiple heterogeneous predictors can therefore exceed the individual performance of each data representation or predictor type.
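The abstract does not say how the six predictors are combined. As a rough illustration of the general idea of a heterogeneous ensemble, the sketch below simply averages the outputs of several unrelated predictors. Plain averaging, the name `ensemble_predict`, and the toy predictors are all assumptions made for this example.

```python
from statistics import fmean
from typing import Callable, Sequence

# A predictor maps a feature vector to a single numeric prediction.
Predictor = Callable[[Sequence[float]], float]


def ensemble_predict(predictors: Sequence[Predictor], x: Sequence[float]) -> float:
    """Average the outputs of several predictors for one input (assumed combiner)."""
    if not predictors:
        raise ValueError("need at least one predictor")
    return fmean([p(x) for p in predictors])


# Three toy stand-ins for models built on different features or architectures.
toy_predictors = [
    lambda x: sum(x),       # hypothetical predictor A
    lambda x: max(x),       # hypothetical predictor B
    lambda x: 2.0 * x[0],   # hypothetical predictor C
]

combined = ensemble_predict(toy_predictors, [1.0, 2.0, 3.0])
print(combined)
```

Averaging is only one possible combiner; weighted averages or a learned meta-model are common alternatives.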


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2408-2411

Sales forecasting is widely recognized and plays a major role in an organization’s decision making. It is an integral part of business execution for retail giants, allowing them to adjust their strategy to improve sales in the near future. This helps them better manage resources such as machines, money, and manpower. Forecasting sales helps in managing revenue and inventory accordingly. This paper proposes a model that can forecast the most profitable segments at a granular level. Because most retail giants have many branches in different locations, consolidating sales is hard using data mining. Using a machine learning model instead helps in getting reliable and accurate results. This paper helps in understanding the sales trend in order to monitor or predict future sales, and the approach is applicable to different types of sales patterns and products to produce accurate prediction results.


2020 ◽  
Vol 23 (4) ◽  
pp. 3233-3253 ◽  
Author(s):  
Rahim Taheri ◽  
Reza Javidan ◽  
Mohammad Shojafar ◽  
P. Vinod ◽  
Mauro Conti

Author(s):  
C. Selvi ◽  
R. Shalini ◽  
V. Navaneethan ◽  
L. Santhiya

A university’s reputation and standing are measured by its students’ performance and their part in the future economic prosperity of the nation. Hence, a novel method of predicting students’ upcoming academic performance is essential for providing advance information about that performance. A machine learning model can be developed to predict a student’s upcoming scores or overall performance based on their previous academic performance.


2021 ◽  
Vol 13 (10) ◽  
pp. 5699
Author(s):  
Seung-Chul Noh ◽  
Jung-Ho Park

The number of small commercial stores opening in housing structures in Seoul has been soaring since the beginning of this century. While commercialization generally increases urban vitality and achieves land use mix, cafés and restaurants in low-rise residential areas may attract large transient populations, bringing increased noise and crime to the residential area. Urban commercialization is so fast and prevalent that neither urban researchers nor policymakers can respond to it in a timely manner without a practical prediction tool. Focusing on cafés and restaurants, we propose an XGBoost machine learning model that can predict commercial store openings in urban residential areas and further play the role of an early warning system. Our findings highlight a large degree of difference in predictor importance among the variables used in our machine learning model. The most important predictor relates to land price, indicating that economic motivation leads to the conversion of urban housing to small cafés and restaurants. The Mapo neighborhood is predicted to be the most prone to the commercialization of urban housing; the urgency of preparing it for the expected commercialization therefore deserves underscoring. Overall, our results show that the machine learning approach can be applied to predict changes in land use and contribute to timely policy design in rapidly changing urban contexts.


JMIR Cardio ◽  
10.2196/24473 ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. e24473
Author(s):  
Anietie U Andy ◽  
Sharath C Guntuku ◽  
Srinath Adusumalli ◽  
David A Asch ◽  
Peter W Groeneveld ◽  
...  

Background Current atherosclerotic cardiovascular disease (ASCVD) predictive models have limitations; thus, efforts are underway to improve the discriminatory power of ASCVD models. Objective We sought to evaluate the discriminatory power of social media posts to predict the 10-year risk for ASCVD as compared to that of pooled cohort risk equations (PCEs). Methods We consented patients receiving care in an urban academic emergency department to share access to their Facebook posts and electronic medical records (EMRs). We retrieved Facebook status updates up to 5 years prior to study enrollment for all consenting patients. We identified patients (N=181) without a prior history of coronary heart disease, an ASCVD score in their EMR, and more than 200 words in their Facebook posts. Using Facebook posts from these patients, we applied a machine-learning model to predict 10-year ASCVD risk scores. Using a machine-learning model and a psycholinguistic dictionary, Linguistic Inquiry and Word Count, we evaluated if language from posts alone could predict differences in risk scores and the association of certain words with risk categories, respectively. Results The machine-learning model predicted the 10-year ASCVD risk scores for the categories <5%, 5%-7.4%, 7.5%-9.9%, and ≥10% with area under the curve (AUC) values of 0.78, 0.57, 0.72, and 0.61, respectively. The machine-learning model distinguished between low risk (<10%) and high risk (>10%) with an AUC of 0.69. Additionally, the machine-learning model predicted the ASCVD risk score with Pearson r=0.26. Using Linguistic Inquiry and Word Count, patients with higher ASCVD scores were more likely to use words associated with sadness (r=0.32). Conclusions Language used on social media can provide insights about an individual’s ASCVD risk and inform approaches to risk modification.
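The results above are reported as area under the curve (AUC). For readers unfamiliar with the measure, a minimal sketch of one standard way to compute it follows: AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy data are assumptions for illustration; the study's own tooling is not described in the abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = high-risk category, 0 = not (hypothetical values).
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported values between 0.57 and 0.78.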


2018 ◽  
Vol 20 (5) ◽  
pp. 1131-1147 ◽  
Author(s):  
N. Caradot ◽  
M. Riechel ◽  
M. Fesneau ◽  
N. Hernandez ◽  
A. Torres ◽  
...  

Abstract Deterioration models can be successfully deployed only if decision-makers trust the modelling outcomes and are aware of model uncertainties. Our study aims to address this issue by developing a set of clearly understandable metrics to assess the performance of sewer deterioration models from an end-user perspective. The developed metrics are used to benchmark the performance of a statistical model, namely, GompitZ based on survival analysis and Markov-chains, and a machine learning model, namely, Random Forest, an ensemble learning method based on decision trees. The models have been trained with the extensive CCTV dataset of the sewer network of Berlin, Germany (115,258 inspections). At network level, both models give satisfactory outcomes with deviations between predicted and inspected condition distributions below 5%. At pipe level, the statistical model does not perform better than a simple random model, which attributes randomly a condition class to each inspected pipe, whereas the machine learning model provides satisfying performance. 66.7% of the pipes inspected in bad condition have been predicted correctly. The machine learning approach shows a strong potential for supporting operators in the identification of pipes in critical condition for inspection programs whereas the statistical approach is more adapted to support strategic rehabilitation planning.
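The two kinds of figures quoted in this abstract (deviation between predicted and inspected condition distributions at network level, and the share of bad-condition pipes predicted correctly at pipe level) can be sketched as below. The abstract does not give the exact metric definitions, so the formulas, function names, and condition labels here are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def distribution_deviation(inspected: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Network level: absolute difference in each condition class's share."""
    if len(inspected) != len(predicted) or not inspected:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(inspected)
    seen, pred = Counter(inspected), Counter(predicted)
    return {c: abs(pred[c] - seen[c]) / n for c in sorted(set(seen) | set(pred))}


def share_detected(inspected: Sequence[str], predicted: Sequence[str], target: str = "bad") -> float:
    """Pipe level: fraction of pipes inspected as `target` that were predicted as `target`."""
    total = sum(1 for i in inspected if i == target)
    if total == 0:
        raise ValueError("no inspected pipes in the target class")
    hits = sum(1 for i, p in zip(inspected, predicted) if i == target and p == target)
    return hits / total


# Hypothetical condition labels for six pipes.
inspected = ["good", "good", "bad", "bad", "bad", "fair"]
predicted = ["good", "fair", "bad", "bad", "good", "fair"]

deviation = distribution_deviation(inspected, predicted)
detected = share_detected(inspected, predicted)
print(deviation, detected)
```

The example shows why both levels matter: a model can match the overall class shares closely while still assigning the wrong class to individual pipes.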


Author(s):  
Tong Wang ◽  
Cheng He ◽  
Fujie Jin ◽  
Yu Jeffrey Hu

We develop a novel interpretable machine learning model, GANNM, and use newly available data to evaluate how different types of marketing campaigns and budget allocations influence malls’ customer traffic. We observe that the response curves that measure the impact of campaign budget on customer traffic differ across categories of campaigns, depending on whether they offer sales incentives or experience incentives and whether they run during peak periods, off-peak periods, or online promotion periods. Based on these accurate response curves from GANNM, the optimized budget allocation is estimated to yield an 11.2% increase in customer traffic compared with the original allocation. Our findings provide novel insights on managing mall campaigns. Mall managers should increase marketing spending in areas that were likely overlooked before and avoid crowding budget into campaigns during periods that have high levels of competition and are likely already over-marketed. We provide empirical evidence showing that the recent trend of employing novel approaches for enhancing customer experience in physical stores can effectively encourage customers to visit malls. Furthermore, we show that online promotions could also create opportunities for offline businesses: investing in campaigns during the major online promotion periods could significantly increase customer traffic for malls, given sufficient investment in the campaigns to raise customer awareness.

