Nonprofit Role Classification Using Mission Descriptions and Supervised Machine Learning

2021 ◽  
pp. 089976402110573
Author(s):  
Megan LePere-Schloop

Scholars have used both quantitative and qualitative approaches to empirically study nonprofit roles. Mission statements and program descriptions often reflect such roles; until recently, however, collecting and classifying a large sample of them was labor-intensive. This research note uses data on United Ways that e-filed their 990 forms, together with supervised machine learning, to illustrate an approach for classifying a large set of mission descriptions by role. Temporal and geographic variation in the roles detected in mission statements suggests that such an approach may be fruitful in future research.
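
The supervised text-classification step described in this note can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the mission texts, the role labels ("advocacy"/"service"), and the model choice (TF-IDF plus logistic regression) are all invented for demonstration.

```python
# Hypothetical sketch: classifying nonprofit mission descriptions by role.
# Texts and role labels below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

missions = [
    "We advocate for policy change to improve community health",
    "We deliver meals and shelter services to families in need",
    "We advocate for equitable education legislation statewide",
    "We provide direct job training services to unemployed adults",
]
roles = ["advocacy", "service", "advocacy", "service"]  # hand-coded labels

# A TF-IDF representation feeding a linear classifier is a common baseline
# for classifying short organizational texts at scale.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(missions, roles)

pred = model.predict(["We lobby legislators for housing policy reform"])[0]
```

In practice, the labeled training sample would come from hand-coded mission statements, and the fitted model would then score the full e-filed 990 corpus.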

Author(s):  
A. B.M. Shawkat Ali

Since its beginnings, machine learning, a methodology at the heart of artificial intelligence, has spread rapidly through different research communities with successful outcomes. This chapter introduces system analysts and designers to a comparatively new statistical supervised machine learning algorithm, the support vector machine (SVM). We explain the two main uses of SVM, classification and regression, with basic mathematical formulations and simple demonstrations to ease understanding. Prospects and challenges of future research in this emerging area are also described. Future research on SVM promises improved, higher-quality access for users. Developing an automated SVM system with state-of-the-art technologies is therefore of paramount importance, and this chapter provides a link between the system analysis and design perspective and this evolving research arena.
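
The two uses of SVM discussed in the chapter, classification and regression, can be demonstrated in a few lines. This sketch uses scikit-learn's `SVC` and `SVR` on tiny synthetic data rather than the chapter's own formulation; the point values are invented.

```python
# Minimal SVM demonstration: classification (SVC) and regression (SVR).
import numpy as np
from sklearn.svm import SVC, SVR

# Classification: separate two small point clouds with a linear kernel.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = [0, 0, 1, 1]
clf = SVC(kernel="linear").fit(X, y)

# Regression: fit y = 2x inside an epsilon-insensitive tube.
Xr = np.array([[0.0], [1.0], [2.0], [3.0]])
yr = [0.0, 2.0, 4.0, 6.0]
reg = SVR(kernel="linear", C=100.0).fit(Xr, yr)
```

The classifier places a maximum-margin hyperplane between the two clouds; the regressor ignores residuals smaller than its epsilon tube, which is what distinguishes SVR from ordinary least squares.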


2021 ◽  
Vol 13 (7) ◽  
pp. 1341
Author(s):  
Simon Appeltans ◽  
Jan G. Pieters ◽  
Abdul M. Mouazen

Rust disease is an important problem for leek cultivation worldwide. It reduces market value and in extreme cases destroys the entire harvest. Farmers have to resort to periodic full-field fungicide applications, once every 1 to 5 weeks depending on the cultivar and weather conditions, to prevent the spread of the disease. This implies an economic cost for the farmer and an environmental cost for society. Hyperspectral sensors have been used extensively to address this issue in research, but their application in the field has been limited to a relatively small number of crops, excluding leek, due to the high investment costs and the complex data gathering and analysis associated with these sensors. To fill this gap, a methodology was developed for detecting leek rust disease using hyperspectral proximal sensing data combined with supervised machine learning. First, a hyperspectral library was constructed containing 43,416 spectra covering a waveband range of 400–1000 nm, measured under field conditions. Then, an extensive evaluation of 11 common classifiers was performed using the scikit-learn machine learning library in Python, combined with a variety of wavelength selection techniques and preprocessing strategies. The best-performing model was a (linear) logistic regression model that correctly classified rust disease with an accuracy of 98.14%, using reflectance values at 556 and 661 nm combined with the value of the first derivative at 511 nm. This model was used to classify unlabelled hyperspectral images, confirming that it could accurately classify leek rust disease symptoms. It can be concluded that the results in this work are an important step towards the mapping of leek rust disease, and that future research is needed to overcome certain challenges before variable rate fungicide applications can be adopted against leek rust disease.
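
The reported best model reduces to a logistic regression on three spectral features: reflectance at 556 and 661 nm and the first derivative at 511 nm. The sketch below reproduces that setup in scikit-learn on synthetic stand-in features; the class means, spreads, and sample sizes are invented, not field data.

```python
# Sketch of the best-performing setup: logistic regression on three
# spectral features. Feature distributions are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
# Columns: reflectance at 556 nm, reflectance at 661 nm, 1st deriv at 511 nm.
healthy = rng.normal([0.45, 0.30, 0.002], 0.02, size=(n, 3))
rusted = rng.normal([0.35, 0.40, 0.006], 0.02, size=(n, 3))
X = np.vstack([healthy, rusted])
y = np.array([0] * n + [1] * n)  # 0 = healthy, 1 = rust

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
clf = LogisticRegression().fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

With only three well-chosen wavebands, such a linear model is cheap enough to run per-pixel over whole hyperspectral images, which is what makes the field-mapping application plausible.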


2019 ◽  
Vol 23 (1) ◽  
pp. 52-71 ◽  
Author(s):  
Siyoung Chung ◽  
Mark Chong ◽  
Jie Sheng Chua ◽  
Jin Cheon Na

Purpose
The purpose of this paper is to investigate the evolution of online sentiments toward a company (i.e. Chipotle) during a crisis, and the effects of corporate apology on those sentiments.

Design/methodology/approach
This study used a very large data set of tweets (over 2.6m) about Chipotle's food poisoning case (2015–2016). This case was selected because it is widely known, drew attention from various stakeholders and had many dynamics (e.g. multiple outbreaks across different locations). The study employed a supervised machine learning approach. Its sentiment polarity classification and relevance classification consisted of five steps: sampling, labeling, tokenization, augmentation of semantic representation, and the training of supervised classifiers for relevance and sentiment prediction.

Findings
The findings show that: the overall sentiment of tweets specific to the crisis was neutral; promotions and marketing communication may not be effective in converting negative sentiments to positive sentiments; a corporate crisis drew public attention and sparked public discussion on social media; while corporate apologies had a positive effect on sentiments, the effect did not last long, as the apologies did not remove public concerns about food safety; and some Twitter users exerted a significant influence on online sentiments through their popular tweets, which were heavily retweeted among Twitter users.

Research limitations/implications
Even with multiple training sessions and the use of a voting procedure (i.e. when there was a discrepancy in the coding of a tweet), some tweets could not be accurately coded for sentiment. Aspect-based sentiment analysis and deep learning algorithms can be used to address this limitation in future research. This analysis of the impact of Chipotle's apologies on sentiment did not test for a direct relationship. Future research could use manual coding to include only specific responses to the corporate apology. There was a delay between the time social media users received the news and the time they responded to it. Time delay poses a challenge to the sentiment analysis of Twitter data, as it is difficult to interpret which peak corresponds with which incident/s. This study focused solely on Twitter, which is just one of several social media sites that had content about the crisis.

Practical implications
First, companies should use social media as official corporate news channels, update them frequently with any developments about the crisis, and use them proactively. Second, companies in crisis should refrain from marketing efforts. Instead, they should focus on resolving the issue at hand and not attempt to regain a favorable relationship with stakeholders right away. Third, companies can leverage video, images and humor, as well as individuals with large online social networks, to increase the reach and diffusion of their messages.

Originality/value
This study is among the first to empirically investigate the dynamics of corporate reputation as it evolves during a crisis, as well as the effects of corporate apology on online sentiments. It is also one of the few studies that employs sentiment analysis using a supervised machine learning method in the area of corporate reputation and communication management. In addition, it offers valuable insights to both researchers and practitioners who wish to utilize big data to understand the online perceptions and behaviors of stakeholders during a corporate crisis.
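
The label-then-train workflow in the five-step pipeline (sampling, labeling, tokenization, training supervised classifiers) can be sketched as below. The tweets, sentiment labels, and choice of a linear SVM over TF-IDF features are all illustrative assumptions, not the paper's actual data or model.

```python
# Illustrative sketch of the labeling-then-training step: hand-labeled
# tweets, tokenized via TF-IDF, used to train a supervised classifier.
# Texts and labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "never eating there again after the outbreak",
    "terrible news, another food poisoning case",
    "glad they apologized, giving them another chance",
    "their apology seemed sincere, happy to go back",
]
labels = ["negative", "negative", "positive", "positive"]

sentiment_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
sentiment_clf.fit(tweets, labels)
pred = sentiment_clf.predict(["so happy they apologized"])[0]
```

At the study's scale, a model trained on a hand-coded sample like this would be applied to the remaining millions of unlabeled tweets, with a parallel relevance classifier filtering out off-topic content first.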


2018 ◽  
Author(s):  
John Wallert ◽  
Emelie Gustafson ◽  
Claes Held ◽  
Guy Madison ◽  
Fredrika Norlund ◽  
...  

BACKGROUND
Low adherence to recommended treatments is a multifactorial problem in rehabilitation for patients with myocardial infarction (MI). In a nationwide trial of internet-delivered cognitive behavior therapy (iCBT) for the high-risk subgroup of patients with MI who also report symptoms of anxiety, depression, or both (MI-ANXDEP), adherence was low. Since low adherence to psychotherapy wastes therapeutic resources and risks treatment abortion in MI-ANXDEP patients, identifying early predictors of adherence is potentially valuable for effective targeted care.

OBJECTIVE
To apply predictive modelling with supervised machine learning to investigate both established and novel predictors of iCBT adherence in MI-ANXDEP patients.

METHODS
Data were from 90 MI-ANXDEP patients recruited from 25 hospitals in Sweden and randomized to treatment in the iCBT trial U-CARE Heart. The time-point of prediction was at completion of the first homework assignment. Adherence was defined as having completed at least the first two homework assignments within the 14-week treatment period. A supervised machine learning procedure was applied to identify the most potent predictors of adherence available at the first treatment session from a range of demographic, clinical, psychometric, and linguistic predictors. The internal binary classifier was a random forest model within a 3x10-fold cross-validated recursive feature elimination (RFE) resampling, which selected the final predictor subset that best differentiated adherers from non-adherers.

RESULTS
Patient mean age was 58.4 (SD 9.4) years, 62% (56/90) were men, and 48% (43/90) were adherent. Out of the 34 potential predictors of adherence, RFE selected an optimal subset of 56% (19/34) (accuracy 0.64, 95% CI 0.61-0.68, P<.01). The strongest predictors of adherence were, in order of importance, (1) self-assessed cardiac-related fear, (2) sex, and (3) the number of words the patient used to answer the first homework assignment.

CONCLUSIONS
Adherence to iCBT for MI-ANXDEP patients was best predicted by cardiac-related fear and sex, consistent with previous research, but also by novel linguistic predictors derived from written patient behavior, which conceivably indicate verbal ability or therapeutic alliance. Future research should investigate potential causal mechanisms, seek to determine what underlying constructs the linguistic predictors tap into, and test whether these findings replicate for other interventions, outside of Sweden, in larger samples, and for patients with other conditions who are offered iCBT.

TRIAL REGISTRATION
ClinicalTrials.gov NCT01504191; https://clinicaltrials.gov/ct2/show/NCT01504191 (Archived by WebCite at http://www.webcitation.org/6xWWSEQ22)
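
The core machinery here, a random forest inside cross-validated recursive feature elimination, can be sketched with scikit-learn's `RFECV`. For brevity this uses synthetic data with the same 90-patient, 34-predictor shape and plain 3-fold CV rather than the study's 3x10-fold resampling; all settings are illustrative.

```python
# Sketch of recursive feature elimination wrapped around a random forest,
# analogous in spirit to the cross-validated RFE described above.
# Synthetic data; dimensions mirror the study (90 patients, 34 predictors).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=90, n_features=34, n_informative=6,
                           random_state=0)

# RFECV repeatedly drops the least important feature and keeps the subset
# with the best cross-validated score.
selector = RFECV(RandomForestClassifier(n_estimators=20, random_state=0),
                 step=1, cv=3)
selector.fit(X, y)
n_selected = selector.n_features_   # size of the chosen predictor subset
chosen_mask = selector.support_     # boolean mask over the 34 features
```

The selected mask then defines the final predictor subset, and the forest's feature importances give the ranking reported in the results.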


2021 ◽  
Vol 11 (18) ◽  
pp. 8293
Author(s):  
Federico Gargiulo ◽  
Dirk Duellmann ◽  
Pasquale Arpaia ◽  
Rosario Schiano Lo Moriello

Today, cloud systems provide many key services to development and production environments; reliable storage services are crucial for a multitude of applications ranging from commercial manufacturing, distribution and sales up to scientific research, which is often at the forefront of computing resource demands. In large-scale computer centers, the storage system requires particular attention and investment; usually, a large number of diverse storage devices need to be deployed to match the varying performance and volume requirements of changing user applications. As of today, magnetic drives still play a dominant role in terms of deployed storage volume and of service outages due to device failure. In this paper, we study methods to facilitate automated proactive disk replacement. We propose a method to identify disks with media failures in a production environment and describe an application of supervised machine learning to predict disk failures. In particular, we present a stage for automatically labeling disks (healthy/at-risk) during training and validation, along with a tuning strategy to optimize the hyperparameters of the associated machine learning classifier. The approach is trained and validated against a large set of 65,000 hard drives in the CERN computer center, and the achieved results are discussed.
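
The two stages named in the abstract, automatic labeling of drives and hyperparameter tuning of the classifier, can be sketched as follows. The SMART-like features, the labeling rule, and the grid of hyperparameters are all invented for illustration; the paper's actual labeling criteria and classifier are not reproduced here.

```python
# Sketch of the two stages: rule-based labeling of (synthetic) drive
# health features, then cross-validated hyperparameter tuning.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
n = 300
reallocated = rng.poisson(1.0, n)   # reallocated-sector count (synthetic)
read_errors = rng.poisson(0.5, n)   # uncorrectable read errors (synthetic)
X = np.column_stack([reallocated, read_errors])

# Labeling stage: a drive is "at risk" (1) past an invented error threshold.
y = ((reallocated > 2) | (read_errors > 1)).astype(int)

# Tuning stage: grid search over hyperparameters with cross-validation.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [10, 50], "max_depth": [2, 4]}, cv=3)
grid.fit(X, y)
best_score = grid.best_score_
```

In a production setting the labels would come from observed media failures rather than a threshold rule, and the tuned classifier would score live drives to schedule proactive replacement.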


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Satyaki Roy ◽  
Shehzad Z. Sheikh ◽  
Terrence S. Furey

Inflammatory bowel diseases (IBD), namely Crohn's disease (CD) and ulcerative colitis (UC), are chronic inflammatory conditions of the gastrointestinal tract. IBD patient conditions and treatments, such as with immunosuppressants, may result in a higher risk of viral and bacterial infection and more severe outcomes of infections. The effect of clinical and demographic factors on the prognosis of COVID-19 among IBD patients is still a significant area of investigation. The lack of available data on a large set of COVID-19-infected IBD patients has hindered progress. To circumvent this lack of large patient data, we present a random sampling approach to generate clinical COVID-19 outcomes (outpatient management, hospitalized and recovered, and hospitalized and deceased) for 20,000 IBD patients, modeled on reported summary statistics obtained from the Surveillance Epidemiology of Coronavirus Under Research Exclusion (SECURE-IBD), an international database to monitor and report on outcomes of COVID-19 occurring in IBD patients. We apply machine learning approaches to perform a comprehensive analysis of the primary and secondary covariates to predict COVID-19 outcomes in IBD patients. Our analysis reveals that age, medication usage and the number of comorbidities are the primary covariates, while IBD severity, smoking history, gender and IBD subtype (CD or UC) are key secondary features. In particular, elderly male patients with ulcerative colitis, several preexisting conditions, and a history of smoking comprise a highly vulnerable IBD population. Moreover, treatment with 5-ASAs (sulfasalazine/mesalamine) shows a high association with COVID-19/IBD mortality. Supervised machine learning that considers age, number of comorbidities and medication usage can predict COVID-19/IBD outcomes with approximately 70% accuracy. We also explore the challenge of drawing demographic inferences from existing COVID-19/IBD data: COVID-19 in IBD patients is under-reported from US states with poor health rankings, which hinders such analyses and underscores the perils of using the repository to derive demographic information. Generating patient characteristics from known summary statistics allows for increased power to detect IBD factors leading to variable COVID-19 outcomes.
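
The core idea, sampling synthetic patient records from assumed marginal distributions and fitting a supervised model to the generated outcomes, can be sketched in miniature. Every distribution and coefficient below is invented for illustration; the study's generator is fit to SECURE-IBD summary statistics, not to these numbers, and the outcome here is collapsed to binary.

```python
# Sketch of summary-statistics sampling: draw synthetic patients, assign
# outcomes with covariate-dependent probabilities, fit a classifier.
# All distributions and coefficients are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
age = rng.normal(45, 15, n).clip(18, 90)
comorbidities = rng.poisson(1.2, n)
on_5asa = rng.binomial(1, 0.3, n)

# Hospitalization risk rises with age, comorbidity count, and 5-ASA use
# (illustrative logistic model, not fitted to real data).
logit = -6 + 0.06 * age + 0.8 * comorbidities + 0.7 * on_5asa
p_severe = 1 / (1 + np.exp(-logit))
outcome = rng.binomial(1, p_severe)  # 1 = hospitalized, 0 = outpatient

X = np.column_stack([age, comorbidities, on_5asa])
X_tr, X_te, y_tr, y_te = train_test_split(X, outcome, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

The value of this approach is statistical power: a generator calibrated to published marginals yields arbitrarily many records, at the cost of inheriting any reporting biases in the source repository.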


2021 ◽  
Author(s):  
Daniela A. Gomez-Cravioto ◽  
Ramon E. Diaz-Ramos ◽  
Neil Hernandez Gress ◽  
Jose Luis Preciado ◽  
Hector G. Ceballos

Background
This paper explores different machine learning algorithms and approaches for predicting alumni income, to obtain insights into the strongest predictors of income and of a 'high earners' class.

Methods
The study examines alumni sample data obtained from a survey at Tecnologico de Monterrey, a multicampus Mexican private university, and analyses it within the cross-industry standard process for data mining. Survey results include 17,898 and 12,275 observations before and after cleaning and pre-processing, respectively. The dataset includes values for income and a large set of independent variables, including demographic and occupational attributes of the former students and academic attributes from the institution's records. We conduct an in-depth analysis to determine whether the accuracy of the algorithms traditional in econometric research for predicting income can be improved with a data science approach. Furthermore, we present insights into patterns obtained using explainable artificial intelligence techniques.

Results
The gradient boosting model outperformed the parametric models, linear and logistic regression, in predicting alumni's current income, with statistically significant results (p < 0.05) in three tasks: ordinary least-squares regression, multi-class classification and binary classification. Moreover, the linear and logistic regression models were the most accurate methods for predicting alumni's first income; the non-parametric models showed no significant improvements there.

Conclusion
We identified that age, gender, working hours per week, first income after graduation, and variables related to the alum's job position and firm contributed to explaining their income. Findings indicated a gender wage gap, suggesting that further work is needed to enable equality.
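
The central comparison, gradient boosting versus a parametric linear model on the regression task, can be sketched as below. The data-generating process (a nonlinear age effect plus a linear hours effect) is invented to show why a boosted ensemble can beat a linear fit; it is not the survey data, and the feature set is reduced to two variables.

```python
# Illustrative comparison: gradient boosting vs a linear model on
# synthetic income-like data with a nonlinear age effect. Invented DGP.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 1000
age = rng.uniform(22, 65, n)
hours = rng.uniform(20, 60, n)
# Income peaks mid-career (quadratic in age) and grows with weekly hours.
income = 20 + 2.0 * hours - 0.05 * (age - 45) ** 2 + rng.normal(0, 5, n)

X = np.column_stack([age, hours])
X_tr, X_te, y_tr, y_te = train_test_split(X, income, random_state=0)

r2_gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
r2_lin = LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)
```

The boosted trees capture the curvature in the age effect that the linear model misses, which mirrors the pattern of results reported for current income.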


2021 ◽  
Vol 5 (4) ◽  
pp. 77
Author(s):  
Asra Fatima ◽  
Ying Li ◽  
Thomas Trenholm Hills ◽  
Massimo Stella

Most current affect scales and sentiment analyses of written text focus on quantifying valence/sentiment, the primary dimension of emotion. Distinguishing broader, more complex negative emotions of similar valence is key to evaluating mental health. We propose a semi-supervised machine learning model, DASentimental, to extract depression, anxiety, and stress from written text. We trained DASentimental to identify how N = 200 sequences of recalled emotional words correlate with recallers’ depression, anxiety, and stress from the Depression Anxiety Stress Scale (DASS-21). Using cognitive network science, we modeled every recall list as a bag-of-words (BOW) vector and as a walk over a network representation of semantic memory, in this case, free associations. This weights BOW entries according to their centrality (degree) in semantic memory and informs recalls using semantic network distances, thus embedding recalls in a cognitive representation. This embedding translated into state-of-the-art, cross-validated predictions for depression (R = 0.7), anxiety (R = 0.44), and stress (R = 0.52), equivalent to previous results employing additional human data. Powered by a multilayer perceptron neural network, DASentimental opens the door to probing the semantic organizations of emotional distress. We found that the semantic distances between recalls (i.e., walk coverage) were key for estimating depression levels but redundant for anxiety and stress levels. Semantic distances from “fear” boosted anxiety predictions but were redundant when the “sad–happy” dyad was considered. We applied DASentimental to a clinical dataset of 142 suicide notes and found that the predicted depression and anxiety levels (high/low) corresponded to differences in valence and arousal as expected from a circumplex model of affect. We discuss key directions for future research enabled by artificial intelligence detecting stress, anxiety, and depression in texts.
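
The centrality-weighted bag-of-words idea can be shown on a toy scale: each recalled word's BOW count is scaled by its degree in a free-association network. The miniature network and recall list below are invented, not the real association norms or DASS-21 data.

```python
# Toy sketch of degree-weighted BOW over a tiny free-association network.
# Edges and the recall list are invented miniatures.

# Undirected free-association edges (hypothetical).
edges = [("sad", "cry"), ("sad", "alone"), ("sad", "fear"),
         ("fear", "panic"), ("happy", "smile")]

# Degree (centrality) of each word in the semantic network.
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

vocab = sorted(degree)
recall = ["sad", "alone", "fear", "fear"]  # one participant's recall list

# BOW entry = count of the word in the recall, weighted by its degree.
weighted_bow = {w: recall.count(w) * degree[w] for w in vocab}
```

In the full model this weighted vector, augmented with network distances between successive recalls, is fed to the multilayer perceptron that predicts the three DASS-21 subscale scores.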


2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research into the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated through machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource-intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
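
The evaluation described above, a Naïve Bayes classifier scored with Precision, Recall and F-measure on human-coded tweets, can be sketched as follows. The example tweets and the theme labels ("worship"/"outreach") are invented stand-ins for the actual coded dataset.

```python
# Sketch of the comparison step: Naïve Bayes on hand-coded tweets,
# scored with precision, recall, and F-measure. Invented texts/labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = [
    "wonderful sermon at church this sunday morning",
    "the sunday sermon really moved me",
    "volunteering at the church food bank today",
    "our church food drive needs volunteers",
]
train_labels = ["worship", "worship", "outreach", "outreach"]

test_tweets = ["great sermon this morning", "join our food bank volunteers"]
test_labels = ["worship", "outreach"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_tweets, train_labels)
pred = clf.predict(test_tweets)

precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, pred, average="macro")
```

On a realistic dataset these three scores, computed per algorithm on a held-out split, are what would be checked against the article's 70% acceptability threshold.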

