Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification

2021 ◽  
Vol 11 (4) ◽  
pp. 1537-1554
Author(s):  
Chen Mo ◽  
Jingjing Yin ◽  
Isaac Chun-Hai Fung ◽  
Zion Tsz Ho Tse

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes a novel yet straightforward method and explores its feasibility: the outcome of interest is regressed on aggregated influence scores, using generalized linear models, for association and/or classification analyses. The method transforms the text data in the document-term matrix into a continuous summary score, thereby substantially reducing the data dimension and easing the sparsity of the term matrix. To illustrate the proposed method step by step, we used three Twitter datasets on different topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors that the existing literature associates with each public health topic. The proposed method could also classify tweets into topic groups appropriately, with performance consistent with that of existing text mining methods for automatic classification based on tweet contents.
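The idea of collapsing a document-term matrix into one continuous score and then fitting a generalized linear model can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how influence scores are computed, so the smoothed log-odds ratio used here is an assumption, as are the toy tweets, the function names, and the plain gradient-ascent fit of the logistic model.

```python
import math
from collections import Counter

# Toy labeled tweets, invented for illustration: 1 = influenza-related, 0 = other.
tweets = [
    ("flu shot clinic open today", 1),
    ("got my flu vaccine today", 1),
    ("flu season vaccine reminder", 1),
    ("watching the game tonight", 0),
    ("great game and pizza tonight", 0),
    ("pizza party today", 0),
]

def term_influence(docs, smoothing=0.5):
    """ASSUMED influence score: smoothed log-odds ratio of a term's
    document frequency in class 1 versus class 0."""
    n_pos = sum(1 for _, y in docs if y == 1)
    n_neg = len(docs) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for text, y in docs:
        (pos_df if y == 1 else neg_df).update(set(text.split()))

    def log_odds(k, n):
        return math.log((k + smoothing) / (n - k + smoothing))

    vocab = set(pos_df) | set(neg_df)
    return {t: log_odds(pos_df[t], n_pos) - log_odds(neg_df[t], n_neg)
            for t in vocab}

def aggregate_score(text, influence):
    """Collapse one document's row of the term matrix into a single number."""
    return sum(influence.get(t, 0.0) for t in set(text.split()))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, steps=500):
    """Logistic regression (a GLM with logit link) on one predictor,
    fitted by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * x for r, x in zip(resid, xs)) / n
    return b0, b1

influence = term_influence(tweets)
scores = [aggregate_score(text, influence) for text, _ in tweets]
labels = [y for _, y in tweets]
intercept, slope = fit_logistic(scores, labels)
probs = [sigmoid(intercept + slope * s) for s in scores]
```

Here the influence scores are learned from the same tweets that are then classified, which is only acceptable in a toy example; in practice the scores would be estimated on a training split and applied to held-out tweets.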

Author(s):  
Mohamed Abdulkadir ◽  
Dongmei Yu ◽  
Lisa Osiecki ◽  
Robert A. King ◽  
Thomas V. Fernandez ◽  
...  

Tourette syndrome (TS) is a neuropsychiatric disorder with involvement of genetic and environmental factors. We investigated genetic loci previously implicated in Tourette syndrome and associated disorders in interaction with pre- and perinatal adversity in relation to tic severity using a case-only (N = 518) design. We assessed 98 single-nucleotide polymorphisms (SNPs) selected from (I) top SNPs from genome-wide association studies (GWASs) of TS; (II) top SNPs from GWASs of obsessive–compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD); (III) SNPs previously implicated in candidate-gene studies of TS; (IV) SNPs previously implicated in OCD or ASD; and (V) tagging SNPs in neurotransmitter-related candidate genes. Linear regression models were used to examine the main effects of the SNPs on tic severity, and the interaction effect of these SNPs with a cumulative pre- and perinatal adversity score. Replication was sought for SNPs that met the threshold of significance (after correcting for multiple testing) in a replication sample (N = 678). One SNP (rs7123010), previously implicated in a TS meta-analysis, was significantly related to higher tic severity. We found a gene–environment interaction for rs6539267, another top TS GWAS SNP. These findings were not independently replicated. Our study highlights the future potential of TS GWAS top hits in gene–environment studies.
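A linear model with a SNP main effect and a SNP-by-adversity interaction term, as described above, might look like the sketch below. The data are synthetic and noiseless, the additive 0/1/2 allele coding is an assumption (the abstract does not state the coding), and the coefficient values and helper names are hypothetical. The model fitted is severity = b0 + b1*snp + b2*adversity + b3*(snp*adversity).

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical coefficients: intercept, SNP effect, adversity effect, interaction.
true_beta = [20.0, 1.0, 0.5, 2.0]

# Synthetic grid: minor-allele count 0/1/2 (assumed coding), adversity score 0..3.
grid = [(snp, adv) for snp in (0, 1, 2) for adv in (0, 1, 2, 3)]
X = [[1.0, snp, adv, snp * adv] for snp, adv in grid]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = fit_ols(X, y)  # beta[3] estimates the gene-environment interaction
```

Because the synthetic outcome contains no noise, the fit recovers the hypothetical coefficients; with real data each coefficient would come with a standard error, and the interaction test would be subject to the multiple-testing correction the abstract mentions.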


2021 ◽  
Vol 40 (1) ◽  
pp. 61-79
Author(s):  
Carmela Alcántara ◽  
Shakira F. Suglia ◽  
Irene Perez Ibarra ◽  
A. Louise Falzon ◽  
Elliot McCullough ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David Gagnon ◽  
Abderrahim Zeribi ◽  
Élise Douard ◽  
Valérie Courchesne ◽  
Borja Rodríguez-Herreros ◽  
...  

Background: Language delay is one of the major referral criteria for an autism evaluation. Once an autism spectrum diagnosis is established, the language prognosis is among the main parental concerns. Early language regression (ELR) is observed by 10–50% of parents but its relevance to late language level and socio-communicative ability is uncertain. This study aimed to establish the predictive value of ELR on the progression of language development and socio-communicative outcomes to guide clinicians in addressing parents’ concerns at the time of diagnosis. Methods: We used socio-communicative, language, and cognitive data of 2,047 autism spectrum participants from the Simons Simplex Collection, aged 4–18 years (mean = 9 years; SD = 3.6). Cox proportional hazard and logistic regression models were used to evaluate the effect of ELR on language milestones and the probability of using complex and flexible language, as defined by the choice of ADOS module at enrollment. Linear models were then used to evaluate the relationship of ELR and non-verbal IQ with socio-communicative and language levels. Results: ELR is associated with earlier language milestones but delayed attainment of fluent, complex, and flexible language. However, this language outcome can be expected for almost all autistic children without intellectual disability at 18 years of age. It is mostly influenced by non-verbal IQ, not ELR. The language and socio-communicative level of participants with flexible language, as measured by the Vineland and ADOS socio-communicative subscales, was not affected by ELR. Limitations: This study is based on a relatively coarse measure of ultimate language level and relies on retrospective reporting of early language milestones and ELR. It does not prospectively document the age at which language catches up, the relationship between ELR and other behavioral areas of regression, or the effects of intervention. Conclusions: For autistic individuals with ELR and a normal level of non-verbal intelligence, language development follows a “bayonet shape” trajectory: early first words followed by regression, a plateau with limited progress, and then language catch-up.


2011 ◽  
Vol 19 (01) ◽  
pp. 113-125 ◽  
Author(s):  
LEJUN GONG ◽  
XIAO SUN ◽  
DONGKE JIANG ◽  
SHENGTAO GONG

Autism spectrum disorders (ASD) represent a group of developmental disorders with strong genetic underpinnings. To explore the genetic complexity of ASD, we developed AutMiner, a public web portal for the collection of genes linked to ASD and the implementation of an autism-centre network. AutMiner extracts candidate genes associated with ASD by text mining 9276 abstracts. Compared with other recent systems, its gene entries are richer, providing a reference for clinical geneticists. AutMiner also constructs an ASD-related network consisting of an autism-gene network and a gene-gene network. To the best of our knowledge, this is the first web example of an ASD-related network. The major focus of AutMiner is to offer a valuable reference tool for clinical geneticists in establishing and implementing effective genetic screening programmes for patients with ASD.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Christopher H. Arehart ◽  
Michael Z. David ◽  
Vanja Dukic

The Morbidity and Mortality Weekly Reports of the U.S. Centers for Disease Control and Prevention document a raw proxy for counts of pertussis cases in the U.S., and the Project Tycho (PT) database provides an improved source of these weekly data. These data are limited because of reporting delays, variation in state-level surveillance practices, and changes over time in diagnosis methods. We aim to assess whether Google Trends (GT) search data track pertussis incidence relative to PT data and if sociodemographic characteristics explain some variation in the accuracy of state-level models. GT and PT data were used to construct auto-correlation corrected linear models for pertussis incidence in 2004–2011 for the entire U.S. and each individual state. The national model resulted in a moderate correlation (adjusted R² = 0.2369, p < 0.05), and state models tracked PT data for some but not all states. Sociodemographic variables explained approximately 30% of the variation in performance of individual state-level models. The significant correlation between GT models and public health data suggests that GT is a potentially useful pertussis surveillance tool. However, the variable accuracy of this tool by state suggests GT surveillance cannot be applied in a uniform manner across geographic sub-regions.
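For reference, the adjusted coefficient of determination quoted above is conventionally defined as below. The symbols n (number of observations) and p (number of predictors, excluding the intercept) are placeholders, since the abstract reports neither value.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

The adjustment penalizes added predictors, so the adjusted value never exceeds the unadjusted R² of the same model.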


2017 ◽  
Vol 9 (7) ◽  
pp. 1106 ◽  
Author(s):  
Amruta Nori-Sarma ◽  
Anobha Gurung ◽  
Gulrez Azhar ◽  
Ajit Rajiva ◽  
Dileep Mavalankar ◽  
...  
