Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study (Preprint)

2020
Author(s):
Tim Mackey
Vidya Purushothaman
Jiawei Li
Neal Shah
Matthew Nali
...

BACKGROUND The coronavirus disease (COVID-19) pandemic is a global health emergency, with over 6 million cases worldwide as of the beginning of June 2020. The pandemic is historic in scope and precedent given its emergence in an increasingly digital era. Importantly, there have been concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries. OBJECTIVE The aims of this study were to use an unsupervised machine learning approach to detect and characterize user-generated conversations that could be associated with COVID-19-related symptoms, experiences with access to testing, and mentions of disease recovery. METHODS Tweets were collected from the Twitter public streaming application programming interface from March 3 to 20, 2020, filtered for general COVID-19-related keywords, and then further filtered for terms that could be related to COVID-19 symptoms as self-reported by users. Tweets were analyzed using an unsupervised machine learning approach called the biterm topic model (BTM), which grouped tweets sharing the same word-related themes into topic clusters, including clusters of conversations about symptoms, testing, and recovery. Tweets in these clusters were then extracted, manually annotated for content analysis, and assessed for their statistical and geographic characteristics. RESULTS A total of 4,492,954 tweets containing terms that could be related to COVID-19 symptoms were collected. After using BTM to identify relevant topic clusters and removing duplicate tweets, we identified 3465 (<1%) tweets containing user-generated conversations about experiences that users associated with possible COVID-19 symptoms and other disease experiences.
These tweets were grouped into five main categories: first- and secondhand reports of symptoms, symptom reporting concurrent with lack of testing, discussion of recovery, confirmation of a negative COVID-19 diagnosis after testing, and users recalling symptoms and questioning whether they might have been previously infected with COVID-19. Co-occurrence of these themes was statistically significant for users reporting symptoms together with a lack of testing and for users reporting symptoms together with a discussion of recovery. Of the geotagged tweets, 63% (n=1112) were located in the United States. CONCLUSIONS This study used unsupervised machine learning to characterize self-reporting of symptoms, experiences with testing, and mentions of recovery related to COVID-19. Many users reported symptoms they thought were related to COVID-19 but were not able to get tested to confirm their concerns. In the absence of testing availability and confirmation, accurate case estimates for this period of the outbreak may never be known. Future studies should continue to explore the utility of infoveillance approaches to estimate COVID-19 disease severity.
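In the biterm topic model, a short text is represented by its biterms: unordered pairs of words that co-occur in the same text. BTM then fits topics to the corpus-wide pool of biterms rather than to per-document word counts, which is why it is often applied to short texts such as tweets. The sketch below covers only that biterm-extraction step, assuming simple lowercase whitespace tokenization and an invented stopword list; it is not the study's preprocessing pipeline, and topic inference itself (for example, by Gibbs sampling) is omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"i", "a", "the", "and", "have", "no", "to", "get"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip edge punctuation, drop stopwords."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()#@")
        if word and word not in STOPWORDS:
            tokens.append(word)
    return tokens


def biterms(tokens: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of token positions within one short text.

    Each pair is stored sorted, so (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterms(texts: list[str]) -> Counter:
    """Pool biterms across all texts; BTM fits its topics to this pooled set."""
    pool = Counter()
    for text in texts:
        pool.update(biterms(tokenize(text)))
    return pool


if __name__ == "__main__":
    tweets = [
        "Fever and dry cough, no way to get tested",
        "Cough and fever for three days",
    ]
    for pair, count in corpus_biterms(tweets).most_common(3):
        print(pair, count)
```

Pooling biterms across the corpus sidesteps the sparsity of per-tweet word counts: a pair such as ("cough", "fever") accumulates evidence from every tweet in which both words appear.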

10.2196/19509
2020
Vol 6 (2)
pp. e19509
Author(s):
Tim Mackey
Vidya Purushothaman
Jiawei Li
Neal Shah
Matthew Nali
...



2017
Author(s):
Sabrina Jaeger
Simone Fulle
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similar to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can then be encoded as vectors by summing the vectors of their individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can easily be combined with ProtVec, which applies the same Word2vec concept to protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also be easily used for proteins with low sequence similarity.
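The encoding step described above, a compound vector formed as the sum of its substructure vectors, can be sketched as follows. The substructure identifiers and 3-dimensional vectors are invented placeholders; in Mol2vec the vectors are learned with Word2vec on Morgan substructure identifiers and are much higher-dimensional. Skipping substructures that have no embedding is an assumption of this sketch, not necessarily how the Mol2vec package handles them.

```python
import math

# Invented placeholder embeddings: identifiers and 3-D values are not real Mol2vec output.
SUBSTRUCTURE_VECTORS = {
    "sub_A": [0.9, 0.1, 0.0],
    "sub_B": [0.8, 0.2, 0.1],
    "sub_C": [0.0, 0.1, 0.9],
}


def compound_vector(substructures, vectors=SUBSTRUCTURE_VECTORS, dim=3):
    """Encode a compound as the element-wise sum of its substructure vectors.

    Each occurrence is added, so a repeated substructure counts more than once.
    Substructures missing from `vectors` are skipped (an assumption of this sketch).
    """
    total = [0.0] * dim
    for sub in substructures:
        vec = vectors.get(sub)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity: compares direction, not magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    x = compound_vector(["sub_A", "sub_B"])
    y = compound_vector(["sub_A", "sub_A", "sub_B"])
    z = compound_vector(["sub_C"])
    print(round(cosine(x, y), 3), round(cosine(x, z), 3))
```

Because the result is a fixed-length dense vector regardless of molecule size, it can be passed directly to a supervised model as a feature vector, and compounds built from similar substructures end up pointing in similar directions.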


2021
Vol 224 (2)
pp. S121-S122
Author(s):
Ramamurthy Siripuram
Nathan R. Blue
Robert M. Silver
William A. Grobman
Uma M. Reddy
...

BJS Open
2021
Vol 5 (1)
Author(s):
F Torresan
F Crimì
F Ceccato
F Zavan
M Barbot
...

Abstract Background The main challenge in the management of indeterminate, incidentally discovered adrenal tumours is to differentiate benign from malignant lesions. In the absence of clear signs of invasion or metastases, imaging techniques do not always precisely define the nature of the mass. The present pilot study aimed to determine whether radiomics may predict malignancy in adrenocortical tumours. Methods CT images in unenhanced, arterial, and venous phases were reviewed from 19 patients who had undergone resection of adrenocortical tumours and from a cohort of patients with incidentalomas who had undergone surveillance for at least 5 years. A volume of interest was drawn for each lesion using dedicated software, and, for each phase, first-order (histogram) and second-order (grey-level co-occurrence matrix and run-length matrix) radiological features were extracted. Data were analysed by an unsupervised machine learning approach using the K-means clustering technique. Results Of the operated patients, nine had non-functional adenoma and 10 had carcinoma. There were 11 patients in the surveillance group. Two first-order features in unenhanced CT and one in arterial CT, together with 14 second-order parameters in unenhanced and venous CT and 10 second-order features in arterial CT, were able to differentiate adrenocortical carcinoma from adenoma (P < 0.050). After excluding two malignant outliers, the unsupervised machine learning approach correctly predicted malignancy in seven of eight adrenocortical carcinomas in all phases. Conclusion Radiomics with CT texture analysis was able to discriminate malignant from benign adrenocortical tumours in nearly all patients, even with an unsupervised machine learning approach.
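A generic K-means sketch of the clustering step follows. The abstract does not state which features were entered, how they were scaled, the number of clusters, or the software used, so everything here is an assumption: two invented feature values per lesion, z-score standardization, k = 2, and deterministic initialization from the first k rows (random or k-means++ initialization with several restarts is more common in practice).

```python
import math


def standardize(rows):
    """Z-score each feature column so features on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    Centroids start at the first k points (a deterministic assumption for this sketch).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Invented per-lesion features: [first-order feature, second-order feature].
    lesions = [[10, 4.0], [35, 5.5], [12, 4.2], [40, 5.8], [8, 3.9], [38, 5.6]]
    labels, _ = kmeans(standardize(lesions), k=2)
    print(labels)
```

K-means returns arbitrary cluster indices, so deciding which cluster corresponds to malignancy requires comparing the clusters with histology after clustering; that comparison is what allows an unsupervised result to be scored against known diagnoses.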


2020
Author(s):
Daniel Oluwadara Fadokun
Ishioma Bridget Oshilike
Mike Obi Onyekonwu

2003
Vol 46 (17)
pp. 3631-3643
Author(s):
Dmitry Korolev
Konstantin V. Balakin
Yuri Nikolsky
Eugene Kirillov
Yan A. Ivanenkov
...

2018
Vol 853 (1)
pp. 90
Author(s):
Federico Benvenuto
Michele Piana
Cristina Campi
Anna Maria Massone
