Statistical analysis of small twitter data collection to identify dengue outbreaks

2020 ◽  
Author(s):  
Carlos Euzebio ◽  
Sidney Agy ◽  
Claudio Boldorini Jr. ◽  
Lucas Porto ◽  
José Renato Alcarás ◽  
...  

This study presents an algorithmic strategy for analyzing a small set of social network data to monitor dengue. Previous studies have achieved similar results using large datasets of Twitter posts. Here, we successfully map dengue cases using a small collection of tweets from a medium-sized city. A set of modules was constructed to collect, categorize, and display dengue-related tweets. We compared the collected tweets with real data on confirmed dengue cases and found a significant correlation between the number of confirmed cases and the number of dengue-related tweets, even with such a small dataset. The results of this approach may be relevant to public health policy.
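The comparison between tweet counts and confirmed cases can be illustrated with a Pearson correlation over two count series. This is a minimal sketch: the abstract does not say which correlation statistic the authors used, and the weekly counts below are invented purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly counts (assumed values, not data from the study).
weekly_tweets = [3, 5, 9, 14, 11, 6]
weekly_cases = [2, 4, 8, 15, 12, 5]
r = pearson(weekly_tweets, weekly_cases)
print(f"r = {r:.3f}")
```

With only a handful of weeks, a high coefficient alone says little; a significance test or confidence interval would be needed before drawing conclusions of the kind the abstract reports.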

Author(s):  
Chris Drummond ◽  
Craig R. Davison

Producing compressor maps is time-consuming, costly, and error-prone, and many data samples must be collected to give sufficient accuracy. Even then, expert input is typically required to fine-tune the map to the appropriate shape. In this paper, we take some of that expertise and incorporate it into the smoothing process. The main piece of knowledge used is the cubic approximation for speed lines derived from the Moore-Greitzer model. This well-accepted approximation captures many of the general performance properties of compressors, but it is also widely recognized as being only very roughly true of real compressors. Nevertheless, we show that embedding this approximation, however limited, in the smoothing process results in accurate interpolation and extrapolation. The aim of this work is to substantially reduce the need for human input in the fitting process. We also anticipate a number of other benefits: less data is needed, with commensurate savings in time and money; the data collection process can be monitored for possible problems; and changes in the map can be quantified so that, when they are sufficiently small, data collection can be terminated.
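For orientation, the cubic characteristic commonly associated with the Moore-Greitzer model can be evaluated as in the sketch below. The abstract does not give the authors' parameterization or how the cubic enters their smoothing procedure, so the function name and the parameter values here are assumptions for illustration only.

```python
def cubic_speed_line(phi, psi0, H, W):
    """Moore-Greitzer cubic compressor characteristic.

    phi  : flow coefficient
    psi0 : shut-off value of the pressure-rise coefficient (at phi = 0)
    H    : semi-height of the cubic
    W    : semi-width of the cubic
    """
    x = phi / W - 1.0
    return psi0 + H * (1.0 + 1.5 * x - 0.5 * x ** 3)

# Illustrative parameter values only (assumed, not taken from the paper).
psi0, H, W = 0.30, 0.18, 0.25
for phi in (0.0, W, 2 * W):
    print(phi, round(cubic_speed_line(phi, psi0, H, W), 4))
```

The three sample points are the shut-off value `psi0` at zero flow, `psi0 + H` at `phi = W`, and the peak `psi0 + 2 * H` at `phi = 2 * W`, which is what makes the two shape parameters easy to read off a measured speed line.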


2018 ◽  
Author(s):  
Adrian Fritz ◽  
Peter Hofmann ◽  
Stephan Majda ◽  
Eik Dahms ◽  
Johannes Dröge ◽  
...  

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets is required. Here, we describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies; includes real and simulated strain-level diversity; and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with truth standards for method evaluation. All data sets and the software are freely available at: https://github.com/CAMI-challenge/CAMISIM


Author(s):  
Enzo Tartaglione ◽  
Carlo Alberto Barbano ◽  
Claudio Berzovini ◽  
Marco Calandri ◽  
Marco Grangetto

The possibility of using widespread and simple chest X-ray (CXR) imaging for early screening of COVID-19 patients is attracting much interest from both the clinical and the AI communities. In this study, we provide insights and also raise warnings about what it is reasonable to expect from applying deep learning to COVID classification of CXR images. We provide a methodological guide and a critical reading of an extensive set of statistical results that can be obtained using currently available datasets. In particular, we take up the challenge posed by the small size of current COVID datasets and show how significant the bias introduced by transfer learning from larger public non-COVID CXR datasets can be. We also contribute results on a medium-sized COVID CXR dataset, recently collected by one of the major emergency hospitals in Northern Italy during the peak of the COVID pandemic. These novel data allow us to help validate the generalization capacity of preliminary results circulating in the scientific community. Our conclusions shed some light on the possibility of effectively discriminating COVID using CXR.


2010 ◽  
Vol 9 (2) ◽  
pp. 237-259 ◽  
Author(s):  
Tatjana Đurović ◽  
Nadežda Silaški

This paper looks at how the marriage metaphor structures the discourse concerning the relationship between political parties in Serbia. In January 2007, in the first general election to be held in Serbia since its union with Montenegro was dissolved in 2006, no party succeeded in gaining an absolute majority. Eventually, after more than three months of coalition talks, the main pro-reform parties agreed to form a government: the conservative and moderately nationalist right-leaning Democratic Party of Serbia (DSS), together with the pro-Western Democratic Party (DS). Compiling a small data collection from the leading Serbian dailies and political weeklies we have tried to track the metaphors through highly argumentative discourse in regard to the formation of political coalitions and their break-up. The main aim of this study is to show how the metaphors may be mapped and used as a vehicle of public discourse for achieving overt or covert political and ideological objectives on the complex political scene in contemporary Serbia. We will also argue that Serbian political discourse is highly gendered, as gender roles, manifested through the assignment of wife and husband roles to political parties, are clearly delineated according to the traditional male-female dichotomy, implying stereotypical traits and patriarchal values characteristic of Serbian culture.


Author(s):  
Chanintorn Jittawiriyanukoon

Standard data collection problems often assume noiseless data, whereas large organizations commonly encounter noisy and missing data, most likely in data collected from individuals. Because noisy and missing data are a serious concern in large-scale data collection, an investigation of different filtering techniques for big data environments is worthwhile. A multiple regression model is presented and evaluated experimentally on big data. An approximation for datasets with noisy and missing data is also proposed. The root mean squared error (RMSE) and the correlation coefficient (COEF) are analyzed to demonstrate the accuracy of the estimators. Finally, results predicted by massive online analysis (MOA) are compared with real data collected at a later time. The simulated predictions, with estimation for noisy and missing data, are shown to be consistent with the real data. The deletion mechanism (DEL) performs best, with the lowest average percentage error.
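The two ingredients named above, RMSE and a deletion mechanism, can be sketched as follows. This is an illustration under assumptions: "DEL" is taken here to mean listwise deletion of incomplete records, which the abstract does not spell out, and the helper names and sample rows are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def drop_missing(rows):
    """Listwise deletion: keep only rows in which no value is None."""
    return [row for row in rows if all(v is not None for v in row)]

# Hypothetical records of (feature, target); None marks a missing value.
rows = [(1.0, 2.0), (None, 3.0), (2.0, None), (3.0, 6.0)]
complete = drop_missing(rows)
print(complete)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Listwise deletion is simple but shrinks the dataset, so its error should be compared against imputation-style alternatives on the same held-out data.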


2020 ◽  
Vol 70 (2) ◽  
pp. 283-289
Author(s):  
D.R. Rakhimova ◽  
A.R. Satybaldiev ◽  

This work is devoted to the creation of a system for the automatic collection and processing of open data in Kazakh from Internet resources, and it has practical significance for tasks involving the collection and analysis of text. The introduction substantiates the relevance of the chosen topic, reviews existing approaches, and formulates the objectives of the study. We consider the problem of collecting and performing primary processing of text data for subsequent analysis. Data collection is a priority, since open data from Internet resources is unstructured and needs to be processed. The authors present a system for processing web pages of Kazakh-language portals and apply it to real data from open resources. An approach to indexing documents using features is presented. The system will help structure open data from Internet resources and analyze the collected data. Practical results are presented.


2016 ◽  
Vol 15 ◽  
pp. 521
Author(s):  
Malena Storani Gonçalves Rosa ◽  
Ândrea Cardoso De Sousa

General aim: to evaluate whether PET-Health has served as an opportunity for continuing education for professionals/preceptors employed by the health services. Specific aims: to identify and characterize the actions of PET-Health recognized by preceptors as a form of continuing education. Method: This is a descriptive and evaluative study using a qualitative approach, to be undertaken in mental health services that make use of the PET experience in Niterói/RJ. For data collection, semi-structured interviews will be conducted with professionals who act as PET-Health preceptors in the mental health network. Information processing will be based on content analysis. By the end of the study, we expect to identify benefits that point to the strengths of PET-Health with regard to continuing education.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Liang Guo ◽  
Wendong Wang ◽  
Shiduan Cheng ◽  
Xirong Que

Weibo, a real-time microblogging service, has attracted massive attention and support from social network users. The platform gives people an opportunity to access information and has significantly changed the way they acquire and disseminate it. It also enables people to respond to social events more conveniently. Much of the information on Weibo relates to specific events. Users who post different content and exhibit different behaviors or attitudes may contribute differently to a given event. Therefore, automatically classifying the large number of uncategorized social circles generated on Weibo from the perspective of events is a promising task. To effectively organize and manage the huge number of users, and thereby their content, we address the task of user classification with a more granular, event-based approach in this paper. By analyzing real data collected from Sina Weibo, we investigate the properties of Weibo and use both content information and social network information to classify users into four primary groups: celebrities, organizations/media accounts, grassroots stars, and ordinary individuals. The experimental results show that our method identifies the user categories accurately.


Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation, and more conformity in the results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth gene-wise test statistics. To find the optimal degree of smoothing, we propose using a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
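The idea of smoothing gene-wise statistics over a network can be illustrated with a simple neighbor-averaging scheme. This is a generic stand-in, not the authors' method: the abstract gives neither the smoothing formula nor the criterion for choosing the degree of smoothing, so the function, the `alpha` parameter, and the toy network below are all hypothetical.

```python
def smooth_statistics(stats, neighbors, alpha):
    """Blend each gene's statistic with the mean of its network neighbors.

    stats     : dict mapping gene -> test statistic
    neighbors : dict mapping gene -> list of adjacent genes
    alpha     : degree of smoothing in [0, 1]; 0 leaves stats unchanged
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = {}
    for gene, value in stats.items():
        adjacent = [stats[n] for n in neighbors.get(gene, []) if n in stats]
        if adjacent:
            neighbor_mean = sum(adjacent) / len(adjacent)
            smoothed[gene] = (1.0 - alpha) * value + alpha * neighbor_mean
        else:
            # Isolated genes have nothing to borrow from; keep them as-is.
            smoothed[gene] = value
    return smoothed

# Toy network and statistics (assumed values for illustration).
stats = {"g1": 4.0, "g2": 0.0, "g3": 2.0, "g4": 1.0}
neighbors = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": ["g1"], "g4": []}
result = smooth_statistics(stats, neighbors, alpha=0.5)
print(result)
```

In this toy case the statistics of `g2` and `g3` are pulled toward their strongly scoring neighbor `g1`, which mirrors the accentuation of dense, differentially expressed regions described above.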

