International Journal of Data Science and Analytics
Latest Publications


TOTAL DOCUMENTS: 297 (FIVE YEARS 130)

H-INDEX: 14 (FIVE YEARS 5)

Published By Springer-Verlag

2364-4168, 2364-415x

Author(s):  
A. Jasinska-Piadlo ◽  
R. Bond ◽  
P. Biglarbeigi ◽  
R. Brisk ◽  
P. Campbell ◽  
...  

Abstract: This paper presents a systematic literature review of the application of data science and machine learning (ML) to heart failure (HF) datasets, with the intention of generating both a synthesis of relevant findings and a critical evaluation of approaches, applicability and accuracy in order to inform future work within this field. This paper has a particular intention to consider ways in which the low uptake of ML techniques within clinical practice could be resolved. Literature searches were performed on Scopus (2014-2021), ProQuest and Ovid MEDLINE databases (2014-2021). Search terms included ‘heart failure’ or ‘cardiomyopathy’ and ‘machine learning’, ‘data analytics’, ‘data mining’ or ‘data science’. 81 out of 1688 articles were included in the review. The majority of studies were retrospective cohort studies. The median size of the patient cohort across all studies was 1944 (min. 46, max. 93260). The largest patient samples were used in readmission prediction models, with a median sample size of 5676 (min. 380, max. 93260). Machine learning methods focused on common HF problems: detection of HF from available datasets, prediction of hospital readmission following index hospitalization, mortality prediction, and classification and clustering of HF cohorts into subgroups with distinctive features and response to HF treatment. The most common ML methods used were logistic regression, decision trees, random forest and support vector machines. Information on validation of models was scarce. Based on the authors’ affiliations, there was a median 3:1 ratio between IT specialists and clinicians. Over half of the studies were co-authored by a collaboration of medical and IT specialists. Approximately 25% of papers were authored solely by IT specialists who did not seek clinical input in data interpretation.
The application of ML to datasets, in particular clustering methods, enabled the development of classification models assisting in testing the outcomes of patients with HF. There is, however, a tendency to over-claim the potential usefulness of ML models for clinical practice. The next body of work that is required for this research discipline is the design of randomised controlled trials (RCTs) with the use of ML in an intervention arm in order to prospectively validate these algorithms for real-world clinical utility.


Author(s):  
Veronika Batzdorfer ◽  
Holger Steinmetz ◽  
Marco Biella ◽  
Meysam Alizadeh

Abstract: The COVID-19 pandemic resulted in an upsurge in the spread of diverse conspiracy theories (CTs) with real-life impact. However, the dynamics of user engagement remain under-researched. In the present study, we leverage Twitter data across 11 months in 2020 from the timelines of 109 CT posters and a comparison group (non-CT group) of equal size. Within this approach, we used word embeddings to distinguish non-CT content from CT-related content and to analyse which elements of CT content emerged during the pandemic. Subsequently, we applied time series analyses at the aggregate and individual levels to investigate whether CT posters and non-CT posters differ in their non-CT tweets, and to examine the temporal dynamics of CT tweets. In this regard, we describe the aggregate and individual series, conduct an STL decomposition into trend, seasonal, and error components as well as an autocorrelation analysis, and apply generalised additive mixed models to analyse nonlinear trends and their differences across users. The narrative motifs, characterised by word embeddings, address pandemic-specific motifs alongside broader motifs and can be related to several psychological needs (epistemic, existential, or social). Overall, the comparison of the CT group and non-CT group showed a substantially higher level of overall COVID-19-related tweets in the non-CT group and a higher level of random fluctuations. Focussing on conspiracy tweets, we found a slight positive trend but, more importantly, an increase in users in 2020. Moreover, the aggregate series of CT content revealed two breaks in 2020 and a significant albeit weak positive trend since June. On the individual level, the series showed strong differences in temporal dynamics and a high degree of randomness and day-specific sensitivity.
The results stress the importance of Twitter as a means of communication during the pandemic and illustrate that these beliefs travel very fast and are quickly endorsed.
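
As a rough illustration of the autocorrelation and seasonality analyses mentioned in the abstract above, the sketch below computes a sample autocorrelation and a simple day-of-week profile for a daily count series. It is not the authors' method: the per-position averaging is a much simpler stand-in for STL (which uses loess smoothing), and the function names and the `counts` data are hypothetical.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag))
    return num / denom


def seasonal_profile(series, period=7):
    """Mean deviation from the overall mean at each position in the cycle.

    A crude seasonal estimate (assumption: no trend), not an STL decomposition.
    """
    mu = mean(series)
    return [mean(series[i::period]) - mu for i in range(period)]


# Hypothetical daily tweet counts: four weeks with a weekend bump.
counts = [10, 10, 10, 10, 10, 20, 20] * 4

weekly_acf = autocorrelation(counts, lag=7)
profile = seasonal_profile(counts, period=7)
```

For this made-up series, a high autocorrelation at lag 7 and positive profile values on the last two days of the cycle would both point to a weekly pattern.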


Author(s):  
Harald Stiff ◽  
Fredrik Johansson

Abstract: Modern neural language models can be used by malicious actors to automatically produce textual content that looks as if it had been written by genuine human users. Due to progress in the controllability of computer-generated text, there is a risk that state-sponsored actors may start using such methods for conducting large-scale information operations. Various detection algorithms have been suggested in the research literature to identify texts produced by language model-based generators, but these are often mainly evaluated on test data from the same distribution as their training data. We evaluate promising Transformer-based detection algorithms in a large variety of experiments involving both in-distribution and out-of-distribution test data, as well as evaluation on more realistic in-the-wild data. It is shown that the generalizability of the detectors can be questioned, especially when applied to short social media posts. Moreover, the best performing (RoBERTa-based) detector is shown to be also non-robust to basic adversarial attacks, illustrating how easy it is for malicious actors to avoid detection by the current state-of-the-art detection algorithms.
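
The gap between in-distribution and out-of-distribution evaluation described above can be sketched with a small harness. Everything here is hypothetical: `toy_detector` is a deliberately naive length-based rule, not a Transformer-based detector, and the example texts are invented only to show how a detector that relies on a cue present in its training distribution can degrade on short posts.

```python
def accuracy(detector, examples):
    """Fraction of (text, is_generated) pairs that `detector` labels correctly."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(detector(text) == label for text, label in examples)
    return correct / len(examples)


def toy_detector(text):
    # Hypothetical stand-in: flags texts of 8 or more words as machine-generated.
    return len(text.split()) >= 8


# Invented in-distribution data: generated texts are long, human texts are short.
in_distribution = [
    ("the model writes long fluent sentences that go on and on", True),
    ("short human note", False),
    ("another generated passage with many words in it for sure", True),
    ("hi there", False),
]

# Invented out-of-distribution data: short posts from both classes.
out_of_distribution = [
    ("generated short post", True),
    ("buy now", True),
    ("lol ok", False),
    ("see you soon", False),
]

in_acc = accuracy(toy_detector, in_distribution)
out_acc = accuracy(toy_detector, out_of_distribution)
```

Reporting both numbers side by side, rather than only the in-distribution score, is the design point: a single held-out split from the training distribution would hide the failure on short texts.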


Author(s):  
A. Srinivas Reddy ◽  
P. Krishna Reddy ◽  
Anirban Mondal ◽  
U. Deva Priyakumar
