COVID-19-related conspiracy beliefs and their relationship with perceived stress and pre-existing conspiracy beliefs in a Prolific Academic sample: A replication and extension of Georgiou et al. (2020)

2021 ◽  
Author(s):  
Mae Braud ◽  
Aurore Gaboriaud ◽  
Thibaud Ferry ◽  
Wassila El Mardi ◽  
Léa DA SILVA ◽  
...  

The authors conducted a close replication of a study by Georgiou et al. (2020), who found among 660 (reported in the abstract) or 640 (reported in the participants section) participants that 1) COVID-19-related conspiracy theory beliefs were strongly related to broader conspiracy theory beliefs, 2) COVID-19-related conspiracy beliefs were higher in those with lower levels of education, and 3) COVID-19-related conspiracy beliefs were positively (although weakly) correlated with more negative attitudes toward different individual items measuring the government’s response. Finally, they found that 4) COVID-19-related conspiracy beliefs were unrelated to self-reported stress. In a pre-registered replication and extension, sufficiently well-powered to detect f² = 0.05 at an alpha level of .05 with an a priori power of .95 and 5 predictors in a multiple regression analysis, we do not find the same results: education level is unrelated to COVID-19-related conspiracy beliefs, stress is related to COVID-19-related conspiracy beliefs, but the government’s response is indeed related to COVID-19-related conspiracy beliefs. We point out problems in measuring conspiracy beliefs and extend the study through supervised machine learning, finding that attachment avoidance and anxiety are important predictors of conspiracy beliefs (COVID-19-related and beyond). Some of the differences between their study and ours are likely due to differences in analysis approach; others may be due to errors in Georgiou et al.’s (2020) reporting.
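As a rough check on the power analysis stated above (f² = 0.05, α = .05, power = .95, 5 predictors), the required sample size can be found by scanning N against the noncentral F distribution. This sketch assumes the noncentrality convention λ = f²·N (as in G*Power); that convention is an assumption here, not something stated in the abstract, and alternatives such as λ = f²·(N − p − 1) give slightly different N.

```python
from scipy.stats import f as f_dist, ncf

def regression_power(n, n_predictors, f2, alpha):
    """Power of the overall F test in multiple regression.

    Assumes the noncentrality convention lambda = f2 * n (G*Power style).
    """
    df1 = n_predictors            # numerator df: number of predictors
    df2 = n - n_predictors - 1    # denominator df: residual df
    crit = f_dist.ppf(1 - alpha, df1, df2)        # critical F under H0
    return float(ncf.sf(crit, df1, df2, f2 * n))  # P(F' > crit) under H1

# Smallest N reaching 95% power for f2 = 0.05, alpha = .05, 5 predictors
n = 7  # smallest n with a positive residual df
while regression_power(n, 5, 0.05, 0.05) < 0.95:
    n += 1
print(n)
```

Power is monotone in N here, so the linear scan terminates; a bisection search would be faster for larger problems.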

Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 527
Author(s):  
Eran Elhaik ◽  
Dan Graur

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evol. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evol. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.


2020 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Brandon Hansen ◽  
Cody Coleman ◽  
Yi Zhang ◽  
Maria Seale

The manner in which a prognostics problem is framed is critical for enabling its solution by the proper method. Recently, data-driven prognostics techniques have demonstrated enormous potential when used alone, or as part of a hybrid solution in conjunction with physics-based models. Historical maintenance data constitutes a critical element for the use of a data-driven approach to prognostics, such as supervised machine learning. The historical data is used to create training and testing data sets for developing the machine learning model. Machine learning methods require categorical classes for prediction; however, faults of interest in US Army Ground Vehicle Maintenance Records appear as natural-language text descriptions rather than a finite set of discrete labels. Transforming linguistically complex data into a set of prognostics classes is therefore necessary for utilizing supervised machine learning approaches. Manually labeling fault description instances is effective but extremely time-consuming; thus, an automated approach to labeling is preferred. The approach described in this paper examines key aspects of the fault text relevant to enabling automatic labeling. A method was developed based on the hypothesis that a given fault description can be generalized into a category. This method uses various natural language processing (NLP) techniques and a priori knowledge of ground vehicle faults to assign classes to the maintenance fault descriptions. The core component of the method is a Word2Vec word-embedding model. Word embeddings are used in conjunction with a token-oriented rule-based data structure for document classification. This methodology tags text with user-provided classes, using a corpus of similar text fields as its training set. With classes of faults reliably assigned to a given description, supervised machine learning with these classes can be applied using related maintenance information that preceded the fault. The method was developed for labeling US Army Ground Vehicle Maintenance Records but is general enough to be applied to any natural-language data set accompanied by a priori knowledge of its contents for consistent labeling. In addition to applications in machine learning, the generated labels are also conducive to general summarization and case-by-case analysis of faults. The maintenance components of interest in the current application are alternators and gaskets, with future development directed toward determining the remaining useful life (RUL) of these components based on the labeled data.
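The tagging step described above can be sketched in miniature. The toy word vectors below stand in for a Word2Vec model trained on a maintenance corpus, and the class names and seed terms are illustrative assumptions, not details taken from the Army records:

```python
import numpy as np

# Toy word vectors standing in for a trained Word2Vec model; in practice
# these would come from training on the maintenance text corpus.
toy_vecs = {
    "alternator": np.array([1.0, 0.1, 0.0]),
    "charging":   np.array([0.9, 0.2, 0.1]),
    "gasket":     np.array([0.0, 1.0, 0.1]),
    "leak":       np.array([0.1, 0.9, 0.2]),
    "seal":       np.array([0.0, 0.8, 0.3]),
}

def embed(text):
    """Average the vectors of known tokens (unknown tokens are skipped)."""
    vecs = [toy_vecs[t] for t in text.lower().split() if t in toy_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

# User-provided classes with seed keywords (a priori knowledge of faults)
classes = {"ALTERNATOR": ["alternator", "charging"],
           "GASKET": ["gasket", "leak", "seal"]}

def label(description):
    """Assign the class whose seed-term centroid is closest in embedding space."""
    doc = embed(description)
    return max(classes, key=lambda c: cosine(doc, embed(" ".join(classes[c]))))

print(label("charging system fault alternator not working"))  # ALTERNATOR
print(label("coolant leak at head gasket seal"))              # GASKET
```

A real pipeline would add the token-oriented rules (e.g., negation handling, part-number stripping) on top of this similarity step.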


2017 ◽  
Author(s):  
Christoph Sommer ◽  
Rudolf Hoefler ◽  
Matthias Samwer ◽  
Daniel W. Gerlich

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.


Author(s):  
Eran Elhaik ◽  
Dan Graur

Supervised machine learning (SML) is a powerful method for predicting a small number of well-defined output groups (e.g., potential buyers of a certain product) by taking as input a large number of known well-defined measurements (e.g., past purchases, income, ethnicity, gender, credit record, age, favorite color, favorite chewing gum). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known to be true. SML has had enormous success in the world of commerce, and this success has prompted a few scientists to employ it in the study of molecular and genome evolution. Here, we list the properties of SML that make it an unsuitable tool in evolutionary studies. In particular, we argue that SML cannot be used in an evolutionary exploratory context for the simple reason that training datasets that are known to be a priori true do not exist. As a case study, we use an SML study in which it was concluded that most human genomes evolve by positive selection through soft selective sweeps (Schrider and Kern 2017). We show that in the absence of legitimate training datasets, Schrider and Kern (2017) used (1) simulations that employ many manipulatable variables and (2) a system of cherry-picking data that would put to shame most modern evangelical exegeses of the Bible. These two factors, in addition to the lack of methodological detail and the lack of either negative controls or corrections for multiple comparisons, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., discoal) should be taken with a huge shovel of salt.


2017 ◽  
Vol 28 (23) ◽  
pp. 3428-3436 ◽  
Author(s):  
Christoph Sommer ◽  
Rudolf Hoefler ◽  
Matthias Samwer ◽  
Daniel W. Gerlich

Supervised machine learning is a powerful and widely used method for analyzing high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.
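The novelty-detection idea above, discovering rare phenotypes without classifier training, can be sketched with an isolation forest on synthetic morphology features. The feature names, values, and detector choice are illustrative assumptions; CellCognition Explorer's own method differs in detail:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)

# Synthetic morphology features (e.g., nuclear area in px^2, eccentricity)
# for a mostly-normal cell population; all values are illustrative only.
normal = rng.normal([200.0, 0.3], [15.0, 0.05], size=(500, 2))
rare = np.array([[80.0, 0.9],     # shrunken, elongated nucleus
                 [350.0, 0.85]])  # enlarged, elongated nucleus

# Fit on the bulk population: no phenotype labels, hence no user training
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

flags = detector.predict(rare)  # -1 marks a novelty candidate, 1 is normal
print(flags.tolist())
```

The flagged cells would then be surfaced to the user as candidate rare phenotypes for inspection, rather than pre-assigned to expected classes.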


Author(s):  
Eran Elhaik ◽  
Dan Graur

Supervised machine learning (SML) is a powerful method for predicting a small number of well-defined output groups (e.g., potential buyers of a certain product) by taking as input a large number of known well-defined measurements (e.g., past purchases, income, ethnicity, gender, credit record, age, favorite color, favorite chewing gum). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known to be true. SML has had enormous success in the world of commerce, and this success may have prompted a few scientists to employ it in the study of molecular and genome evolution. Here, we list the properties of SML that make it an unsuitable tool in certain evolutionary studies. In particular, we argue that SML cannot be used in an evolutionary exploratory context for the simple reason that training datasets that are known to be a priori true do not exist. As a case study, we use an SML study in which it was concluded that most human genomes evolve by positive selection through soft selective sweeps (Schrider and Kern 2017). We show that in the absence of legitimate training datasets, Schrider and Kern (2017) used (1) simulations that employ many manipulatable variables and (2) a system of cherry-picking data that would put to shame most modern evangelical exegeses of the Bible. These two factors, in addition to the lack of methodological detail and negative controls, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S-HIC) should be taken with a huge shovel of salt.


2021 ◽  
Vol 15 ◽  
Author(s):  
Shuai Liang ◽  
Derek Beaton ◽  
Stephen R. Arnott ◽  
Tom Gee ◽  
Mojdeh Zamyadi ◽  
...  

Despite the wide application of the magnetic resonance imaging (MRI) technique, there are no widely adopted standards for naming and describing MRI sequences. The absence of consistent naming conventions presents a major challenge in automating image processing, since most MRI software requires a priori knowledge of the type of MRI sequence to be processed. This issue becomes increasingly critical with the current efforts toward open sharing of MRI data in the neuroscience community. This manuscript reports an MRI sequence detection method using imaging metadata and a supervised machine learning technique. Three datasets from the Brain Center for Ontario Data Exploration (Brain-CODE) data platform, each involving MRI data from multiple research institutes, are used to build and test our model. The preliminary results show that a random forest model can be trained to accurately identify MRI sequence types and to recognize MRI scans that do not belong to any of the known sequence types. Therefore, the proposed approach can be used to automate processing of MRI data that involves a large number of variations in sequence names, and to help standardize sequence naming in ongoing data collections. This study highlights the potential of machine learning approaches in helping manage health data.
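A minimal sketch of the approach, using synthetic DICOM-style metadata (TR, TE, flip angle). The feature set, sequence parameters, and the confidence threshold used to flag unknown sequence types are assumptions for illustration, not details from the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic metadata features [TR (ms), TE (ms), flip angle (deg)] per scan
def make_seq(tr, te, fa, n):
    return rng.normal([tr, te, fa], [tr * 0.05, te * 0.05, 2], size=(n, 3))

X = np.vstack([make_seq(500, 15, 90, 200),     # T1-weighted-like scans
               make_seq(4000, 100, 90, 200),   # T2-weighted-like scans
               make_seq(9000, 120, 150, 200)]) # FLAIR-like scans
y = np.array(["T1w"] * 200 + ["T2w"] * 200 + ["FLAIR"] * 200)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def detect(features, threshold=0.8):
    """Label a scan, or flag it as unknown when no class is confident."""
    proba = clf.predict_proba([features])[0]
    return clf.classes_[proba.argmax()] if proba.max() >= threshold else "unknown"

print(detect([510, 14, 89]))   # metadata close to the T1w cluster
print(detect([2500, 30, 40]))  # parameters unlike any training sequence
```

Thresholding the forest's class probabilities is one simple way to get the "does not belong to any known sequence type" behavior the abstract describes.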


2021 ◽  
Author(s):  
Konrad Thorner ◽  
Aaron M. Zorn ◽  
Praneet Chaturvedi

Annotation of single cells has become an important step in the single-cell analysis framework. With advances in sequencing technology, thousands to millions of cells can be processed to understand the intricacies of the biological system in question. Given this exponential growth, annotation through manual curation of markers based on a priori knowledge is cumbersome. There are currently ~200 computational tools available to help researchers automatically annotate single cells using supervised/unsupervised machine learning, cell-type markers, or tissue-based markers from bulk RNA-seq. But with the expansion of publicly available data, there is also a need for a tool that can help integrate multiple references into a unified atlas and understand how annotations between datasets compare. Here we present ELeFHAnt: Ensemble learning for harmonization and annotation of single cells. ELeFHAnt is an easy-to-use R package that employs support vector machine and random forest algorithms together to perform three main functions: 1) CelltypeAnnotation, 2) LabelHarmonization, and 3) DeduceRelationship. CelltypeAnnotation annotates cells in a query Seurat object using a reference Seurat object with annotated cell types. LabelHarmonization integrates multiple cell atlases (references) into a unified cellular atlas with harmonized cell types. Finally, DeduceRelationship compares cell types between two scRNA-seq datasets. ELeFHAnt can be accessed from GitHub at https://github.com/praneet1988/ELeFHAnt.
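ELeFHAnt itself is an R/Seurat workflow, but the core idea behind its CelltypeAnnotation function, an ensemble of a support vector machine and a random forest voting on reference-annotated cells, can be sketched in Python on synthetic expression data (all cell-type names and values below are illustrative, not the package's API):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic stand-in for normalized expression of 20 genes in a reference
# atlas with two annotated cell types (values are illustrative only).
def cells(center, n):
    return rng.normal(center, 0.5, size=(n, 20))

X_ref = np.vstack([cells(0.0, 150), cells(2.0, 150)])
y_ref = np.array(["TypeA"] * 150 + ["TypeB"] * 150)

# Ensemble of the two learners ELeFHAnt combines, with soft voting
ensemble = VotingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft").fit(X_ref, y_ref)

# Unannotated query cells drawn near each reference cell type
X_query = np.vstack([cells(0.1, 5), cells(1.9, 5)])
print(ensemble.predict(X_query).tolist())  # 5 TypeA calls, then 5 TypeB
```

Label harmonization across atlases follows the same pattern, with each reference's labels mapped through the ensemble's consensus predictions.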


2022 ◽  
Author(s):  
Navjeet Ahalawat ◽  
Jagannath Mondal

A long-standing target in elucidating the biomolecular recognition process is the identification of binding-competent conformations of the receptor protein. However, protein conformational plasticity and the stochastic nature of the recognition process often preclude the assignment of a specific protein conformation to an individual ligand-bound pose. In particular, we consider multi-microsecond molecular dynamics simulation trajectories of the ligand recognition process in the solvent-inaccessible cavities of two archetypal systems: the L99A mutant of T4 lysozyme and cytochrome P450. We first show that if substrate recognition occurs via a long-lived intermediate, the protein conformations can be automatically classified into substrate-bound and unbound states through an unsupervised dimensionality reduction technique. On the contrary, if the recognition process is mediated by the ligand's selection of a transient protein conformation, a clear correspondence between protein conformation and binding-competent macrostates can only be established via a combination of supervised machine learning (ML) and unsupervised dimensionality reduction. In that scenario, we demonstrate that an a priori random forest-based supervised classification of the simulated recognition trajectories helps characterize the key amino-acid residue pairs of the protein that are deemed sensitive to ligand binding. A subsequent unsupervised dimensionality reduction of the selected residue pairs via time-lagged independent component analysis then delineates a conformational landscape of the protein that is able to demarcate ligand-bound poses from unbound ones. As a key breakthrough, the ML-based protocol identifies distal protein locations that are allosterically important for ligand binding and characterizes their roles in recognition pathways.
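The supervised step can be sketched as follows, with synthetic residue-pair distances standing in for the MD trajectory features (the two "binding-sensitive" pair indices are planted for illustration; they are not the paper's residues). In the paper's protocol, the top-ranked pairs would then feed the time-lagged independent component analysis:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic residue-pair distances (in Angstrom-like units) for MD frames:
# 50 pairs, of which only pairs 3 and 17 actually shift on ligand binding.
n_frames, n_pairs = 1000, 50
X = rng.normal(10.0, 1.0, size=(n_frames, n_pairs))
y = rng.integers(0, 2, size=n_frames)  # 0 = unbound frame, 1 = bound frame
X[y == 1, 3] -= 2.0                    # planted binding-sensitive pair
X[y == 1, 17] += 2.5                   # planted binding-sensitive pair

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank residue pairs by forest importance; the top pairs are the candidate
# binding-sensitive coordinates to pass to tICA for the landscape.
top_pairs = np.argsort(clf.feature_importances_)[::-1][:2]
print(sorted(top_pairs.tolist()))  # → [3, 17]
```

On real trajectories the same ranking surfaces pairs involving distal residues, which is how the protocol flags allosterically relevant sites.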


2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research exploring the content of church-related tweets. It does so by examining whether the qualitative thematic coding of such tweets can, in part, be automated through machine learning. It compares three supervised machine learning algorithms to understand how useful each is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning precision, recall, and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
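The classification task can be sketched with scikit-learn's multinomial Naïve Bayes and the same precision/recall/F-measure evaluation; the toy tweets and labels below are illustrative stand-ins for the study's human-coded dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB

# Toy hand-coded tweets standing in for the human-coded training data
tweets = ["lovely sermon at church this morning",
          "church choir practice was wonderful tonight",
          "prayer meeting at our church next sunday",
          "volunteering with the church food bank today",
          "traffic on the motorway is terrible again",
          "new phone battery lasts two full days",
          "great football match at the stadium",
          "coffee shop on the corner has closed down"]
labels = ["church", "church", "church", "church",
          "other", "other", "other", "other"]

vec = CountVectorizer()                      # bag-of-words features
clf = MultinomialNB().fit(vec.fit_transform(tweets), labels)

test_tweets = ["sunday sermon and choir at church",
               "football and coffee after the match"]
y_true = ["church", "other"]
pred = clf.predict(vec.transform(test_tweets)).tolist()

prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, pred, average="weighted", zero_division=0)
print(pred, round(f1, 2))
```

On a realistic corpus the metrics would be computed with cross-validation rather than on a two-tweet hold-out; the toy data here is separable by construction.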

