Sensitive protein alignments at tree-of-life scale using DIAMOND

2021 ◽  
Vol 18 (4) ◽  
pp. 366-368
Author(s):  
Benjamin Buchfink ◽  
Klaus Reuter ◽  
Hajk-Georg Drost

Abstract: We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.

2010 ◽  
Vol 37 (3) ◽  
pp. 705-729 ◽  
Author(s):  
KENJI SAGAE ◽  
ERIC DAVIS ◽  
ALON LAVIE ◽  
BRIAN MACWHINNEY ◽  
SHULY WINTNER

Abstract: Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes.


2021 ◽  
pp. 193229682110413
Author(s):  
Jeniece Ilkowitz ◽  
Vanessa Wissing ◽  
Mary Pat Gallagher

In the pediatric population, insulin pump therapy, or CSII, is often considered the gold standard for intensive diabetes management. Insulin pump technology offers families and caregivers many beneficial features, including a calculator for insulin dosing and the ability to review diabetes management data to provide data-driven diabetes management. However, for those who find CSII challenging or choose to use multiple daily injections (MDI), there is an option that offers similar features, called the Smart Insulin Pen (SIP). Even though SIP technology provides a safe and data-driven diabetes self-management tool for the pediatric population using MDI, there is limited pediatric-specific literature. This article will describe current options, data-driven diabetes management, benefits, challenges and clinical use of SIP technology in the pediatric population.


2021 ◽  
pp. medethics-2020-106905
Author(s):  
Soogeun Samuel Lee

The UK Government’s Code of Conduct for data-driven health and care technologies, specifically artificial intelligence (AI)-driven technologies, comprises 10 principles that outline a gold standard of ethical conduct for AI developers and implementers within the National Health Service. Considering the importance of trust in medicine, in this essay I aim to evaluate the conceptualisation of trust within this piece of ethical governance. I examine the Code of Conduct, specifically Principle 7, and extract two positions: a principle of rationally justified trust that posits trust should be made on sound epistemological bases and a principle of value-based trust that views trust in an all-things-considered manner. I argue rationally justified trust is largely infeasible in trusting AI due to AI’s complexity and inexplicability. Contrarily, I show how value-based trust is more feasible as it is intuitively used by individuals. Furthermore, it better complies with Principle 1. I therefore conclude this essay by suggesting that the Code of Conduct hold the principle of value-based trust more explicitly.


2018 ◽  
Author(s):  
Dale Barr ◽  
Roger Philip Levy ◽  
Christoph Scheepers ◽  
Harry Tily

Linear mixed-effects models (LMEMs) have become increasingly prominent in psycholinguistics and related areas. However, many researchers do not seem to appreciate how random effects structures affect the generalizability of an analysis. Here, we argue that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades. Through theoretical arguments and Monte Carlo simulation, we show that LMEMs generalize best when they include the maximal random effects structure justified by the design. The generalization performance of LMEMs including data-driven random effects structures strongly depends upon modeling criteria and sample size, yielding reasonable results on moderately-sized samples when conservative criteria are used, but with little or no power advantage over maximal models. Finally, random-intercepts-only LMEMs used on within-subjects and/or within-items data from populations where subjects and/or items vary in their sensitivity to experimental manipulations always generalize worse than separate F1 and F2 tests, and in many cases, even worse than F1 alone. Maximal LMEMs should be the ‘gold standard’ for confirmatory hypothesis testing in psycholinguistics and beyond.


2021 ◽  
Vol 20 ◽  
pp. 117693512110562
Author(s):  
Robert J O’Shea ◽  
Sophia Tsoka ◽  
Gary JR Cook ◽  
Vicky Goh

Background: Evaluation of gene interaction models in cancer genomics is challenging, as the true distribution is uncertain. Previous analyses have benchmarked models using synthetic data or databases of experimentally verified interactions – approaches which are susceptible to misrepresentation and incompleteness, respectively. The objectives of this analysis are to (1) provide a real-world data-driven approach for comparing performance of genomic model inference algorithms, (2) compare the performance of LASSO, elastic net, best-subset selection, [Formula: see text] penalisation and [Formula: see text] penalisation in real genomic data and (3) compare algorithmic preselection according to performance in our benchmark datasets to algorithmic selection by internal cross-validation. Methods: Five large [Formula: see text] genomic datasets were extracted from Gene Expression Omnibus. ‘Gold-standard’ regression models were trained on subspaces of these datasets ([Formula: see text], [Formula: see text]). Penalised regression models were trained on small samples from these subspaces ([Formula: see text]) and validated against the gold-standard models. Variable selection performance and out-of-sample prediction were assessed. Penalty ‘preselection’ according to test performance in the other 4 datasets was compared to selection by internal cross-validation error minimisation. Results: [Formula: see text]-penalisation achieved the highest cosine similarity between estimated coefficients and those of gold-standard models. [Formula: see text]-penalised models explained the greatest proportion of variance in test responses, though performance was unreliable in low signal:noise conditions. [Formula: see text] also attained the highest overall median variable selection F1 score. Penalty preselection significantly outperformed selection by internal cross-validation in each of 3 examined metrics.
Conclusions: This analysis explores a novel approach for comparisons of model selection approaches in real genomic data from 5 cancers. Our benchmarking datasets have been made publicly available for use in future research. Our findings support the use of [Formula: see text] penalisation for structural selection and [Formula: see text] penalisation for coefficient recovery in genomic data. Evaluation of learning algorithms according to observed test performance in external genomic datasets yields valuable insights into actual test performance, providing a data-driven complement to internal cross-validation in genomic regression tasks.
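
The two comparison metrics named in the abstract above, cosine similarity between coefficient vectors and variable selection F1, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the zero-threshold rule for counting a variable as "selected", and the toy coefficient vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two coefficient vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is treated as similarity 0
    return dot / (norm_a * norm_b)


def selection_f1(estimated, reference, tol=0.0):
    """F1 score of the selected variables (|coefficient| > tol) against a reference model."""
    est = {i for i, c in enumerate(estimated) if abs(c) > tol}
    ref = {i for i, c in enumerate(reference) if abs(c) > tol}
    true_positives = len(est & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(est)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical coefficients: a reference ("gold-standard") model and a penalised estimate.
gold = [1.0, 0.0, 2.0, 0.0]
estimate = [0.5, 0.0, 0.0, 0.3]

similarity = cosine_similarity(estimate, gold)
f1 = selection_f1(estimate, gold)
```

Here the estimate selects variables 0 and 3 while the reference uses variables 0 and 2, so one of two selected variables is correct and one of two true variables is recovered.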


Journalism ◽  
2017 ◽  
Vol 21 (9) ◽  
pp. 1246-1263 ◽  
Author(s):  
Wiebke Loosen ◽  
Julius Reimer ◽  
Fenja De Silva-Schmidt

Data-driven journalism can be considered as journalism’s response to the datafication of society. To better understand the key components and development of this still young and fast evolving genre, we investigate what the field itself defines as its ‘gold-standard’: projects that were nominated for the Data Journalism Awards from 2013 to 2016 (n = 225). Using a content analysis, we examine, among other aspects, the data sources and types, visualisations, interactive features, topics and producers. Our results demonstrate, for instance, only a few consistent developments over the years and a predominance of political pieces, of projects by newspapers and by investigative journalism organisations, of public data from official institutions as well as a glut of simple visualisations, which in sum echoes a range of general tendencies in data journalism. On the basis of our findings, we evaluate data-driven journalism’s potential for improvement with regard to journalism’s societal functions.


2018 ◽  
Vol 64 (1) ◽  
pp. 39-46 ◽  
Author(s):  
Erik Vindbjerg ◽  
Guido Makransky ◽  
Erik Lykke Mortensen ◽  
Jessica Carlsson

Objective: The Hamilton Depression Rating Scale (HDRS) is considered the gold standard measure of depression. The factor structure of the HDRS is generally unstable, but 4 to 8 items appear to form a general depression factor. As transcultural studies of the HDRS have received little attention, and as most of the studies have taken a data-driven approach with a tendency to yield fragmented results, it is not clear if an HDRS general depression factor can also be found in non-Western populations. This is an important issue in deciding on the appropriateness of the scale as a gold standard in transcultural psychiatry. Method: A systematic review was carried out to compare previously reported factor structures of the HDRS in non-Western cultures. Overlapping clusters across studies were identified and subsequently tested with confirmatory factor analysis (CFA) of responses from an independent sample. Results: Fourteen relevant studies were identified, 12 of which were obtained. A general depression factor was identified, consisting of the following symptoms: depressed mood, guilt, loss of interests, retardation, suicide, and psychological anxiety. The subsequent CFA supported the fit of this model. Conclusions: This study indicates that a general depression cluster is manifest in responses to the HDRS across cultures. While psychometric properties of the full-length HDRS are still debated, the general depression cluster appears pertinent to the assessment of depression across cultures. We recommend that cross-cultural clinicians and researchers focus on the use of unidimensional depression scales, which are in agreement with this cluster.


Author(s):  
Justin S Smith ◽  
Benjamin T. Nebgen ◽  
Roman Zubatyuk ◽  
Nicholas Lubbers ◽  
Christian Devereux ◽  
...  

Computer simulations are foundational to theoretical chemistry. Quantum-mechanical (QM) methods provide the highest accuracy for simulating molecules but have difficulty scaling to large systems. Empirical interatomic potentials (classical force fields) are scalable, but lack transferability to new systems and are hard to systematically improve. Automated, data-driven machine learning is close to achieving the best of both approaches. Here we use transfer learning to retrain a general purpose neural network potential, ANI-1x, on a dataset of gold standard QM calculations (CCSD(T)/CBS level) that is relatively small but designed to optimally span chemical space. The resulting potential, ANI-1ccx, approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. ANI-1ccx is broadly applicable to materials science, biology and chemistry, and billions of times faster than the parent CCSD(T)/CBS calculations.

