Long-wavelength Mesh&Collect native SAD phasing from microcrystals

Harnessing the anomalous signal from macromolecular crystals with volumes of less than 10 000 µm3 for native phasing requires careful experimental planning. The type of anomalous scatterers that are naturally present in the sample, such as sulfur, phosphorus and calcium, will dictate the beam energy required and determine the level of radiation sensitivity, while the crystal size will dictate the beam size and the sample-mounting technique, in turn indicating the specifications of a suitable beamline. On the EMBL beamline P13 at PETRA III, Mesh&Collect data collection from concanavalin A microcrystals with linear dimensions of ∼20 µm or less using an accordingly sized microbeam at a wavelength of 1.892 Å (6.551 keV, close to the Mn edge at 6.549 keV) increases the expected Bijvoet ratio to 2.1% from an expected 0.7% at 12.6 keV (Se K edge), thus allowing experimental phase determination using the anomalous signal from naturally present Mn2+ and Ca2+ ions. Dozens of crystals were harvested and flash-cryocooled in micro-meshes, rapidly screened for diffraction (less than a minute per loop) and then used for serial Mesh&Collect collection of about 298 partial data sets (10° of crystal rotation per sample). The partial data sets were integrated and scaled. A genetic algorithm for combining partial data sets was used to select those to be merged into a single data set. This final data set showed high completeness, high multiplicity and sufficient anomalous signal to locate the anomalous scatterers, and provided phasing information which allowed complete auto-tracing of the polypeptide chain. To allow the complete experiment to run in less than 2 h, a practically acceptable time frame, the diffractometer and detector had to run together with limited manual intervention. The combination of several cutting-edge components allowed accurate anomalous signal to be measured from small crystals.
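The energy and wavelength quoted in this abstract are related by λ = hc/E. The following is a minimal sketch of that conversion, not code from the paper; the constant hc ≈ 12.39842 keV·Å is standard, while the function names are hypothetical and chosen only for illustration. It also totals the rotation range implied by 298 partial data sets of 10° each.

```python
# Illustrative only: photon energy <-> wavelength, plus total rotation range.
HC_KEV_ANGSTROM = 12.39842  # h*c expressed in keV * Angstrom


def energy_to_wavelength(energy_kev: float) -> float:
    """Convert photon energy in keV to wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / energy_kev


def total_rotation(n_partial_sets: int, degrees_per_set: float) -> float:
    """Total crystal rotation covered by all partial data sets, in degrees."""
    return n_partial_sets * degrees_per_set


# Values taken from the abstract: 6.551 keV, 12.6 keV, 298 sets of 10 degrees.
long_wavelength = energy_to_wavelength(6.551)
short_wavelength = energy_to_wavelength(12.6)
rotation = total_rotation(298, 10.0)

print(f"6.551 keV -> {long_wavelength:.4f} A")
print(f"12.6 keV  -> {short_wavelength:.4f} A")
print(f"total rotation: {rotation:.0f} degrees")
```

The computed long wavelength agrees with the quoted 1.892 Å to within about 0.001 Å; the small difference comes from rounding of the published values.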


Making a difference in multi-data-set crystallography: simple and deterministic data-scaling/selection methods

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798320006348 ◽

2020 ◽

Vol 76 (7) ◽

pp. 636-652 ◽

Cited By ~ 1

Author(s):

Greta M. Assmann ◽

Meitian Wang ◽

Kay Diederichs

Keyword(s):

Simulated Data ◽

Test Case ◽

Data Sets ◽

Selection Methods ◽

Data Set ◽

Partial Data ◽

Structure Solution ◽

Making A Difference ◽

Data Scaling ◽

Anomalous Signal

Phasing by single-wavelength anomalous diffraction (SAD) from multiple crystallographic data sets can be particularly demanding because of the weak anomalous signal and possible non-isomorphism. The identification and exclusion of non-isomorphous data sets by suitable indicators is therefore indispensable. Here, simple and robust data-selection methods are described. A multi-dimensional scaling procedure is first used to identify data sets with large non-isomorphism relative to clusters of other data sets. Within each cluster that it identifies, further selection is based on the weighted ΔCC1/2, a quantity representing the influence of a set of reflections on the overall CC1/2 of the merged data. The anomalous signal is further improved by optimizing the scaling protocol. The success of iterating the selection and scaling steps was verified by substructure determination and subsequent structure solution. Three serial synchrotron crystallography (SSX) SAD test cases with hundreds of partial data sets and one test case with 62 complete data sets were analyzed. Structure solution was dramatically simplified with this procedure, and enabled solution of the structures after a few selection/scaling iterations. To explore the limits, the procedure was tested with much fewer data than originally required and could still solve the structure in several cases. In addition, an SSX data challenge, minimizing the number of (simulated) data sets necessary to solve the structure, was significantly underbid.
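The idea behind ΔCC1/2 can be sketched as a leave-one-out comparison: compute CC1/2 for all data sets, then again with one data set removed. The example below is a simplified, unweighted illustration and not the authors' implementation. It assumes every data set observes every reflection, estimates CC1/2 as (σy² − ½σε²)/(σy² + ½σε²), and assumes the sign convention ΔCC1/2(i) = CC1/2(all) − CC1/2(without i), so that negative values flag data sets that degrade the merged data. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, variance


def cc_half(datasets):
    """Estimate CC1/2 from data sets holding intensities of the same reflections."""
    n = len(datasets)
    per_reflection = list(zip(*datasets))        # all observations of each reflection
    merged = [mean(obs) for obs in per_reflection]
    var_y = variance(merged)                     # spread of the merged intensities
    # Variance of a half-data-set mean: sample variance divided by n/2.
    var_eps = mean(variance(obs) / (n / 2) for obs in per_reflection)
    return (var_y - 0.5 * var_eps) / (var_y + 0.5 * var_eps)


def delta_cc_half(datasets):
    """Leave-one-out change in CC1/2 for each data set (negative = harmful)."""
    cc_all = cc_half(datasets)
    return [cc_all - cc_half(datasets[:i] + datasets[i + 1:])
            for i in range(len(datasets))]


# Toy data: five consistent data sets plus one unrelated (non-isomorphous) one.
rng = random.Random(0)
true_intensity = [rng.expovariate(1 / 100) for _ in range(200)]
good = [[t + rng.gauss(0, 10) for t in true_intensity] for _ in range(5)]
bad = [rng.expovariate(1 / 100) for _ in range(200)]

deltas = delta_cc_half(good + [bad])
worst = min(range(len(deltas)), key=deltas.__getitem__)
print("data set with the most negative delta:", worst)
```

In this toy case the unrelated data set (index 5) receives the most negative value and would be the first candidate for exclusion before rescaling.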


Disparities Across Time: Exploring Absenteeism Patterns between Cohorts of Students with Disabilities

Teachers College Record ◽

10.1177/016146812012201114 ◽

2020 ◽

Vol 122 (11) ◽

pp. 1-32

Author(s):

Michael A. Gottfried ◽

Vi-Nhuan Le ◽

J. Jacob Kirksey

Keyword(s):

Students With Disabilities ◽

Social Needs ◽

Data Sets ◽

Chronic Absenteeism ◽

Data Set ◽

Full Day Kindergarten ◽

Effective Interventions ◽

Nationally Representative ◽

Single Data ◽

Over Time

Background: It is of grave concern that kindergartners are missing more school than students in any other year of elementary school; therefore, documenting which students are absent and for how long is of utmost importance. Yet, doing so for students with disabilities (SWDs) has received little attention. This study addresses this gap by examining two cohorts of SWDs, separated by more than a decade, to document changes in attendance patterns. Research Questions: First, for SWDs, has the number of school days missed or chronic absenteeism rates changed over time? Second, how are changes in the number of school days missed and chronic absenteeism rates related to changes in academic emphasis, presence of teacher aides, SWD-specific teacher training, and preschool participation? Subjects: This study uses data from the Early Childhood Longitudinal Study (ECLS), a nationally representative data set of children in kindergarten. We rely on both ECLS data sets: the kindergarten classes of 1998–1999 and 2010–2011. Measures were identical in both data sets, making it feasible to compare children across the two cohorts. Given identical measures, we combined the data sets into a single data set with an indicator for being in the older cohort. Research Design: This study examined two sets of outcomes: The first was number of days absent, and the second was likelihood of being chronically absent. These outcomes were regressed on a measure for being in the older cohort (our key measure for changes over time) and numerous control variables. The error term was clustered by classroom. Findings: We found that SWDs are absent more often now than they were a decade earlier, and this growth in absenteeism was larger than what students without disabilities experienced. Absenteeism among SWDs was higher for those enrolled in full-day kindergarten, although having attended center-based care mitigates this disparity over time. Implications are discussed.
Conclusions: Our study calls for additional attention and supports to combat the increasing rates of absenteeism for SWDs over time. Understanding contextual shifts and trends in rates of absenteeism for SWDs in kindergarten is pertinent to crafting effective interventions and research geared toward supporting the academic and social needs of these students.


Combining cross-crystal averaging and MRSAD to phase a 4354-amino-acid structure

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798315023566 ◽

2016 ◽

Vol 72 (2) ◽

pp. 182-191

Author(s):

Jason Nicholas Busby ◽

J. Shaun Lott ◽

Santosh Panjikar

Keyword(s):

Radiation Damage ◽

Model Building ◽

Data Sets ◽

Low Resolution ◽

Data Set ◽

C Protein ◽

Large Size ◽

Anomalous Signal ◽

Difference Fourier ◽

Combined Data

The B and C proteins from the ABC toxin complex of Yersinia entomophaga form a large heterodimer that cleaves and encapsulates the C-terminal toxin domain of the C protein. Determining the structure of the complex formed by B and the N-terminal region of C was challenging owing to its large size, the non-isomorphism of different crystals and their sensitivity to radiation damage. A native data set was collected to 2.5 Å resolution and a non-isomorphous Ta6Br12-derivative data set was collected that showed strong anomalous signal at low resolution. The tantalum-cluster sites could be found, but the anomalous signal did not extend to a high enough resolution to allow model building. Selenomethionine (SeMet)-derivatized protein crystals were produced, but the high number (60) of SeMet sites and the sensitivity of the crystals to radiation damage made phasing using the SAD or MAD methods difficult. Multiple SeMet data sets were combined to provide 30-fold multiplicity, and the low-resolution phase information from the Ta6Br12 data set was transferred to this combined data set by cross-crystal averaging. This allowed the Se atoms to be located in an anomalous difference Fourier map; they were then used in Auto-Rickshaw for multiple rounds of autobuilding and MRSAD.


Probabilistic Harmonization and Annotation of Single-cell Transcriptomics Data with Deep Generative Models

10.1101/532895 ◽

2019 ◽

Cited By ~ 14

Author(s):

Chenling Xu ◽

Romain Lopez ◽

Edouard Mehlman ◽

Jeffrey Regier ◽

Michael I. Jordan ◽

...

Keyword(s):

Single Cell ◽

Probabilistic Approach ◽

Cell Types ◽

Generative Models ◽

Marker Genes ◽

Data Sets ◽

Data Set ◽

Cell State ◽

Transcriptomics Data ◽

Single Data

Abstract As single-cell transcriptomics becomes a mainstream technology, the natural next step is to integrate the accumulating data in order to achieve a common ontology of cell types and states. However, owing to various nuisance factors of variation, it is not straightforward how to compare gene expression levels across data sets and how to automatically assign cell type labels in a new data set based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of cohorts of single-cell RNA-seq data sets, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage any available cell state annotations, for instance when only one data set in a cohort is annotated, or when only a few cells in a single data set can be labeled using marker genes. We demonstrate that scVI and scANVI compare favorably to the existing methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings such as a hierarchical structure of cell state labels. We further show that, unlike existing methods, scVI and scANVI represent the integrated data sets with a single generative model that can be directly used for any probabilistic decision-making task, using differential expression as our case study. scVI and scANVI are available as open source software and can be readily used to facilitate cell state annotation and help ensure consistency and reproducibility across studies.


High-throughput in situ experimental phasing

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798320009109 ◽

2020 ◽

Vol 76 (8) ◽

pp. 790-801 ◽

Cited By ~ 1

Author(s):

Joshua M. Lawrence ◽

Julien Orlans ◽

Gwyndaf Evans ◽

Allen M. Orville ◽

James Foadi ◽

...

Keyword(s):

Heavy Atom ◽

Data Sets ◽

Human Intervention ◽

Macromolecular Crystallography ◽

New Approach ◽

Partial Data ◽

Experimental Phasing ◽

Anomalous Signal ◽

First Time

In this article, a new approach to experimental phasing for macromolecular crystallography (MX) at synchrotrons is introduced and described for the first time. It makes use of automated robotics applied to a multi-crystal framework in which human intervention is reduced to a minimum. Hundreds of samples are automatically soaked in heavy-atom solutions, using a Labcyte Inc. Echo 550 Liquid Handler, in a highly controlled and optimized fashion in order to generate derivatized and isomorphous crystals. Partial data sets obtained on MX beamlines using an in situ setup for data collection are processed with the aim of producing good-quality anomalous signal leading to successful experimental phasing.


MeshAndCollect: an automated multi-crystal data-collection workflow for synchrotron macromolecular crystallography beamlines

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004715017927 ◽

2015 ◽

Vol 71 (11) ◽

pp. 2328-2343 ◽

Cited By ~ 68

Author(s):

Ulrich Zander ◽

Gleb Bourenkov ◽

Alexander N. Popov ◽

Daniele de Sanctis ◽

Olof Svensson ◽

...

Keyword(s):

Hierarchical Cluster ◽

Data Sets ◽

Macromolecular Crystallography ◽

X Ray Diffraction ◽

Data Set ◽

X Ray ◽

Partial Data ◽

Final Data ◽

Dispersion Technique

Here, an automated procedure is described to identify the positions of many cryocooled crystals mounted on the same sample holder, to rapidly predict and rank their relative diffraction strengths and to collect partial X-ray diffraction data sets from as many of the crystals as desired. Subsequent hierarchical cluster analysis then allows the best combination of partial data sets, optimizing the quality of the final data set obtained. The results of applying the method developed to various systems and scenarios including the compilation of a complete data set from tiny crystals of the membrane protein bacteriorhodopsin and the collection of data sets for successful structure determination using the single-wavelength anomalous dispersion technique are also presented.


Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis

Hydrology and Earth System Sciences ◽

10.5194/hess-17-3707-2013 ◽

2013 ◽

Vol 17 (10) ◽

pp. 3707-3720 ◽

Cited By ~ 209

Author(s):

B. Mueller ◽

M. Hirschi ◽

C. Jimenez ◽

P. Ciais ◽

P. A. Dirmeyer ◽

...

Keyword(s):

Land Surface ◽

Data Sets ◽

Individual Data ◽

Data Set ◽

Annual Variations ◽

Input Variables ◽

Global Increase ◽

Single Data ◽

The Individual

Abstract. Land evapotranspiration (ET) estimates are available from several global data sets. Here, monthly global land ET synthesis products, merged from these individual data sets over the time periods 1989–1995 (7 yr) and 1989–2005 (17 yr), are presented. The merged synthesis products over the shorter period are based on a total of 40 distinct data sets while those over the longer period are based on a total of 14 data sets. In the individual data sets, ET is derived from satellite and/or in situ observations (diagnostic data sets) or calculated via land-surface models (LSMs) driven with observations-based forcing or output from atmospheric reanalyses. Statistics for four merged synthesis products are provided, one including all data sets and three including only data sets from one category each (diagnostic, LSMs, and reanalyses). The multi-annual variations of ET in the merged synthesis products display realistic responses. They are also consistent with previous findings of a global increase in ET between 1989 and 1997 (0.13 mm yr−2 in our merged product) followed by a significant decrease in this trend (−0.18 mm yr−2), although these trends are relatively small compared to the uncertainty of absolute ET values. The global mean ET from the merged synthesis products (based on all data sets) is 493 mm yr−1 (1.35 mm d−1) for both the 1989–1995 and 1989–2005 products, which is relatively low compared to previously published estimates. We estimate global runoff (precipitation minus ET) to be 263 mm yr−1 (34 406 km3 yr−1) for a total land area of 130 922 000 km2. Precipitation, being an important driving factor and input to most simulated ET data sets, presents uncertainties between single data sets as large as those in the ET estimates. In order to reduce uncertainties in current ET products, improving the accuracy of the input variables, especially precipitation, as well as the parameterizations of ET, is crucial.
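The unit conversions in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper; the function names are hypothetical, and a year length of 365.25 days is an assumption. It converts the mean ET from mm per year to mm per day, turns the runoff depth into a volume over the stated land area, and derives the precipitation implied by runoff = precipitation − ET.

```python
# Illustrative unit checks for the figures quoted in the abstract.
DAYS_PER_YEAR = 365.25  # assumption: average year length in days
MM_PER_KM = 1.0e6       # millimetres in one kilometre


def mm_per_year_to_mm_per_day(depth_mm_per_year: float) -> float:
    """Convert an annual water depth to a mean daily depth."""
    return depth_mm_per_year / DAYS_PER_YEAR


def depth_to_volume_km3(depth_mm: float, area_km2: float) -> float:
    """Volume (km^3) of a water layer of the given depth over the given area."""
    return depth_mm / MM_PER_KM * area_km2


et_mm_per_year = 493.0        # global mean land ET from the abstract
runoff_mm_per_year = 263.0    # global runoff from the abstract
land_area_km2 = 130_922_000   # total land area from the abstract

et_mm_per_day = mm_per_year_to_mm_per_day(et_mm_per_year)
runoff_km3 = depth_to_volume_km3(runoff_mm_per_year, land_area_km2)
# Runoff is defined as precipitation minus ET, so precipitation is their sum.
implied_precip_mm = et_mm_per_year + runoff_mm_per_year

print(f"ET: {et_mm_per_day:.2f} mm/day")
print(f"runoff volume: {runoff_km3:.0f} km^3/yr")
print(f"implied precipitation: {implied_precip_mm:.0f} mm/yr")
```

The computed runoff volume lies within about 0.1% of the quoted 34 406 km3 yr−1; the difference is consistent with the runoff depth having been rounded to the nearest millimetre.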


Screen, Ration and Churn: Demand Management and the Crisis in Children’s Social Care

The British Journal of Social Work ◽

10.1093/bjsw/bcz035 ◽

2019 ◽

Vol 50 (3) ◽

pp. 868-889 ◽

Cited By ~ 1

Author(s):

Rick Hood ◽

Allie Goldacre ◽

Sarah Gorin ◽

Paul Bywaters

Keyword(s):

Social Care ◽

Demand Management ◽

Area Deprivation ◽

Data Sets ◽

Current Crisis ◽

Local Authorities ◽

Data Set ◽

Care Services ◽

Trends Over Time ◽

Single Data

Abstract This article presents findings from a quantitative study of the national data-sets for statutory children’s social care services in England. The aim of the study was to examine how demand management varied in local authorities with differing levels of area deprivation. About 152 local authorities’ census returns and other statistical indicators covering the period 2014–2017 were combined into a single data-set. Statistical analysis was undertaken to explore trends over time and correlations between indicators that might indicate patterns in the way demand was managed. Findings showed that high levels of deprivation have continued to be strongly linked to high levels of activity and that local authorities have continued to increase their use of protective interventions relative to referrals. Evidence was found for three interconnected mechanisms through which local authorities tended to manage demand for services: screening, rationing and workforce churn. The article describes these mechanisms and comments on their significance for the current crisis of demand in the sector.


Domain-Based Benchmark Experiments: Exploratory and Inferential Analysis

Austrian Journal of Statistics ◽

10.17713/ajs.v41i1.185 ◽

2016 ◽

Vol 41 (1) ◽

Cited By ~ 8

Author(s):

Manuel J. A. Eugster ◽

Torsten Hothorn ◽

Friedrich Leisch

Keyword(s):

Learning Algorithm ◽

Learning Algorithms ◽

Joint Analysis ◽

Data Sets ◽

Complete Collection ◽

Data Set ◽

Enterprise Application ◽

Empirical Performance ◽

Formal Statistical Analysis ◽

Single Data

Benchmark experiments are the method of choice to compare learning algorithms empirically. For collections of data sets, the empirical performance distributions of a set of learning algorithms are estimated, compared, and ordered. Usually this is done for each data set separately. The present manuscript extends this single data set-based approach to a joint analysis for the complete collection, the so-called problem domain. This enables one to decide which algorithms to deploy in a specific application or to compare newly developed algorithms with well-known algorithms on established problem domains. Specialized visualization methods allow for easy exploration of huge amounts of benchmark data. Furthermore, we take the benchmark experiment design into account and use mixed-effects models to provide a formal statistical analysis. Two domain-based benchmark experiments demonstrate our methods: the UCI domain as a well-known domain when one is developing a new algorithm; and the Grasshopper domain as a domain where we want to find the best learning algorithm for a prediction component in an enterprise application software system.


Patterns in research and data sharing for the study of form and function in caviomorph rodents

Journal of Mammalogy ◽

10.1093/jmammal/gyaa002 ◽

2020 ◽

Vol 101 (2) ◽

pp. 604-612

Author(s):

Luis D Verde Arregoitia ◽

Pablo Teta ◽

Guillermo D’Elía

Keyword(s):

Data Sharing ◽

Open Data ◽

Data Sets ◽

Ecological Data ◽

Data Set ◽

Form And Function ◽

Information Collections ◽

Phylogenetic Hypotheses ◽

Single Data ◽

And Function

Abstract The combination of morphometrics, phylogenetic comparative methods, and open data sets has renewed interest in relating morphology to adaptation and ecological opportunities. Focusing on the Caviomorpha, a well-studied mammalian group, we evaluated patterns in research and data sharing in studies relating form and function. Caviomorpha encompasses a radiation of rodents that is diverse both taxonomically and ecologically. We reviewed 41 publications investigating ecomorphology in this group. We recorded the type of data used in each study and whether these data were made available, and we re-digitized all provided data. We tracked two major lines of information: collections material examined and trait data for morphological and ecological traits. Collectively, the studies considered 63% of extant caviomorph species; all extant families and genera were represented. We found that species-level trait data rarely were provided. Specimen-level data were even less common. Morphological and ecological data were too heterogeneous and sparse to aggregate into a single data set, so we created relational tables with the data. Additionally, we concatenated all specimen lists into a single data set and standardized all relevant data for phylogenetic hypotheses and gene sequence accessions to facilitate future morphometric and phylogenetic comparative research. This work highlights the importance and ongoing use of scientific collections, and it allows for the integration of specimen information with species trait data.
