Recommendations for bacterial ribosome profiling experiments based on bioinformatic evaluation of published data

Ribosome profiling (RIBO-Seq) has improved our understanding of bacterial translation, including the discovery of many unannotated genes. However, protocols for RIBO-Seq and the corresponding data analysis are not yet standardized. Here, we analyzed 48 RIBO-Seq samples from nine studies of Escherichia coli K12 grown in lysogeny broth medium, focusing particularly on the size-selection step. We show that for conventional expression analysis, a size range of 22 to 30 nucleotides is sufficient to obtain protein-coding fragments and has the advantage of removing many unwanted rRNA and tRNA reads. More specific analyses may require longer reads and correspondingly better rRNA/tRNA depletion. There is no consensus about the appropriate sequencing depth for RIBO-Seq experiments in prokaryotes, and studies vary significantly in total read number. Our analysis suggests that 20 million reads not mapping to rRNA/tRNA are required for global detection of translated annotated genes. We also highlight the influence of drug-induced ribosome stalling, which causes bias at translation start sites. The resulting accumulation of reads at start sites may be especially useful for detecting weakly expressed genes. As different methods suit different questions, it may not be possible to produce a “one-size-fits-all” ribosome profiling data set. Therefore, experiments should be carefully designed in light of the scientific questions of interest. We propose some basic characteristics that should be reported with any new RIBO-Seq data set. Careful attention to the factors discussed should improve prokaryotic gene detection and the comparability of ribosome profiling data sets.
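An in silico counterpart of the 22 to 30 nt size-selection window can be sketched as a simple read filter. This is a minimal sketch, not the authors' pipeline: it assumes adapter-trimmed reads supplied as (read_id, sequence) pairs and a set of read IDs already flagged as rRNA/tRNA by a separate alignment step. The function name `select_footprints`, its parameters, and the toy reads are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# Size window (inclusive) suggested for protein-coding footprints.
MIN_LEN = 22
MAX_LEN = 30


def select_footprints(
    reads: Iterable[Tuple[str, str]],
    rrna_trna_ids: Set[str],
    min_len: int = MIN_LEN,
    max_len: int = MAX_LEN,
) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs whose length lies in [min_len, max_len]
    and that were not flagged as rRNA/tRNA by a (hypothetical) upstream step."""
    for read_id, seq in reads:
        if read_id in rrna_trna_ids:
            continue  # drop rRNA/tRNA contaminants
        if min_len <= len(seq) <= max_len:
            yield read_id, seq


# Usage with toy reads (hypothetical data, not from any study):
toy_reads = [
    ("r1", "A" * 21),  # too short
    ("r2", "A" * 22),  # lower bound, kept
    ("r3", "A" * 30),  # upper bound, kept
    ("r4", "A" * 31),  # too long
    ("r5", "A" * 25),  # right length but flagged as rRNA
]
kept = [read_id for read_id, _ in select_footprints(toy_reads, {"r5"})]
print(kept)  # ['r2', 'r3']
```

For the more specific analyses that need longer reads, `max_len` is the parameter to widen, at the cost of admitting more rRNA/tRNA fragments unless depletion is improved.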

Improving Bacterial Ribosome Profiling Data Quality

10.1101/863266 ◽

2019 ◽

Author(s):

Alina Glaub ◽

Christopher Huptas ◽

Klaus Neuhaus ◽

Zachary Ardern

Keyword(s):

Gene Annotation ◽

Ribosome Profiling ◽

Gene Detection ◽

Drug Induced ◽

Protein Coding ◽

E Coli ◽

Ribosome Stalling ◽

Bacterial Ribosome ◽

Translation Start ◽

Selection Step

Abstract Ribosome profiling (RIBO-seq) in prokaryotes has the potential to facilitate accurate detection of translation initiation sites and to increase understanding of translational dynamics, and it has already allowed the detection of many unannotated genes. However, protocols for ribosome profiling and the corresponding data analysis are not yet standardized. To better understand the influencing factors, we analysed 48 ribosome profiling samples from 9 studies on E. coli K12 grown in LB medium. We particularly investigated the size-selection step in each experiment, since selection for ribosome-protected footprints (RPFs) has been performed at various read lengths. We suggest choosing a size range of 22 to 30 nucleotides in order to obtain protein-coding fragments. To use RIBO-seq data for improving the annotation of weakly expressed genes, the total number of reads mapping to protein-coding sequences rather than to rRNA or tRNA is important; however, no consensus about the appropriate sequencing depth has been reached, and depth consequently varies significantly between studies. Our analysis suggests that 20 million reads not mapping to rRNA/tRNA are required for global detection of translated annotated genes. Further, we highlight the influence of drug-induced ribosome stalling, which causes bias at translation start sites. Drug-induced stalling may be especially useful for detecting weakly expressed genes. These suggestions should improve both gene detection and the comparability of resulting ribosome profiling datasets.
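The 20-million-read suggestion refers to reads left after rRNA/tRNA removal, not to raw library size. A minimal sketch of checking a library against that threshold, and of estimating how deep to sequence given an expected rRNA/tRNA share, is shown below. It assumes rRNA and tRNA read counts come from a separate alignment step; the function names and example counts are hypothetical.

```python
import math
from fractions import Fraction

# Depth suggested for global detection of translated annotated genes.
TARGET_USABLE_READS = 20_000_000


def usable_reads(total: int, rrna: int, trna: int) -> int:
    """Reads remaining after removing rRNA- and tRNA-mapping reads."""
    if min(total, rrna, trna) < 0:
        raise ValueError("read counts must be non-negative")
    if rrna + trna > total:
        raise ValueError("rRNA + tRNA reads cannot exceed total reads")
    return total - rrna - trna


def meets_depth_target(total: int, rrna: int, trna: int,
                       target: int = TARGET_USABLE_READS) -> bool:
    """True if the library has at least `target` non-rRNA/tRNA reads."""
    return usable_reads(total, rrna, trna) >= target


def total_reads_needed(rrna_trna_fraction: float,
                       target: int = TARGET_USABLE_READS) -> int:
    """Total reads to sequence if this fraction of reads maps to rRNA/tRNA.

    Converting to a Fraction avoids float artefacts such as 1 - 0.8 != 0.2.
    """
    if not 0 <= rrna_trna_fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    usable_share = 1 - Fraction(rrna_trna_fraction).limit_denominator(10**6)
    return math.ceil(target / usable_share)


# Hypothetical library: 50 M reads, 25 M rRNA, 4 M tRNA -> 21 M usable.
print(meets_depth_target(50_000_000, 25_000_000, 4_000_000))  # True
# If 80% of reads are expected to be rRNA/tRNA:
print(total_reads_needed(0.8))  # 100000000
```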

Relative testis size and mating systems in anurans: large testis in multiple-male mating in foam-nesting frogs

Animal Biology ◽

10.1163/157075511x570312 ◽

2011 ◽

Vol 61 (2) ◽

pp. 225-238 ◽

Cited By ~ 15

Author(s):

Wen Bo Liao ◽

Zhi Ping Mi ◽

Cai Quan Zhou ◽

Ling Jin ◽

Xian Han ◽

...

Keyword(s):

Sperm Competition ◽

Published Data ◽

Male Mating ◽

Data Sets ◽

Testis Size ◽

Data Set ◽

Monogamous Species ◽

Large Testis ◽

Testes Size ◽

Testis Mass

Abstract Comparative studies of relative testis size in animals show that promiscuous species have relatively larger testes than monogamous species. In many animals, sperm competition favours the evolution of larger ejaculates and hence larger testes. Here, we present data on relative testis mass for 17 Chinese species, including 3 polyandrous species. We analyzed relative testis mass within the Chinese data set and in combination with published data sets on Japanese and African frogs. We found that polyandrous foam-nesting species have relatively large testes, suggesting that sperm competition was an important factor in the evolution of relative testis size. For 4 polyandrous species, testis mass is positively correlated with the intensity (males per mating) but not with the risk (frequency of polyandrous matings) of sperm competition.

Children with 5′-end NF1 gene mutations are more likely to have glioma

Neurology Genetics ◽

10.1212/nxg.0000000000000192 ◽

2017 ◽

Vol 3 (5) ◽

pp. e192 ◽

Cited By ~ 12

Author(s):

Corina Anastasaki ◽

Stephanie M. Morris ◽

Feng Gao ◽

David H. Gutmann

Keyword(s):

Gene Mutation ◽

Statistical Significance ◽

Gene Mutations ◽

Neurofibromatosis Type ◽

Published Data ◽

Data Sets ◽

Nonsense Mutations ◽

Data Set ◽

Nf1 Gene ◽

The Relationship

Objective: To ascertain the relationship between the germline NF1 gene mutation and glioma development in patients with neurofibromatosis type 1 (NF1). Methods: The relationship between the type and location of the germline NF1 mutation and the presence of a glioma was analyzed in 37 participants with a clinical diagnosis of NF1 from one institution (Washington University School of Medicine [WUSM]). Odds ratios (ORs) were calculated using both unadjusted and weighted analyses of this data set in combination with 4 previously published data sets. Results: While no statistically significant association was found between glioma and the location or type of the NF1 mutation in the WUSM cohort, power calculations revealed that a sample size of 307 participants would be required to determine the predictive value of the position or type of the NF1 gene mutation. Combining our data set with 4 previously published data sets (n = 310), children with glioma were found to be more likely to harbor 5′-end gene mutations (OR = 2; p = 0.006). Moreover, although not clinically predictive owing to insufficient sensitivity and specificity, the association with glioma was stronger for 5′-end truncating (OR = 2.32; p = 0.005) and 5′-end nonsense (OR = 3.93; p = 0.005) mutations, relative to participants without glioma. Conclusions: Individuals with NF1 and glioma are more likely to harbor nonsense mutations in the 5′ end of the NF1 gene, suggesting that the NF1 mutation may be one predictive factor for glioma in this at-risk population.
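The unadjusted odds ratios reported above come from 2×2 tables of mutation position against glioma status. A minimal sketch of that calculation, with a Wald confidence interval on the log scale, is below. The Wald interval is a standard choice assumed here, the weighted analysis across data sets is not reproduced, and the counts are hypothetical rather than the study's data.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald (log-scale) confidence interval.

    2x2 table layout (hypothetical labels):
                     5'-end mutation   other mutation
        glioma              a                b
        no glioma           c                d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, not the study's data:
or_, lower, upper = odds_ratio_wald(40, 20, 30, 30)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```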

Nature of intrafractional and interfractional prostate motion during stereotactic radiation.

Journal of Clinical Oncology ◽

10.1200/jco.2016.34.2_suppl.152 ◽

2016 ◽

Vol 34 (2_suppl) ◽

pp. 152-152

Author(s):

Karthikeyan Perumal ◽

Mahadev Potharaju

Keyword(s):

Treatment Time ◽

Published Data ◽

Data Sets ◽

Data Set ◽

Stereotactic Radiation ◽

X Ray ◽

Total Treatment Time ◽

Prostate Motion ◽

Predictable Pattern ◽

Total Treatment

Background: To characterize intra-fraction and inter-fraction prostate motion, as tracked by X-ray images of implanted gold fiducials, during stereotactic radiotherapy with CyberKnife. Published studies have analysed linear and angular intra-fraction and inter-fraction prostate motion across patients; we sought to quantify the same motion within each patient. Methods: Twenty-five patients with localized prostate cancer treated with CyberKnife radiosurgery between January 2013 and August 2015 were studied retrospectively. Each data set consists of the deviations derived from X-ray images obtained between two consecutive couch motions. Results: A total of 3926 data sets were included in the analysis. 210 non-coplanar fields were used per fraction, and the mean total treatment time for all fields per fraction was 36.13 minutes. Overall, the detected and corrected linear movements were within ±10.1 mm (Right: mean 1.1±0.4 mm; Left: mean 1.0±0.6 mm; Superior: mean 0.7±0.3 mm; Inferior: mean 1.6±0.6 mm; Anterior: mean 1.6±0.7 mm; Posterior: mean 0.5±0.3 mm; maximum (max) movement range: Right max 9.9±6.4 mm, Left max 7.1±3.4 mm, Superior max 8.6±5.4 mm, Inferior max 10.1±8.5 mm, Anterior max 9.2±6.5 mm, Posterior max 8.4±2.9 mm), and angular movements were within ±6.7 deg in all directions (Right Angle: mean 0.6±0.3 deg; Left Angle: mean 0.6±0.3 deg; Head Up (H-U): mean 1.3±0.6 deg; Head Down (H-D): mean 1.4±0.6 deg; Counter-Clockwise (CCW): mean 0.7±0.3 deg; Clockwise (CW): mean 0.5±0.3 deg; max rotation range: Right angle max 2.4±2 deg, Left angle max 2.7±2 deg, H-U max 10.2±3.5 deg, H-D max 6.7±4.8 deg, CCW max 4±2.9 deg, CW max 2.8±2.4 deg). Inter-fraction prostate motion changed unpredictably in each patient. However, intra-fraction prostate motion followed a predictable pattern within each patient: changes in linear or angular motion in any direction were not erratic. Conclusions: Linear and rotational intra-fraction prostate motion in any direction follows a predictable pattern, and any change is gradual rather than erratic. The motion shows a secular trend during the course of treatment.

The use of genetic programming to develop a predictor of swash excursion on sandy beaches

Natural Hazards and Earth System Science ◽

10.5194/nhess-18-599-2018 ◽

2018 ◽

Vol 18 (2) ◽

pp. 599-611 ◽

Cited By ~ 14

Author(s):

Marinella Passarella ◽

Evan B. Goldstein ◽

Sandro De Muro ◽

Giovanni Coco

Keyword(s):

Genetic Programming ◽

Sandy Beaches ◽

Coastal Hazards ◽

Coastal Processes ◽

Published Data ◽

Data Sets ◽

Prediction Errors ◽

Data Set ◽

Physical Insight ◽

Insight Into

Abstract. We use genetic programming (GP), a type of machine learning (ML) approach, to predict the total and infragravity swash excursion using previously published data sets that have been used extensively in swash prediction studies. Data from three further published studies, covering a range of new conditions, are added to this data set to extend the range of measured swash conditions. Using this newly compiled data set, we demonstrate that an ML approach can reduce prediction errors compared to well-established parameterizations and may therefore improve coastal hazard assessment (e.g. of coastal inundation). Predictors obtained using GP can also be physically sound and replicate the functionality and dependencies of previously published formulas. Overall, we show that ML techniques are capable of both improving predictability (compared to classical regression approaches) and providing physical insight into coastal processes.

Analysis of copy number variations at 15 schizophrenia-associated loci

The British Journal of Psychiatry ◽

10.1192/bjp.bp.113.131052 ◽

2014 ◽

Vol 204 (2) ◽

pp. 108-114 ◽

Cited By ~ 234

Author(s):

Elliott Rees ◽

James T. R. Walters ◽

Lyudmila Georgieva ◽

Anthony R. Isles ◽

Kimberly D. Chambert ◽

...

Keyword(s):

Copy Number ◽

Copy Number Variants ◽

Copy Number Variations ◽

High Rate ◽

Published Data ◽

Data Sets ◽

Deleterious Mutations ◽

Data Set ◽

Data Analyses ◽

Susceptibility Factors

Background: A number of copy number variants (CNVs) have been suggested as susceptibility factors for schizophrenia. For some of these the data remain equivocal, and the frequency in individuals with schizophrenia is uncertain. Aims: To determine the contribution of CNVs at 15 schizophrenia-associated loci (a) using a large new data-set of patients with schizophrenia (n = 6882) and controls (n = 6316), and (b) combining our results with those from previous studies. Method: We used Illumina microarrays to analyse our data. Analyses were restricted to 520 766 probes common to all arrays used in the different data-sets. Results: We found higher rates in participants with schizophrenia than in controls for 13 of the 15 previously implicated CNVs. Six were nominally significantly associated (P < 0.05) in this new data-set: deletions at 1q21.1, NRXN1, 15q11.2 and 22q11.2 and duplications at 16p11.2 and the Angelman/Prader–Willi Syndrome (AS/PWS) region. All eight AS/PWS duplications in patients were of maternal origin. When combined with published data, 11 of the 15 loci showed highly significant evidence for association with schizophrenia (P < 4.1 × 10⁻⁴). Conclusions: We strengthen the support for the majority of the previously implicated CNVs in schizophrenia. About 2.5% of patients with schizophrenia and 0.9% of controls carry a large, detectable CNV at one of these loci. Routine CNV screening may be clinically appropriate given the high rate of known deleterious mutations in the disorder and the comorbidity associated with these heritable mutations.

Pre-clustering data sets using cluster4x improves the signal-to-noise ratio of high-throughput crystallography drug-screening analysis

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798320012619 ◽

2020 ◽

Vol 76 (11) ◽

pp. 1134-1144 ◽

Cited By ~ 2

Author(s):

Helen M. Ginn

Keyword(s):

Drug Targets ◽

Signal To Noise Ratio ◽

Published Data ◽

Data Sets ◽

Data Set ◽

X Ray Crystallography ◽

Fragment Screening ◽

High Profile ◽

Clustering Data ◽

Interactive Graphical User Interface

Drug and fragment screening at X-ray crystallography beamlines has been a huge success. However, it is inevitable that more high-profile biological drug targets will be identified for which high-quality, highly homogeneous crystal systems cannot be found. As crystal systems become more heterogeneous, current multi-data-set methods become ever less sensitive to bound ligands. To ease the bottleneck of finding a well-behaved crystal system, data sets can be pre-clustered with cluster4x after data collection, separating them into smaller partitions and thereby restoring the sensitivity of multi-data-set methods. Here, the software cluster4x is introduced for this purpose and validated against published data sets using PanDDA, showing an improved total signal from existing ligands and identifying new hits in both highly heterogeneous and less heterogeneous multi-data-set experiments. cluster4x provides the researcher with an interactive graphical user interface with which to explore multi-data-set experiments.
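The abstract does not describe the clustering procedure inside cluster4x, so the sketch below is not its algorithm. It only illustrates the general idea of pre-clustering: partitioning data sets by a per-data-set feature vector, here hypothetical unit-cell lengths (a, b, c) in Å, using plain k-means with deterministic farthest-point initialisation. Each resulting partition could then be analysed separately by a multi-data-set method; the function name and data are assumptions.

```python
import math
from typing import List, Optional, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> List[int]:
    """Plain k-means; returns one cluster label per point.

    Farthest-point initialisation makes the result deterministic.
    """
    if not 1 <= k <= len(points) or max_iter < 1:
        raise ValueError("need 1 <= k <= len(points) and max_iter >= 1")
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next centroid: the point farthest from all existing centroids
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical unit cells (a, b, c) in Å for five data sets:
cells = [
    (50.0, 60.0, 70.0),
    (50.2, 60.1, 70.1),
    (49.9, 59.8, 69.9),
    (55.0, 60.0, 78.0),
    (55.1, 60.2, 78.2),
]
labels = kmeans(cells, k=2)
print(labels)  # [0, 0, 0, 1, 1]
```

In practice the choice of features (unit cell, structure-factor amplitudes, or other similarity measures) matters more than the clustering routine itself.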

Phylogenetic position of Pelusios williamsi and a critique of current GenBank procedures (Reptilia: Testudines: Pelomedusidae)

Amphibia-Reptilia ◽

10.1163/156853812x627204 ◽

2012 ◽

Vol 33 (1) ◽

pp. 150-154 ◽

Cited By ~ 4

Author(s):

Uwe Fritz ◽

Mario Vargas-Ramírez ◽

Pavel Široký

Keyword(s):

Natural History ◽

Genetic Divergence ◽

Best Practice ◽

Nuclear Data ◽

Phylogenetic Position ◽

Published Data ◽

Data Sets ◽

Natural History Museums ◽

Data Set ◽

History Museums

We re-examine the phylogenetic position of Pelusios williamsi by merging new sequences with an earlier published data set of all Pelusios species, except the possibly extinct P. seychellensis, and the nine previously identified lineages of the closely allied genus Pelomedusa (2054 bp mtDNA, 2025 bp nDNA). Furthermore, we include new sequences of Pelusios broadleyi, P. castanoides, P. gabonensis and P. marani. Individual and combined analyses of the mitochondrial and nuclear data sets indicate that P. williamsi is sister to P. castanoides, as predicted by morphology. This provides evidence for the misidentification of GenBank sequences allegedly representing P. williamsi. Such mislabelled GenBank sequences contribute to continued confusion, because only the original submitter can revise their identification; an impractical procedure impeding the rectification of obvious mistakes. We recommend implementing another option for revising taxonomic identifications, paralleling the century-old best practice of natural history museums for new determinations of specimens. Within P. broadleyi, P. gabonensis and P. marani, there is only shallow genetic divergence, while some phylogeographic structuring is present in the wide-ranging species P. castaneus and P. castanoides.

Mitochondrial Evidence on the Phylogenetic Position of Caecilians (Amphibia: Gymnophiona)

Genetics ◽

10.1093/genetics/155.2.765 ◽

2000 ◽

Vol 155 (2) ◽

pp. 765-775

Author(s):

Rafael Zardoya ◽

Axel Meyer

Keyword(s):

Tandem Repeats ◽

Sequence Data ◽

Phylogenetic Analyses ◽

Maximum Parsimony Analysis ◽

Bootstrap Support ◽

Phylogenetic Position ◽

Trna Genes ◽

Data Sets ◽

Data Set ◽

Protein Coding

Abstract The complete nucleotide sequence (17,005 bp) of the mitochondrial genome of the caecilian Typhlonectes natans (Gymnophiona, Amphibia) was determined. This molecule is characterized by two distinctive genomic features: there are seven large 109-bp tandem repeats in the control region, and the sequence for the putative origin of replication of the L strand can potentially fold into two alternative secondary structures (one including part of the tRNACys). The new sequence data were used to assess the phylogenetic position of caecilians and to gain insights into the origin of living amphibians (frogs, salamanders, and caecilians). Phylogenetic analyses of two data sets, one combining protein-coding genes and the other combining tRNA genes, strongly supported a caecilian + frog clade and, hence, monophyly of modern amphibians. These two data sets could not further resolve relationships among the coelacanth, lungfishes, and tetrapods, but strongly supported diapsid affinities of turtles. Phylogenetic relationships among a larger set of species of frogs, salamanders, and caecilians were estimated with a mitochondrial rRNA data set. Maximum parsimony analysis of this latter data set also recovered monophyly of living amphibians and favored a frog + salamander (Batrachia) relationship. However, bootstrap support was only moderate at these nodes. This is likely due to extensive among-site rate heterogeneity in the rRNA data set and the narrow window of time in which the three main groups of living amphibians originated.

A novel, unbiased approach to evaluating subsequent search misses in dual target visual search

Attention Perception & Psychophysics ◽

10.3758/s13414-020-02085-0 ◽

2020 ◽

Vol 82 (7) ◽

pp. 3357-3373

Author(s):

Mark W. Becker ◽

Kaitlyn Anderson ◽

Jan W. Brascamp

Keyword(s):

Visual Search ◽

Effect Size ◽

Traditional Method ◽

Published Data ◽

Data Sets ◽

Data Set ◽

Novel Method ◽

Subsequent Search Misses ◽

Dual Target ◽

Rule Out

Abstract Research in radiology and visual cognition suggests that finding one target during visual search may result in increased misses for a second target, an effect known as subsequent search misses (SSM). Here, we demonstrate that the common method of calculating second-target detection performance is biased and could produce spurious SSM effects. We describe the source of that bias and document factors that influence its magnitude. We use a modification of signal-detection theory to develop a novel, unbiased method of calculating the expected value for dual-target performance under the null hypothesis. We then apply our novel method to two of our data sets that showed modest SSM effects when calculated in the traditional manner. Our correction reduced the effect size to the point that there was no longer a significant SSM effect. We then applied our method to a published data set that had a larger effect size when calculated using the traditional calculation as well as when using an alternative calculation that was recently proposed to account for biases in the traditional method. We find that both the traditional method and the recently proposed alternative substantially overestimate the magnitude of the SSM effect in these data, but a significant SSM effect persisted even with our calculation. We recommend that future SSM studies use our method to ensure accurate effect-size estimates, and suggest that the method be applied to reanalyze published results, particularly those with small effect sizes, to rule out the possibility that they were spurious.
