Uneven missing data skews phylogenomic relationships within the lories and lorikeets

2018 ◽  
Author(s):  
Brian Tilston Smith ◽  
William M. Mauck ◽  
Brett Benz ◽  
Michael J. Andersen

Abstract The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded and fragmented, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern sample types impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low-coverage sites contained several clades in which historical or modern samples clustered together; these clades were not observed in trees built with more stringent filtering. To assess whether the aberrant relationships were driven by missing data, we performed a targeted outlier analysis of sites and loci and a more general data-reduction approach in which we excluded sites below a given percentage of data completeness. The outlier analyses showed that 6.6% of total sites drove the topological differences among trees built with and without low-coverage sites, and at these sites, historical samples had 7.5× more missing data than modern ones. An examination of subclades identified loci biased by missing data, and excluding these loci shifted phylogenetic relationships. Predictive modeling found that outlier-analysis scores were not correlated with summary statistics of locus alignments, indicating that outlier loci do not have characteristics that differ from other loci. Excluding missing data by percentage completeness indicated that 70% site completeness was necessary to avoid spurious relationships, but more stringent completeness thresholds produced less-resolved trees. After accounting for biased loci and assessing the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
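The site-completeness filtering described above reduces, in essence, to dropping alignment columns whose fraction of non-missing characters falls below a threshold. The sketch below is a toy illustration under assumed conventions ('-', 'N', and '?' count as missing; the four-taxon alignment and names are invented), not the study's actual pipeline.

```python
# Toy sketch of percentage-completeness site filtering, with an
# invented four-taxon alignment; not the study's pipeline.

def filter_sites_by_completeness(alignment, min_completeness=0.7):
    """Keep alignment columns where the fraction of non-missing
    characters ('-', 'N', '?' treated as missing) meets the threshold."""
    n_taxa = len(alignment)
    n_sites = len(next(iter(alignment.values())))
    keep = []
    for j in range(n_sites):
        present = sum(1 for seq in alignment.values() if seq[j] not in "-N?")
        if present / n_taxa >= min_completeness:
            keep.append(j)
    return {name: "".join(seq[j] for j in keep) for name, seq in alignment.items()}

aln = {
    "modern_1":     "ACGTAC",
    "modern_2":     "ACGTAC",
    "historical_1": "AC--A-",
    "historical_2": "A---A-",
}
filtered = filter_sites_by_completeness(aln, 0.7)
```

On this toy alignment the filter keeps only the columns where at least 70% of taxa have data, so the gap-riddled "historical" rows lose their mostly-missing columns, mirroring the pattern the abstract reports.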

2020 ◽  
Vol 12 (7) ◽  
pp. 1131-1147
Author(s):  
Brian Tilston Smith ◽  
William M Mauck ◽  
Brett W Benz ◽  
Michael J Andersen

Abstract The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low-coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens; these clades were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci and a data-reduction approach in which we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier-analysis scores were correlated with parsimony-informative sites in the clades whose topologies changed the most under filtering. After accounting for biased loci and assessing the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.


2011 ◽  
Vol 44 (4) ◽  
pp. 865-872 ◽  
Author(s):  
Ludmila Urzhumtseva ◽  
Alexandre Urzhumtsev

Crystallographic Fourier maps may contain barely interpretable or non-interpretable regions if these maps are calculated with an incomplete set of diffraction data. Even a small percentage of missing data may be crucial if these data are distributed non-uniformly and form connected regions of reciprocal space. Significant time and effort can be lost in trying to interpret poor maps, in improving them by phase refinement, or in fighting artefacts, whilst the problem could in fact be solved by completing the data set. To characterize the distribution of missing reflections, several types of diagrams have been suggested in addition to the usual plots of completeness in resolution shells and cumulative data completeness. A computer program, FOBSCOM, has been developed to analyze the spatial distribution of unmeasured diffraction data, to search for connected regions of unmeasured reflections and to obtain numeric characteristics of these regions. By performing this analysis, the program could help to save time during structure solution for a number of projects. It can also provide information about possible overestimation of map quality and about model-biased features when calculated values are used to replace unmeasured data.
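As a toy illustration of the standard "completeness in resolution shells" plot the abstract mentions (this is not FOBSCOM's connected-region analysis, and the reflection resolutions and shell edges below are invented):

```python
# Illustrative sketch only: per-shell data completeness, i.e. the
# fraction of theoretically possible reflections actually measured.

def shell_completeness(measured_d, possible_d, edges):
    """measured_d / possible_d: resolutions (in angstroms) of measured and
    theoretically possible reflections. edges: shell boundaries in
    decreasing order (low to high resolution)."""
    def counts(ds):
        c = [0] * (len(edges) - 1)
        for d in ds:
            for i in range(len(edges) - 1):
                if edges[i] >= d > edges[i + 1]:
                    c[i] += 1
                    break
        return c
    m, p = counts(measured_d), counts(possible_d)
    return [mi / pi if pi else 0.0 for mi, pi in zip(m, p)]

possible = [5.0, 4.0, 2.5, 2.2]   # all reflections the lattice allows (toy values)
measured = [5.0, 2.5]             # what the experiment actually recorded
frac = shell_completeness(measured, possible, edges=[10.0, 3.0, 2.0])
```

Half the reflections in each toy shell are missing; the abstract's point is that such a plot alone cannot reveal whether the missing half forms a connected region of reciprocal space, which is what FOBSCOM diagnoses.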


2014 ◽  
Author(s):  
Lin Huang ◽  
Bo Wang ◽  
Ruitang Chen ◽  
Sivan Bercovici ◽  
Serafim Batzoglou

Population low-coverage whole-genome sequencing is rapidly emerging as a prominent approach for discovering genomic variation and genotyping a cohort. This approach combines substantially lower cost than full-coverage sequencing with whole-genome discovery of low-allele-frequency variants, to an extent that is not possible with array genotyping or exome sequencing. However, a challenging computational problem arises when attempting to discover variants and genotype the entire cohort. Variant discovery and genotyping are relatively straightforward on a single individual that has been sequenced at high coverage, because the inference decomposes into the independent genotyping of each genomic position for which a sufficient number of confidently mapped reads are available. However, in cases where low-coverage population data are given, the joint inference requires leveraging the complex linkage disequilibrium patterns in the cohort to compensate for sparse and missing data in each individual. The potentially massive computation time for such inference, as well as the missing data that confound low-frequency allele discovery, need to be overcome for this approach to become practical. Here, we present Reveel, a novel method for single nucleotide variant calling and genotyping of large cohorts that have been sequenced at low coverage. Reveel introduces a novel technique for leveraging linkage disequilibrium that deviates from previous Markov-based models. We evaluate Reveel's performance through extensive simulations as well as real data from the 1000 Genomes Project, and show that it achieves higher accuracy in low-frequency allele discovery and substantially lower computation cost than previous state-of-the-art methods.
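The per-site inference that the abstract calls "relatively straightforward" at high coverage is typically a binomial genotype-likelihood model. The sketch below shows that textbook model, not Reveel's LD-leveraging algorithm (which the abstract notes is non-Markovian); the sequencing error rate and read counts are illustrative assumptions.

```python
# Textbook binomial genotype-likelihood model for a single diploid site;
# a stand-in illustration, not Reveel's actual method.
from math import comb

def genotype_likelihoods(n_ref, n_alt, err=0.01):
    """Return likelihoods for genotypes 0/0, 0/1, 1/1 given read counts,
    assuming each read shows the alt allele with probability err, 0.5,
    or 1 - err respectively."""
    n = n_ref + n_alt
    out = []
    for p_alt in (err, 0.5, 1 - err):   # expected alt-read fraction per genotype
        out.append(comb(n, n_alt) * p_alt**n_alt * (1 - p_alt)**(n - n_alt))
    return out

ref_heavy = genotype_likelihoods(4, 0)   # four ref reads, no alt reads
balanced = genotype_likelihoods(2, 2)    # two reads of each allele
```

With only one or two reads per site, these likelihoods barely separate the genotypes, which is exactly why low-coverage cohort calling must borrow information across individuals via linkage disequilibrium.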


Forests ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 695 ◽  
Author(s):  
Markus Müller ◽  
C. Nelson ◽  
Oliver Gailing

American chestnut (Castanea dentata (Marsh.) Borkh.) was a dominant tree species in its native range in eastern North America until the accidentally introduced fungus Cryphonectria parasitica (Murr.) Barr, which causes chestnut blight, led to a collapse of the species. Different approaches (e.g., genetic engineering and conventional breeding) are being used to fight chestnut blight and to reintroduce the species with resistant planting stock. Because of large climatic differences within the distribution area of American chestnut, successful reintroduction of the species requires knowledge and consideration of local adaptation to the prevailing environmental conditions. Previous studies revealed clear patterns of genetic diversity along the northeast-southwest axis of the Appalachian Mountains, but less is known about the distribution of potentially adaptive genetic variation within the distribution area of this species. In this study, we investigated neutral and potentially adaptive genetic variation in nine American chestnut populations collected from sites with different environmental conditions. In total, 272 individuals were genotyped with 24 microsatellite (i.e., simple sequence repeat (SSR)) markers (seven genomic SSRs and 17 EST-SSRs). An FST-outlier analysis revealed five outlier loci. The same loci, as well as five additional ones, were significantly associated with environmental variables of the population sites in an environmental association analysis. Four of these loci are of particular interest, since they were significant in both methods and were associated with environmental variation but not with geographic variation. Hence, these loci might be involved in (temperature-related) adaptive processes in American chestnut. This work aims to improve understanding of the genetic basis of adaptation in C. dentata, and thereby to inform the selection of suitable provenances for further breeding efforts.
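An FST-outlier scan rests on a per-locus FST estimate compared against a neutral expectation. A minimal sketch using Wright's formulation is below; real scans use more elaborate estimators and a simulated null distribution, and the allele frequencies here are invented for illustration.

```python
# Minimal Wright-style per-locus FST: the deficit of subpopulation
# heterozygosity relative to the pooled expectation. Illustration only.

def wright_fst(pop_freqs):
    """pop_freqs: frequency of one allele in each population at one locus."""
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # pooled expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

baseline = wright_fst([0.5, 0.5])   # identical populations: no differentiation
diverged = wright_fst([0.1, 0.9])   # strongly diverged frequencies
```

Loci whose FST sits far above the genome-wide distribution (like the diverged toy locus) are the outlier candidates; the study then cross-checks such loci against environmental associations.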


2019 ◽  
Author(s):  
Mathias Lorieux ◽  
Anestis Gkanogiannis ◽  
Christopher Fragoso ◽  
Jean-François Rami

Abstract
Motivation: Low-coverage next-generation sequencing (LC-NGS) methods can be used to genotype bi-parental populations. This approach allows the creation of highly saturated genetic maps at reasonable cost, precise localization of recombination breakpoints, and minimized mapping intervals for quantitative-trait locus analysis. The main issues with these genotyping methods are (1) poor performance at heterozygous loci, (2) a high percentage of missing data, (3) local errors due to erroneous mapping of sequencing reads and reference-genome mistakes, and (4) global, technical errors inherent to NGS itself. Recent methods like Tassel-FSFHap or LB-Impute address issues 1 and 2 well, but nonetheless perform poorly when issues 3 and 4 are persistent in a dataset (i.e., "noisy" data). Here, we present an algorithm for imputation of LC-NGS data that eliminates the need for complex pre-filtering of noisy data, accurately types heterozygous chromosomal regions, corrects erroneous data, and imputes missing data. We compare its performance with Tassel-FSFHap, LB-Impute, and Genotype-Corrector using simulated data and three real datasets: a rice single-seed-descent (SSD) population genotyped both by genotyping-by-sequencing (GBS) and by whole-genome sequencing (WGS), and a sorghum SSD population genotyped by GBS.
Availability: NOISYmputer, a Microsoft Excel-Visual Basic for Applications program that implements the algorithm, is available at mapdisto.free.fr. It runs on Apple macOS and Microsoft Windows operating systems.
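The core idea of imputing a biparental line's missing genotypes is that neighboring markers on a chromosome usually share the same parental origin. The sketch below is a deliberately naive sliding-window vote, not NOISYmputer's actual algorithm; the genotype coding ('A' = parent 1, 'B' = parent 2, '-' = missing) and window size are assumptions for illustration.

```python
# Naive sliding-window imputation for a biparental population chromosome;
# a toy stand-in for the imputation methods compared in the abstract.

def impute_missing(genos, window=2):
    """Fill '-' calls by majority vote among non-missing neighbors
    within +/- `window` markers (ties resolved arbitrarily)."""
    out = list(genos)
    for i, g in enumerate(genos):
        if g != "-":
            continue
        lo, hi = max(0, i - window), min(len(genos), i + window + 1)
        neigh = [x for x in genos[lo:hi] if x != "-"]
        if neigh:
            out[i] = max(set(neigh), key=neigh.count)
    return "".join(out)
```

This naive vote fails exactly where the abstract says the problem is hard: near recombination breakpoints and in runs of correlated errors, which is what motivates the more careful modeling in NOISYmputer and its competitors.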


2021 ◽  
Author(s):  
Camila Pereira Perico ◽  
Camilla Reginatto De Pierri ◽  
Giuseppe Pasqualato Neto ◽  
Danrley Rafael Fernandes ◽  
Fabio de Oliveira Pedrosa ◽  
...  

Brazil was the epicenter of the worldwide pandemic at the peak of its second wave. A genomic/proteomic perspective on the COVID-19 pandemic in Brazil can shed new light on global pandemic behavior. In this study, we tracked SARS-CoV-2 molecular information in Brazil using real-time bioinformatics and data-science strategies to provide a comparative and evolutionary panorama of the lineages in the country. SWeeP vectors represented the Brazilian and worldwide genomic/proteomic data from GISAID between 02/2020 and 08/2021. Clusters were analyzed and compared with PANGO lineages. Hierarchical clustering supported phylogenetic and evolutionary analysis of the lineages, and we tracked the origin of the P.1 (Gamma) variant. Genomic diversity based on Chao's estimator allowed us to compare richness and coverage among Brazilian states and other representative countries. We found that the epidemic in Brazil occurred in two distinct phases with different genetic profiles. The P.1 lineage emerged in the second wave, which was more aggressive. We could not trace the origin of P.1 to the variants present in Brazil in 2020. Instead, we found evidence pointing to an external source and a possible recombination event that may relate P.1 to the B.1.1.28 variant subset. We discuss the potential application of the pipeline for detecting emerging variants and the stability of the PANGO terminology over time. The diversity analysis showed that low coverage and unbalanced sequencing among Brazilian states could have allowed the silent entry and dissemination of P.1 and other dangerous variants. This comparative and evolutionary analysis may help in understanding the development and the consequences of the entry of variants of concern (VOCs).
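The richness comparison rests on Chao's estimator, which extrapolates unseen diversity from how many lineages were observed only once or twice. A sketch of the bias-corrected Chao1 form is below; the per-lineage counts are invented, not GISAID data.

```python
# Bias-corrected Chao1 richness estimator: observed richness plus a
# correction driven by singletons (f1) and doubletons (f2). Toy counts only.

def chao1(abundances):
    """Estimate total lineage richness from per-lineage sample counts."""
    s_obs = sum(1 for a in abundances if a > 0)
    f1 = sum(1 for a in abundances if a == 1)   # lineages seen exactly once
    f2 = sum(1 for a in abundances if a == 2)   # lineages seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

est = chao1([5, 3, 1, 1, 2])   # five lineages observed, two of them singletons
```

Many singletons relative to doubletons push the estimate well above the observed count, which is how undersampled states reveal that lineages are likely circulating unseen, the "silent entry" risk the abstract highlights.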


Author(s):  
Elizabeth Miranda ◽  
Lotta Velin ◽  
Faustin Ntirenganya ◽  
Robert Riviello ◽  
Francoise Mukagaju ◽  
...  

Abstract Systematic data collection in high-income countries has demonstrated decreasing burn morbidity and mortality, whereas the lack of data from low- and middle-income countries hinders a global overview of burn epidemiology. In low- and middle-income countries, dedicated burn registries are few. Instead, burn data are often recorded in logbooks or as a single variable in trauma registries, where incomplete or inconsistently recorded information is a known challenge. The University Teaching Hospital of Kigali hosts the only dedicated burn unit in Rwanda and has collected data on patients admitted for acute burn care in logbooks since 2005. This study aimed to assess the data registered between January 2005 and December 2019, to evaluate the extent of missing data, and to identify possible factors associated with "missingness." All data were analyzed using descriptive statistics, Fisher's exact test, and the Wilcoxon rank-sum test. In this study, 1093 acute burn patients were included, and 64.2% of them had incomplete data. Data completeness improved significantly over time. The most commonly missing variables were whether the patient was referred from another facility and whether any surgical intervention was performed. Missing data on burn mechanism, burn degree, and surgical treatment were associated with in-hospital mortality. In conclusion, missing data are frequent for acute burn patients in Rwanda, although improvements have been seen over time. As Rwanda and other low- and middle-income countries strive to improve burn care, ensuring data completeness will be essential for accurately assessing, and hence improving, the quality of care.
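The per-variable missingness assessment described above amounts to tallying, for each registry field, the fraction of records with no value. A minimal sketch follows; the record structure and field names are invented, not the Kigali logbook schema.

```python
# Toy per-variable missingness tally for registry-style records,
# with invented fields; None marks a missing value.

def missingness_by_variable(records, variables):
    """Fraction of records with a missing (None or absent) value per variable."""
    n = len(records)
    return {v: sum(1 for r in records if r.get(v) is None) / n for v in variables}

patients = [
    {"mechanism": "scald", "degree": 2, "referred": None},
    {"mechanism": None,    "degree": 3, "referred": None},
]
rates = missingness_by_variable(patients, ["mechanism", "degree", "referred"])
```

Running such a tally per admission year is enough to reproduce the study's two headline findings: which variables are worst recorded, and whether completeness improves over time.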


1982 ◽  
Vol 99 ◽  
pp. 605-613
Author(s):  
P. S. Conti

Conti: One of the main conclusions of the Wolf-Rayet symposium in Buenos Aires was that Wolf-Rayet stars are evolutionary products of massive objects. Some questions:
– Do hot helium-rich stars that are not Wolf-Rayet stars exist?
– What about the stability of helium-rich stars of large mass? We know a helium-rich star of ∼40 M⊙. Has the stability something to do with the wind?
– Ring nebulae and bubbles: this seems to be a much more common phenomenon than we thought some years ago.
– What is the origin of the subtypes? This is important for finding a possible matching of scenarios to subtypes.


1999 ◽  
Vol 173 ◽  
pp. 309-314 ◽  
Author(s):  
T. Fukushima

Abstract By using the stability condition and general formulas developed by Fukushima (1998 = Paper I), we discovered that, just as in the case of the explicit symmetric multistep methods (Quinlan and Tremaine, 1990), when integrating orbital motions of celestial bodies, the implicit symmetric multistep methods used in predictor-corrector mode lead to integration errors in position that grow linearly with integration time, provided the stepsizes adopted are sufficiently small and the number of corrections is sufficiently large, say two or three. We also confirmed that the symmetric methods (explicit or implicit) produce the stepsize-dependent instabilities/resonances discovered by A. Toomre in 1991 and confirmed by G.D. Quinlan for some high-order explicit methods. Although the implicit methods require twice or more the computational time of the explicit symmetric methods at the same stepsize, they seem preferable since they significantly reduce these undesirable features.


Author(s):  
Godfrey C. Hoskins ◽  
V. Williams ◽  
V. Allison

The method demonstrated is an adaptation of a proven procedure for accurately determining the magnification of light photomicrographs. Because of the stability of modern electrical lenses, the method is shown to be directly applicable for providing precise reproducibility of magnification in various models of electron microscopes.

A readily recognizable area of a carbon replica of a crossed-line diffraction grating is used as a standard. The same area of the standard was photographed in Philips EM 200, Hitachi HU-11B2, and RCA EMU 3F electron microscopes at taps representative of the range of magnification of each. Negatives from one microscope were selected as guides and printed at convenient magnifications; then negatives from each of the other microscopes were projected to register with these prints. By deferring measurement to the print rather than comparing negatives, correspondence of magnification of the specimen in the three microscopes could be brought to within 2%.
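The underlying arithmetic is simple: magnification is the measured spacing on the print divided by the known spacing of the grating replica, and agreement between instruments is judged as a relative difference. The numbers below are invented for illustration (a hypothetical 1000 lines/mm grating); the 2% figure is the tolerance the text reports.

```python
# Toy arithmetic behind grating-replica magnification calibration;
# spacings are invented, the 2% tolerance comes from the text.

def magnification(image_spacing_mm, true_spacing_mm):
    """Magnification = measured spacing on the print / true grating spacing."""
    return image_spacing_mm / true_spacing_mm

def within_tolerance(m_a, m_b, tol=0.02):
    """Relative agreement check between two microscopes' magnifications."""
    return abs(m_a - m_b) / m_a <= tol

m = magnification(20.0, 0.001)   # 20 mm on the print / 0.001 mm line spacing
```

Registering prints from different microscopes against the same replica area is what lets this ratio be compared directly, bringing the three instruments' magnifications within the stated 2%.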

