Software choice and depth of sequence coverage can impact plastid genome assembly - A case study in the narrow endemic Calligonum bakuense

2021
Author(s):  
Eka Giorgashvili ◽  
Katja Reichel ◽  
Calvinna Caswara ◽  
Vuqar Kerimov ◽  
Thomas Borsch ◽  
...  

Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequence coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense, which forms a distinct lineage in the genus Calligonum. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequence coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and three levels of sequence coverage (original depth, 2,000x, and 500x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic tree inference is also assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produced the most consistent assemblies for C. bakuense. Moreover, we found that a cap in sequence coverage can reduce both the sequence variability across assembly contigs and computation time. While no evidence was found that the sequence variability across assemblies was large enough to affect the phylogenetic position inferred for C. bakuense, differences among the assemblies may influence genotype recognition at the population level.
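
A simple way to see how the coverage cap used in this study (e.g., 500x) translates into input data is to compute how many read pairs a target depth requires and subsample accordingly. The following is a minimal, illustrative Python sketch, not the authors' pipeline; the 160 kb plastome size, 150 bp read length, and read-pair contents are placeholder assumptions.

```python
import random

def reads_for_target_depth(genome_len_bp, read_len_bp, target_depth, paired=True):
    """Number of reads (or read pairs) needed to reach a target coverage depth."""
    bases_needed = genome_len_bp * target_depth
    bases_per_unit = read_len_bp * (2 if paired else 1)
    return bases_needed // bases_per_unit

def subsample(read_pairs, n_keep, seed=42):
    """Randomly subsample read pairs to cap coverage; the fixed seed keeps
    the subsampling reproducible across assembly runs."""
    rng = random.Random(seed)
    return rng.sample(read_pairs, min(n_keep, len(read_pairs)))

# Placeholder values: a typical plastid genome is ~160 kb; reads are 150 bp paired-end.
n_pairs = reads_for_target_depth(160_000, 150, 500)
print(f"read pairs to keep for ~500x: {n_pairs}")  # ~266,666 pairs

pairs = [f"pair_{i}" for i in range(1_000)]  # stand-in for real read pairs
print(len(subsample(pairs, 500)), "pairs kept")
```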

2019
Author(s):  
Michael Gruenstaeudl ◽  
Nils Jenke

Background: The circular, quadripartite structure of plastid genomes, which includes two inverted repeat regions, renders the automatic assembly of plastid genomes challenging. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses of plastid genome structure and evolution. Plastome-based phylogenetic or population genetic investigations, for example, require the precise identification of DNA sequence and length to determine the location of nucleotide polymorphisms. The average coverage depth of a genome assembly is often used as an indicator of assembly quality. Visualizing coverage depth across a draft genome allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Based on such visualizations, users can conduct a local re-assembly or other forms of targeted error correction. Few, if any, contemporary software tools can visualize the coverage depth of a plastid genome assembly while taking its quadripartite structure into account, despite the interplay between genome structure and assembly quality. A software tool is needed that visualizes the coverage depth of a plastid genome assembly on a circular, quadripartite map of the plastid genome.

Results: We introduce 'PACVr', an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as to the individual plastome genes. The tool allows visualizations on different scales using a variable window approach and also visualizes the equality of gene synteny in the inverted repeat regions of the plastid genome, thus providing an additional measure of assembly quality. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be invoked directly from a Unix shell, facilitating its use in automated quality control. We illustrate the application of PACVr on two empirical datasets and compare the resulting visualizations with those of alternative software tools for displaying plastome sequencing coverage.

Conclusions: PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) the equality of gene synteny in the inverted repeat regions. It thus contributes to optimizing plastid genome assemblies and to increasing the reliability of publicly available plastome sequences, especially in light of incongruence among the visualization results of alternative software tools. The software, example datasets, technical documentation, and a tutorial are available with the package at https://github.com/michaelgruenstaeudl/PACVr.
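
The threshold-based inspection that PACVr supports can be sketched conceptually: compute the mean depth in fixed windows along the assembly and flag windows outside user-defined bounds. The sketch below illustrates that idea in Python, not PACVr's R implementation; the per-base depth list is toy data standing in for, e.g., a `samtools depth` report.

```python
def windowed_depth(depths, window=250):
    """Mean coverage depth per non-overlapping window along the assembly."""
    return [sum(depths[i:i + window]) / len(depths[i:i + window])
            for i in range(0, len(depths), window)]

def flag_windows(means, low=20, high=5000, window=250):
    """Report windows whose mean depth falls outside [low, high]."""
    for idx, mean in enumerate(means):
        if mean < low or mean > high:
            start = idx * window
            print(f"window {start}-{start + window}: mean depth {mean:.1f}")

# Toy depth profile with a low-coverage dip that warrants local re-assembly.
depths = [35] * 10_000 + [4] * 500 + [35] * 10_000
flag_windows(windowed_depth(depths))
```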


Methodology
2007
Vol 3 (1)
pp. 14-23
Author(s):  
Juan Ramon Barrada ◽  
Julio Olea ◽  
Vicente Ponsoda

Abstract. The Sympson-Hetter (1985) method provides a means of controlling the maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that determine the probability that an item, once selected, is administered. This method presents two main problems: it requires a long computation time for calculating the parameters, and the resulting maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives that appear to solve both problems. The impact of these methods on measurement accuracy had not yet been tested. We show how these methods over-restrict the exposure of some highly discriminating items, thereby decreasing accuracy. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods yield an empirical maximum exposure rate clearly above the goal. A new method, based on an initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with both of van der Linden's methods. When used with Sympson-Hetter, it speeds the convergence of the control parameters without decreasing accuracy.
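
The heart of the Sympson-Hetter procedure is a probabilistic filter: a selected item is actually administered with probability K_i, and the K parameters are re-estimated over repeated simulations until empirical exposure settles under the target r_max. The sketch below illustrates that iteration under strong simplifying assumptions (item selection is reduced to "most discriminating first" rather than a full IRT-based rule):

```python
import random

def sympson_hetter(discrimination, r_max=0.2, test_len=5, examinees=2000,
                   cycles=30, seed=1):
    """Fit exposure-control parameters K so that empirical item exposure
    approaches the target maximum rate r_max."""
    rng = random.Random(seed)
    n = len(discrimination)
    K = [1.0] * n
    order = sorted(range(n), key=lambda i: -discrimination[i])  # toy selection rule
    for _ in range(cycles):
        selected = [0] * n
        for _ in range(examinees):
            administered = 0
            for item in order:
                selected[item] += 1
                if rng.random() < K[item]:  # Sympson-Hetter exposure filter
                    administered += 1
                    if administered == test_len:
                        break
        # Classic update: K_i = 1 if P(S_i) <= r_max, else r_max / P(S_i).
        for i in range(n):
            p_sel = selected[i] / examinees
            K[i] = 1.0 if p_sel <= r_max else r_max / p_sel
    return K

items = random.Random(0)
K = sympson_hetter([items.uniform(0.5, 2.0) for _ in range(40)])
print([round(k, 2) for k in K[:10]])  # highly discriminating items get small K
```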


2021
Vol 11 (2)
pp. 131
Author(s):  
Laura B. Scheinfeldt ◽  
Andrew Brangan ◽  
Dara M. Kusic ◽  
Sudhir Kumar ◽  
Neda Gharani

Pharmacogenomics holds the promise of personalized drug-efficacy optimization and drug-toxicity minimization. Much of the research conducted to date, however, suffers from an ascertainment bias towards European participants. Here, we leverage publicly available whole-genome sequencing data collected from global populations, evolutionary characteristics, and annotated protein features to construct a new in silico machine learning pharmacogenetic identification method called XGB-PGX. When applied to pharmacogenetic data, XGB-PGX outperformed all existing prediction methods and identified over 2,000 new pharmacogenetic variants. While there are modest differences in pharmacogenetic allele frequency distributions across global population samples, the most striking distinction is between the relatively rare, putatively neutral pharmacogene variants and the relatively common established and newly predicted functional pharmacogenetic variants. Our findings therefore support a focus on individual patient pharmacogenetic testing rather than on clinical presumptions about patient race, ethnicity, or ancestral geographic residence. We further encourage that more attention be given to the impact of common variation on drug response and propose a new ‘common treatment, common variant’ perspective for pharmacogenetic prediction that is distinct from the types of variation that underlie complex and Mendelian disease. XGB-PGX has identified many new pharmacovariants that are present across all global communities; however, communities that have been underrepresented in genomic research are likely to benefit the most from XGB-PGX’s in silico predictions.
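
The prediction step behind a method like XGB-PGX can be pictured as gradient-boosted classification over per-variant annotations. The sketch below uses the public xgboost Python API on synthetic data; the three features (conservation score, global allele frequency, protein-feature overlap) are hypothetical placeholders, not the study's actual feature set.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-variant features: conservation score, global allele
# frequency, overlaps-annotated-protein-feature flag; label 1 = functional.
X = rng.random((1000, 3))
y = (0.6 * X[:, 0] + 0.3 * X[:, 2] + 0.1 * rng.random(1000) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)

accuracy = (model.predict(X_te) == y_te).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```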


Author(s):  
Martin Stervander ◽  
William A Cresko

Abstract The fish order Syngnathiformes has been referred to as a collection of misfit fishes, comprising commercially important fish such as red mullets as well as the highly diverse seahorses, pipefishes, and seadragons—the well-known family Syngnathidae, with their unique adaptations including male pregnancy. Another ornate member of this order is the mandarinfish. No fewer than two types of chromatophores have been discovered in the spectacularly colored mandarinfish: the cyanophore (producing blue color) and the dichromatic cyano-erythrophore (producing blue and red). The phylogenetic position of the mandarinfish within Syngnathiformes, and the promise of additional genetic discoveries beyond the chromatophores, made it an appealing target for whole-genome sequencing. We used linked reads to create synthetic long reads, producing a highly contiguous genome assembly for the mandarinfish. The genome assembly comprises 483 Mbp (longest scaffold 29 Mbp), has an N50 of 12 Mbp, and an L50 of 14 scaffolds. Assembly completeness is also high, with 92.6% complete, 4.4% fragmented, and 2.9% missing of the 4,584 BUSCO genes found in ray-finned fishes. Outside the family Syngnathidae, the mandarinfish represents one of the most contiguous syngnathiform genome assemblies to date. This genomic resource will likely serve as a high-quality outgroup for syngnathid fish and, more broadly, support research on the genomic underpinnings of the evolution of novel pigmentation.
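
The contiguity statistics quoted above (N50 of 12 Mbp at an L50 of 14 scaffolds) follow a standard definition: sort scaffold lengths in descending order and accumulate until half the total assembly size is reached. A minimal sketch with toy lengths:

```python
def n50_l50(scaffold_lengths):
    """N50: length of the scaffold at which the running sum first reaches
    half the assembly size; L50: how many scaffolds that takes."""
    lengths = sorted(scaffold_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count

lengths_mbp = [29, 20, 15, 12, 9, 7, 5, 3, 2, 1]  # toy assembly, in Mbp
n50, l50 = n50_l50(lengths_mbp)
print(f"N50 = {n50} Mbp, L50 = {l50} scaffolds")
```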


Author(s):  
James Dallas ◽  
Yifan Weng ◽  
Tulga Ersal

Abstract In this work, a novel combined trajectory planner and tracking controller is developed for autonomous vehicles operating on off-road deformable terrains. Common approaches to trajectory planning and tracking often rely on model-dependent schemes, which utilize a simplified model to predict the impact of control inputs on the future vehicle response. However, in an off-road context, and especially on deformable terrains, accurately modeling the vehicle response for predictive purposes can be challenging due to the complexity of the tire-terrain interaction and the limitations of state-of-the-art terramechanics models in terms of operating conditions, computation time, and continuous differentiability. To address this challenge and improve vehicle safety and performance through more accurate prediction of the plant response, this paper presents a nonlinear model predictive control framework that accounts for terrain deformability explicitly, using a neural network terramechanics model for deformable terrains. The utility of the proposed scheme is demonstrated in high-fidelity simulations of a notional lightweight military vehicle on soft soil. It is shown that the neural network-based controller can outperform a baseline Pacejka model-based scheme, improving on the performance metrics associated with the cost function. In more severe maneuvers, the neural network-based controller achieves sufficient fidelity with respect to the plant to complete maneuvers that lead to failure for the Pacejka-based controller. Finally, it is demonstrated that the proposed framework is conducive to real-time implementation.
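
The control architecture described here, a predictive loop whose internal plant model is a learned terramechanics surrogate, can be sketched as sampling-based model predictive control: roll candidate input sequences through the model, score them against a tracking cost, and apply the first input of the best sequence. The sketch below is a random-shooting toy with a kinematic stand-in for the learned model; the paper's neural network, vehicle dynamics, and optimizer are far richer.

```python
import numpy as np

def surrogate_dynamics(state, u):
    """Stand-in for a learned terramechanics/vehicle model:
    maps (state, input) to the next state via a toy kinematic update."""
    x, y, heading = state
    speed, steer = u
    return np.array([x + speed * np.cos(heading),
                     y + speed * np.sin(heading),
                     heading + steer])

def mpc_step(state, target, horizon=10, samples=500, rng=None):
    """Random-shooting MPC: sample input sequences, roll them out with the
    surrogate model, return the first input of the lowest-cost rollout."""
    rng = rng or np.random.default_rng(0)
    best_cost, best_u0 = np.inf, None
    for _ in range(samples):
        seq = np.column_stack([rng.uniform(0.5, 2.0, horizon),    # speed
                               rng.uniform(-0.3, 0.3, horizon)])  # steering
        s, cost = state, 0.0
        for u in seq:
            s = surrogate_dynamics(s, u)
            cost += np.sum((s[:2] - target) ** 2)  # position-tracking cost
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

state = np.zeros(3)  # x, y, heading
print("first control input:", mpc_step(state, target=np.array([10.0, 5.0])))
```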


Energies
2018
Vol 11 (9)
pp. 2268
Author(s):  
Dong-Hee Yoon ◽  
Sang-Kyun Kang ◽  
Minseong Kim ◽  
Youngsun Han

We present a novel architecture for parallel contingency analysis that accelerates massive power flow computation using cloud computing. The architecture leverages cloud resources to investigate large power systems under a wide range of potential contingencies. Contingency analysis is undertaken to assess the impact of failures of power system components; extensive contingency analysis is therefore required to ensure that power systems operate safely and reliably. Since many calculations are required to analyze possible contingencies under various conditions, the computation time of contingency analysis increases tremendously if either the power system is large or cascading outage analysis is needed. We also introduce a task management optimization that minimizes load imbalances between computing resources while reducing communication and synchronization overheads. Our experiments show that the proposed architecture achieves a performance improvement of up to 35.32× on 256 cores in the contingency analysis of a real power system, KEPCO2015 (the Korean power system), using a cloud computing system. According to our analysis of task execution behavior, performance can be enhanced further by employing additional computing resources.
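
The task-level parallelism described above maps naturally onto a worker pool: each contingency is an independent power-flow job, and batching the job list reduces dispatch and synchronization overhead, which is the essence of the load-balancing optimization. The sketch below illustrates the pattern with Python's standard library; the power-flow solve is a placeholder, not the paper's cloud architecture.

```python
from concurrent.futures import ProcessPoolExecutor
import math

def solve_contingency(outage_id):
    """Placeholder for a power-flow solve with one component outaged;
    a real implementation would rerun load flow on the modified network."""
    return outage_id, math.sin(outage_id) ** 2  # dummy severity metric

def run_analysis(n_contingencies, workers=8):
    contingencies = range(n_contingencies)
    # Batching tasks per worker cuts per-task dispatch overhead and helps
    # balance load across the pool.
    chunk = max(1, n_contingencies // (workers * 4))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(solve_contingency, contingencies,
                                chunksize=chunk))
    return sorted(results, key=lambda r: -r[1])  # most severe first

if __name__ == "__main__":
    print("five most severe contingencies:", run_analysis(10_000)[:5])
```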


2019
Vol 69 (1)
pp. 39-54
Author(s):  
Mohammad Nazari-Sharabian ◽  
Masoud Taheriyoun ◽  
Moses Karakouzian

Abstract This study investigates the impact of different digital elevation model (DEM) resolutions on the topological attributes and simulated runoff of the Mahabad Dam watershed in Iran, as well as the sensitivity of runoff parameters. The watershed and streamlines were delineated in ArcGIS, and the hydrologic analyses were performed using the Soil and Water Assessment Tool (SWAT). The sensitivity analysis of the runoff parameters was performed using the Sequential Uncertainty Fitting Version 2 (SUFI-2) algorithm in the SWAT Calibration and Uncertainty Procedures (SWAT-CUP) program. The results indicated that the sensitivity of runoff parameters, the watershed surface area, and elevations changed under different DEM resolutions. Because the distribution of slopes changed with different DEMs, surface parameters were most affected. Furthermore, higher amounts of runoff were generated when DEMs with finer resolutions were used. In comparison with the observed value of 8 m3/s at the watershed outlet, the 12.5 m DEM showed the most realistic result (6.77 m3/s). The 12.5 m DEM generated 0.74% and 2.73% more runoff than the 30 m and 90 m DEMs, respectively. The findings of this study indicate that, to reduce computation time, researchers may use DEMs with coarser resolutions at the expense of minor decreases in accuracy.
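
The runoff comparisons above are relative differences against the finest DEM; the arithmetic is simple but worth making explicit. In the sketch below, only the 6.77 m3/s value is taken from the study; the 30 m and 90 m flows are back-calculated illustrations consistent with the reported percentage gaps.

```python
def pct_more(a, b):
    """Percent by which value a exceeds value b."""
    return 100 * (a - b) / b

q_125 = 6.77             # reported simulated outflow for the 12.5 m DEM (m3/s)
q_30, q_90 = 6.72, 6.59  # illustrative values implied by the reported gaps

for resolution, q in (("30 m", q_30), ("90 m", q_90)):
    print(f"12.5 m vs {resolution}: {pct_more(q_125, q):+.2f}% runoff")
```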


2014
Vol 15 (1)
pp. 59-68
Author(s):  
Lazarus Okechukwu Uzoechi ◽  
Satish M. Mahajan

Abstract This paper presents a methodology to evaluate transient-stability-constrained available transfer capability (ATC). A fast, linear line flow-based (LFB) method was adopted to optimize the ATC values, enabling direct determination of the system source-sink locations. Different market transactions were formulated to consider bilateral and multilateral impacts on the stability-constrained ATC. The proposed method was demonstrated on the WECC 9-bus and IEEE 39-bus systems. The critical energy performance index (CEPI) enabled the direct identification of candidates for contingency screening based on ranking. This index helped to reduce the list of credible contingencies for ATC evaluation and, therefore, the computation time. The results of the proposed ATC method are consistent with the literature, and the method can be deployed for fast assessment of the impact of transactions in an electric power system.
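
Contingency screening by a ranked performance index, as done here with the CEPI, amounts to scoring every contingency, sorting, and carrying only the top of the list forward to full ATC evaluation. A minimal sketch with placeholder index values (the CEPI computation itself is not reproduced):

```python
def screen_contingencies(index_by_contingency, keep=10):
    """Rank contingencies by a severity index (higher = more critical) and
    keep only the top candidates for detailed ATC evaluation."""
    ranked = sorted(index_by_contingency.items(), key=lambda kv: -kv[1])
    return ranked[:keep]

# Placeholder CEPI-style scores per outage (component id -> index value).
scores = {f"line-{i}": (i * 37 % 100) / 100 for i in range(50)}
print("credible contingencies:", screen_contingencies(scores, keep=5))
```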

