ViralFlow: an automated workflow for SARS-CoV-2 genome assembly, lineage assignment, mutations and intrahost variants detection

Author(s):  
Filipe Zimmer Dezordi ◽  
Tulio de Lima Campos ◽  
Pedro Miguel Carneiro Jeronimo ◽  
Cleber Furtado Aksenen ◽  
Suzana Porto Almeida ◽  
...  

COVID-19, a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in 2019 and quickly spread worldwide as a pandemic. Genomic surveillance has become the gold-standard methodology for monitoring and studying this emerging virus. The current deluge of SARS-CoV-2 genomic data being generated worldwide has made streamlined bioinformatics workflows for data analysis an urgent need. Here, we describe a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all the steps involved in SARS-CoV-2 genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and the screening of intrahost variants. The workflow presented here (https://github.com/dezordi/ViralFlow) is available through Docker or Singularity images, so it can run on laptops for small-scale analyses or on high-capacity servers or clusters. Moreover, its low memory and CPU-core requirements make it a versatile tool for SARS-CoV-2 genomic analysis.
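The abstract lists intrahost variant screening as one of the automated steps but does not say how it is performed. As a rough illustration only, the Python sketch below shows one common way to screen a single genome position from per-base read counts: mask low-depth sites with N, take the majority base as the consensus, and flag the second most frequent base as a candidate intrahost variant when its frequency passes a cutoff. The function call_site and the thresholds MIN_DEPTH and MIN_MINOR_FREQ are hypothetical and are not ViralFlow's actual parameters.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical cutoffs chosen for illustration; not ViralFlow's real defaults.
MIN_DEPTH = 10          # sites with fewer reads are masked as "N"
MIN_MINOR_FREQ = 0.05   # minimum frequency for a second base to be flagged


@dataclass
class SiteCall:
    position: int                 # genome position (1-based)
    consensus: str                # majority base, or "N" if depth is too low
    minor_base: Optional[str]     # candidate intrahost variant base, if any
    minor_freq: float             # frequency of minor_base (0.0 if none)


def call_site(position: int, counts: Dict[str, int]) -> SiteCall:
    """Call the consensus base and screen one site for a minor (intrahost) allele."""
    depth = sum(counts.values())
    if depth < MIN_DEPTH:
        # Too little evidence: mask the site instead of guessing a base.
        return SiteCall(position, "N", None, 0.0)
    # Order bases from most to least supported.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    consensus = ranked[0][0]
    if len(ranked) > 1:
        minor_base, minor_count = ranked[1]
        minor_freq = minor_count / depth
        if minor_freq >= MIN_MINOR_FREQ:
            return SiteCall(position, consensus, minor_base, minor_freq)
    return SiteCall(position, consensus, None, 0.0)


if __name__ == "__main__":
    # Made-up read counts for two sites.
    print(call_site(1, {"A": 40, "G": 160}))
    print(call_site(2, {"C": 3}))
```

With these made-up counts, the first site gets consensus G with A flagged at frequency 0.2, and the second site is masked as N because its depth is below the cutoff.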

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanan Ren ◽  
Ting-You Wang ◽  
Leah C. Anderton ◽  
Qi Cao ◽  
Rendong Yang

Abstract. Background: Long non-coding RNAs (lncRNAs) are a growing focus of cancer research. Deciphering the pathways influenced by lncRNAs is important for understanding their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to this problem, such experimental data are not available for the majority of annotated lncRNAs. Results: As a surrogate, we present lncGSEA, a convenient tool that predicts lncRNA-associated pathways through Gene Set Enrichment Analysis (GSEA) of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA recapitulates lncRNA-associated pathways supported by the literature and by experimental validation in multiple cancer types. Conclusions: LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R and is freely accessible at https://github.com/ylab-hi/lncGSEA.
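LncGSEA is written in R, and the abstract does not detail how genes are ranked for each lncRNA. Assuming genes are ranked by their correlation with the lncRNA's expression across patient samples (an assumption), the core GSEA statistic is the weighted running-sum enrichment score sketched below in Python. The function enrichment_score is a hypothetical illustration of the standard statistic, not lncGSEA's code, and it omits the permutation-based significance testing that a full GSEA run adds.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     scores: Sequence[float],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Weighted running-sum enrichment score, as in standard GSEA.

    ranked_genes: genes ordered from most positive to most negative score
    scores: the ranking metric for each gene, in the same order
            (assumed here: correlation with the lncRNA's expression)
    gene_set: genes belonging to the pathway being tested
    p: weight exponent; p = 0 gives the unweighted Kolmogorov-Smirnov statistic
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hits = len(hits), sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |score|^p, normalised to sum to 1.
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a score of zero")
    miss_step = 1.0 / (n - n_hits)  # misses step down uniformly, summing to 1
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_norm if h else -miss_step
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the pathway's genes concentrate at the top of the ranking (positively correlated with the lncRNA under the assumed metric); a negative score means they concentrate at the bottom.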


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yu Chen ◽  
Yixin Zhang ◽  
Amy Y. Wang ◽  
Min Gao ◽  
Zechen Chong

Abstract. Long-read de novo genome assembly continues to advance rapidly. However, effective tools for accurately evaluating assembly results, especially with respect to structural errors, are lacking. We present Inspector, a reference-free evaluator for long-read de novo assemblies that faithfully reports the types of errors and their precise locations. Notably, Inspector can correct assembly errors using consensus sequences derived from the raw reads covering erroneous regions. Based on in silico data and on assemblies of multiple long-read datasets produced with different assemblers, we demonstrate that, in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.
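Inspector's error-detection and correction procedure is only summarized in the abstract. The Python sketch below illustrates the simplest case of consensus-based correction, single-base substitutions: at each assembly position, if a large majority of the aligned read bases disagree with the assembly, the position is flagged and replaced with the read consensus. The pileup input format, the function name find_and_fix_substitutions and the min_support threshold are hypothetical; structural and indel errors, which Inspector also reports, are not handled here.

```python
from collections import Counter
from typing import List, Tuple


def find_and_fix_substitutions(
    assembly: str,
    pileup: List[List[str]],
    min_support: float = 0.8,
) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Flag and correct single-base substitutions using read consensus.

    pileup[i] lists the read bases aligned to assembly position i (0-based);
    it must have one entry per assembly position.
    min_support is a hypothetical cutoff: the fraction of reads that must agree
    on a different base before the assembly base is considered an error.
    Returns the corrected sequence and a list of (position, old_base, new_base).
    """
    if len(pileup) != len(assembly):
        raise ValueError("pileup must have one entry per assembly position")
    corrected = list(assembly)
    errors = []
    for i, bases in enumerate(pileup):
        if not bases:
            continue  # no read evidence: keep the assembly base
        base, count = Counter(bases).most_common(1)[0]
        if base != assembly[i] and count / len(bases) >= min_support:
            errors.append((i, assembly[i], base))
            corrected[i] = base
    return "".join(corrected), errors
```

Requiring strong read agreement before correcting keeps heterozygous or noisy positions from being "fixed" on weak evidence.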


Author(s):  
Anjan Pakhira ◽  
Peter Andras

Testing is a critical phase in the software life cycle. While small-scale, component-wise testing is done routinely during the development and maintenance of large-scale software, system-level testing of the whole software is much more problematic, owing to the low coverage of potential usage scenarios by test cases and the high cost of wide-scale testing of large software. Here, the authors investigate the use of cloud computing to facilitate the testing of large-scale software. They discuss aspects of cloud-based testing and provide an example application: testing the functional importance of methods of classes in the Google Chrome software. The methods tested are those predicted to be functionally important with respect to a given functionality of the software; the authors make these predictions by applying network analysis to dynamic analysis data generated by the software. They check the validity of these predictions by mutation testing of a large number of mutated variants of Google Chrome. The chapter provides details of how to set up the testing process on the cloud and discusses relevant technical issues.
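The abstract does not name the network measure used to predict functionally important methods. As a hypothetical sketch, the Python code below builds an undirected call graph from (caller, callee) pairs recorded during dynamic analysis, ranks methods by the number of distinct methods they interact with (degree), and computes the fraction of mutants whose functionality test fails. Degree is a stand-in for whatever centrality the authors used, and both function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_methods_by_degree(call_edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Rank methods by how many distinct methods they call or are called by.

    call_edges: (caller, callee) pairs observed during dynamic analysis.
    Degree is only a stand-in centrality; ties are broken alphabetically.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in call_edges:
        if caller == callee:
            continue  # ignore recursive self-calls
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
    return sorted(neighbours, key=lambda m: (-len(neighbours[m]), m))


def failure_rate(mutant_failed: Iterable[bool]) -> float:
    """Fraction of mutants whose functionality test failed (True = failed)."""
    results = list(mutant_failed)
    return sum(results) / len(results) if results else 0.0
```

Under this reading, the predictions would be supported if mutants of top-ranked methods fail the functionality test more often than mutants of low-ranked methods.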


2015 ◽  
pp. 1175-1203
Author(s):  
Anjan Pakhira ◽  
Peter Andras


Author(s):  
Liam F Spurr ◽  
Mehdi Touat ◽  
Alison M Taylor ◽  
Adrian M Dubuc ◽  
Juliann Shih ◽  
...  

Abstract. Summary: The expansion of targeted panel sequencing efforts has created opportunities for large-scale genomic analysis, but tools for copy-number quantification on panel data are lacking. We introduce ASCETS, a method for the efficient quantitation of arm- and chromosome-level copy-number changes from targeted sequencing data. Availability and implementation: ASCETS is implemented in R and is freely available to non-commercial users on GitHub (https://github.com/beroukhim-lab/ascets), along with detailed documentation. Supplementary information: Supplementary data are available at Bioinformatics online.
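ASCETS's exact calling rules are documented in its repository rather than in this abstract. The Python sketch below shows the general idea of an arm-level call from segmented panel data under assumed rules: measure what fraction of the arm is covered by segments at all, and, if coverage is sufficient, call an amplification or deletion when segments beyond a log2-ratio threshold span most of the covered length. The function arm_call and all thresholds (amp_thr, del_thr, min_coverage, min_fraction) are hypothetical values chosen for illustration.

```python
from typing import Iterable, Tuple


def arm_call(segments: Iterable[Tuple[int, int, float]],
             arm_start: int, arm_end: int,
             amp_thr: float = 0.2, del_thr: float = -0.2,
             min_coverage: float = 0.5,
             min_fraction: float = 0.7) -> Tuple[str, float]:
    """Call an arm-level copy-number change from segmented log2 ratios.

    segments: non-overlapping (start, end, log2_ratio) intervals (half-open) on the
    arm's chromosome. All thresholds are hypothetical illustration values.
    Returns (call, fraction), where call is "AMP", "DEL", "NEUTRAL" or "LOWCOV"
    and fraction is the share of the covered length supporting the call.
    """
    arm_len = arm_end - arm_start
    covered = amp = dele = 0
    for start, end, log2r in segments:
        overlap = min(end, arm_end) - max(start, arm_start)
        if overlap <= 0:
            continue  # segment lies outside this arm
        covered += overlap
        if log2r >= amp_thr:
            amp += overlap
        elif log2r <= del_thr:
            dele += overlap
    # Panels cover only part of each arm, so require enough segmented length first.
    if covered == 0 or covered / arm_len < min_coverage:
        return "LOWCOV", 0.0
    if amp / covered >= min_fraction:
        return "AMP", amp / covered
    if dele / covered >= min_fraction:
        return "DEL", dele / covered
    return "NEUTRAL", 1.0 - (amp + dele) / covered
```

Computing fractions over the covered length rather than the whole arm reflects the fact that a targeted panel leaves large gaps between probed regions.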


1995 ◽  
Vol 35 (1) ◽  
pp. 436 ◽  
Author(s):  
G.T. Cooper

The Eastern Otway Basin exhibits two near-orthogonal structural grains, specifically NE-SW and WNW-ESE trending structures, which dominate the Otway Ranges, Colac Trough and Torquay Embayment. The relative timing of these structures is poorly constrained, but dip analysis data from offshore seismic lines in the Torquay Embayment show that two distinct structural provinces developed during two separate extensional episodes.
The Snail Terrace forms the southern structural province of the Torquay Embayment and is characterised by the WNW-ESE trending basin margin fault and a number of small-scale NW-SE trending faults. The Torquay Basin Deep makes up the northern structural province and is characterised by the large-scale, cuspate Snail Fault, which trends ENE-WSW, together with a number of smaller NE-SW trending faults.
Dip analysis of basement trends shows a bimodal population in the Torquay Embayment. The Snail Terrace data show extension towards the SSW (193°), but this trend changes abruptly to the NE across a hinge zone. Dip data in the Torquay Basin Deep and regions north of the hinge zone show extension towards the SSE (150°). Overall, the data show the dominance of SSE extension, with a mean vector of 166°.
Seismic data show significant growth of the Crayfish Group on the Snail Terrace and a lesser growth rate in the Torquay Basin Deep. Dip data from the Snail Terrace are therefore inferred to represent the direction of basement rotation during the first phase of continental extension, oriented towards the SSW, during the Berriasian-Barremian? (146-125 Ma). During this phase the basin margin fault formed, as did NE-SW trending ?transtensional structures in the Otway Ranges and Colac Trough, which are probably related to Palaeozoic features.
Substantial growth along the Snail Fault during the Aptian-Albian? suggests that a second phase of extension affected the area. The Colac Trough, Otway Ranges, Torquay Embayment and Strzelecki Ranges were significantly influenced by this Bassian phase of SSE extension, which probably persisted during the Aptian-Albian? (125-97 Ma). This phase of extension had little effect in the western Otway Basin, west of the Sorell Fault Zone, and was largely concentrated in areas within the northern failed Bass Strait Rift. During the mid-Cretaceous, parts of the southern margin were subjected to uplift and erosion. Apatite fission track and vitrinite reflectance analyses show elevated palaeotemperatures associated with uplift east of the Sorell Fault Zone.
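Because dip directions are azimuths, a mean vector such as the 166° reported above cannot be obtained by simple arithmetic averaging: the arithmetic mean of 350° and 10° is 180°, although both directions point close to north. The standard approach is a vector (circular) mean, sketched below in Python. Whether the author computed the mean exactly this way is an assumption, and the function name mean_azimuth is illustrative. The resultant length it also returns indicates how tightly the directions cluster.

```python
import math
from typing import Sequence, Tuple


def mean_azimuth(azimuths_deg: Sequence[float]) -> Tuple[float, float]:
    """Vector (circular) mean of azimuths measured clockwise from north, in degrees.

    Returns (mean_direction, resultant_length). resultant_length lies in [0, 1]:
    1 means all directions are identical; values near 0 mean no dominant trend,
    in which case the mean direction is not meaningful.
    """
    if not azimuths_deg:
        raise ValueError("need at least one azimuth")
    # Sum unit vectors: sine gives the east component, cosine the north component.
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    mean = math.degrees(math.atan2(east, north)) % 360.0
    resultant = math.hypot(east, north) / len(azimuths_deg)
    return mean, resultant
```

For the two azimuths 350° and 10°, this returns a mean direction of 0° (north) rather than 180°.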


2019 ◽  
Vol 76 (6) ◽  
pp. 1601-1609 ◽  
Author(s):  
Tania Mendo ◽  
Sophie Smout ◽  
Tommaso Russo ◽  
Lorenzo D’Andrea ◽  
Mark James

Abstract. Analysis of data from vessel monitoring systems and automatic identification systems in large-scale fisheries is used to describe the spatial distribution of fishing effort, impacts on habitats, and the location of fishing grounds. To identify when and where fishing activities occur, analysis needs to take account of the different fishing practices of different fleets. Small-scale fisheries (SSF) vessels have generally been exempted from positional reporting requirements, but the recent development of compact, low-cost systems offers the potential to monitor them effectively. To characterize the spatial distribution of fishing activities in SSFs, positions should be collected often enough to allow detection of different fishing behaviours, while minimizing demands for data transmission, storage, and analysis. This study sought to suggest optimal rates of data collection for characterizing fishing activities at an appropriate spatial resolution. In an SSF case study, on-board observers collected Global Navigation Satellite System (GNSS) positions and fishing activity every second during each trip. In the analysis, data were re-sampled to lower temporal resolutions to evaluate the effect on identifying the number of hauls and the area fished. The effect of estimation at different spatial resolutions was also explored. Consistent results were found for polling intervals <60 s in small vessels and <120 s in medium and large vessels. A grid cell size of 100 × 100 m gave the best estimates of area fished. Remote collection and analysis of GNSS or equivalent data at low cost and at sufficient resolution can therefore be used to infer small-scale fisheries activities. This has significant implications globally for the sustainable management of these fisheries, many of which are currently unregulated.
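The resampling experiment described above can be sketched as follows, assuming a 1 Hz track of fishing positions already projected to metres (raw GNSS fixes are latitude/longitude, and the separation of fishing from non-fishing fixes is omitted). resample emulates a slower polling rate by keeping one fix per interval, and area_fished counts the distinct grid cells visited and multiplies by the cell area. Both function names and the track format are hypothetical; the study's own estimation method may differ.

```python
import math
from typing import List, Optional, Tuple

Fix = Tuple[float, float, float]  # (time in s, x in m, y in m)


def resample(track: List[Fix], interval_s: float) -> List[Fix]:
    """Emulate a slower logger by keeping one fix every interval_s seconds."""
    kept: List[Fix] = []
    next_t: Optional[float] = None
    for t, x, y in track:  # track must be sorted by time
        if next_t is None or t >= next_t:
            kept.append((t, x, y))
            next_t = t + interval_s
    return kept


def area_fished(track: List[Fix], cell_m: float = 100.0) -> float:
    """Area (m^2) of the distinct square grid cells of side cell_m visited by the track."""
    cells = {(math.floor(x / cell_m), math.floor(y / cell_m)) for _, x, y in track}
    return len(cells) * cell_m ** 2
```

For a straight 300 s track at 2 m/s, the full 1 Hz data visit six 100 × 100 m cells (60,000 m²), whereas polling every 60 s visits only five (50,000 m²), showing how coarser polling can under-estimate the area fished when positions are not interpolated.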


2017 ◽  
Vol 742 ◽  
pp. 17-24
Author(s):  
Steve Sockol ◽  
Christoph Doerffel ◽  
Juliane Mehnert ◽  
Gerd Zwinzscher ◽  
Steffen Rein ◽  
...  

Fiber-reinforced thermoplastics have high potential for large-scale lightweight applications owing to their short processing times and recyclability. Further advantages are low emissions during manufacturing and the beneficial handling and storage properties of the semi-finished materials. Thermoplastic composites are made of reinforcement fibers and a thermoplastic polymer matrix by applying two essential sub-processes: (1) melting the matrix material and (2) impregnating the textile component with the molten matrix material. In the current state of the art, both sub-processes are carried out in double-belt presses, which are characterized by high processing temperatures and high processing forces. Consequently, a large amount of energy is needed to generate the required high compaction forces and temperatures with hydraulic cylinders and electric heating. Convection, infrared radiation and the dynamic cooling of tempered machine parts lead to significant energy dissipation, and the generation of hydraulic pressure in particular has low efficiency. For economic and ecological reasons, novel energy-efficient impregnation processes therefore need to be investigated and developed. The novel impregnation process presented here is based on ultrasonic technology. A stack of polymer films (outer layers) and a textile ply (inner layer) is formed, and the energy is applied with an ultrasonic sonotrode. The efficient, fast and strongly concentrated application of energy to the thermoplastic films enables novel and highly flexible machine concepts, which can be used to develop production processes ranging from small to large scale. The ultrasonic technology allows continuous impregnation of the textile component with molten matrix material. A custom-designed prototype was developed, first material samples were produced, and the technological parameters were studied.
This paper focuses on characterizing the experimental results, material samples, prototype machine and process.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Lidia de los Ríos-Pérez ◽  
Julien A. Nguinkal ◽  
Marieke Verleih ◽  
Alexander Rebl ◽  
Ronald M. Brunner ◽  
...  

Abstract. Pikeperch (Sander lucioperca) is a fish species of growing economic significance in the aquaculture industry. However, successfully establishing pikeperch in large-scale aquaculture requires a better understanding of its genome organization. In this study, an ultra-high-density linkage map for pikeperch, comprising 24 linkage groups and 1,023,625 single nucleotide polymorphism markers, was constructed after genotyping whole-genome sequencing data from 11 broodstock and 363 progeny belonging to 6 full-sib families. The sex-specific linkage maps spanned a total of 2985.16 cM in females and 2540.47 cM in males, with average inter-marker distances of 0.0030 and 0.0026 cM, respectively. The sex-averaged map spanned a total of 2725.53 cM, with an average inter-marker distance of 0.0028 cM. Furthermore, the sex-averaged map was used to improve the contiguity and accuracy of the current pikeperch genome assembly. Based on 723,360 markers, 706 contigs were anchored and oriented into 24 pseudomolecules, covering a total of 896.48 Mb and accounting for 99.47% of the assembled genome size. The overall contiguity of the assembly improved, with a scaffold N50 length of 41.06 Mb. Finally, an updated annotation of protein-coding genes and repetitive elements of the enhanced genome assembly is provided at NCBI.
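The scaffold N50 quoted above is the length L such that scaffolds of length L or longer together contain at least half of the assembled sequence. A minimal Python sketch of the computation follows; the function name n50 is illustrative, and the lengths used to check it are made up, not pikeperch data.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the total bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding half of an odd total
            return length
    raise AssertionError("unreachable for non-empty, non-negative input")
```

For example, for lengths 2 to 10 (54 bases in total), the three longest sequences (10 + 9 + 8 = 27) reach half the total, so the N50 is 8.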


1996 ◽  
Vol 14 (7) ◽  
pp. 753-766
Author(s):  
G. Cautenet ◽  
D. Gbe

Abstract. The development of cirrus clouds is governed by large-scale synoptic movements, such as updraft regions in convergence zones, but also by smaller-scale features, for instance microphysical phenomena, entrainment, small-scale turbulence and the radiative field, fall-out of the ice phase, or wind shear. For this reason, properly handling cirrus life cycles is not an easy task using a large-scale model alone. We present some results from a small-scale cirrus cloud model initialized with ECMWF first-guess data, which proved more convenient for this task than the analyzed data. This model is Starr's 2-D cirrus cloud model, in which the rate of ice production/destruction is parametrized from environmental data. Comparison with satellite and local observations during the ICE89 experiment (North Sea) shows that such an efficient model, using large-scale data as input, provides a reasonable diagnosis of cirrus occurrence in a given meteorological field. The main driving features are the updraft provided by the large-scale model, which enhances or inhibits cloud development depending on its sign, and the availability of water vapour. The retrieved cloud fields are compared with satellite imagery. Finally, the use of a small-scale model within large-scale numerical studies is examined.

