sequencing process
Recently Published Documents


TOTAL DOCUMENTS

39
(FIVE YEARS 22)

H-INDEX

5
(FIVE YEARS 3)

2021 ◽  
Vol 888 (1) ◽  
pp. 012002
Author(s):  
A P Z N L Sari ◽  
I R Athifa ◽  
Panjono ◽  
R Hidayat ◽  
Y G Noor ◽  
...  

Abstract Single nucleotide polymorphisms (SNPs) in the MC4R gene are known to be associated with feed intake and growth performance. Our objective was to analyze the association of SNP g.880A>G with birth weight (BW), weaning weight (WW), and 6-month body weight (MW) in F1 Dorper x Garut crossbred sheep. Forty-one F1 Dorper x Garut sheep with phenotypic records were genotyped for SNP g.880A>G by direct sequencing. The homozygous AA genotype was absent from the samples. The G allele frequency (90%) was higher than that of the A allele (10%), and the genotype frequencies were 80% GG and 20% AG. At SNP g.880A>G the population did not deviate from Hardy–Weinberg equilibrium (p > 0.05). SNP g.880A>G was significantly associated with MW but not with BW or WW: sheep with the GG genotype had a higher MW (32.33 ± 4.81 kg) than those with the AG genotype (27.19 ± 1.86 kg). In conclusion, these findings suggest that SNP g.880A>G of the MC4R gene could be used as a potential selection tool for high MW in F1 Dorper x Garut crossbred sheep.
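The allele frequencies and the Hardy–Weinberg check above follow from gene counting and a 1-degree-of-freedom chi-square test. The abstract reports only rounded percentages, so the sketch below assumes hypothetical counts of 33 GG and 8 AG out of 41 (which round to 80% and 20%); the function names are illustrative, not the authors' code.

```python
from math import erfc, sqrt


def allele_freqs(n_aa: int, n_ag: int, n_gg: int) -> tuple[float, float]:
    """Return (p_A, p_G) estimated by gene counting from biallelic genotype counts."""
    n = n_aa + n_ag + n_gg
    p_a = (2 * n_aa + n_ag) / (2 * n)
    return p_a, 1.0 - p_a


def hwe_chi_square(n_aa: int, n_ag: int, n_gg: int) -> float:
    """Pearson chi-square comparing observed genotypes with HWE expectations."""
    n = n_aa + n_ag + n_gg
    p_a, p_g = allele_freqs(n_aa, n_ag, n_gg)
    expected = (n * p_a ** 2, 2 * n * p_a * p_g, n * p_g ** 2)
    observed = (n_aa, n_ag, n_gg)
    # Skip classes with zero expectation (only possible when an allele is absent).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_p_value(chi2: float) -> float:
    """Upper-tail p-value of a chi-square with 1 df: 3 classes - 1 - 1 estimated allele frequency."""
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts: 33 GG and 8 AG out of 41 reproduce the rounded 80%/20% split.
n_aa, n_ag, n_gg = 0, 8, 33
p_a, p_g = allele_freqs(n_aa, n_ag, n_gg)
chi2 = hwe_chi_square(n_aa, n_ag, n_gg)
print(f"p(A)={p_a:.3f} p(G)={p_g:.3f} chi2={chi2:.3f} p={hwe_p_value(chi2):.3f}")
```

With these assumed counts the statistic is about 0.48, well below the 3.84 critical value at α = 0.05. Because the expected AA count is below 1, an exact Hardy–Weinberg test would be the more defensible choice for a sample this small.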


2021 ◽  
Author(s):  
Niu Yi ◽  
Ma Mingming ◽  
Li Fu ◽  
Liu Xianming ◽  
Shi Guangming

Abstract Background: With the rapid development of high-throughput sequencing technology, the cost of whole-genome sequencing has dropped rapidly, leading to exponential growth in genome data. Although the compression of DNA bases has improved significantly in recent years, the compression of quality scores remains challenging. Results: In this paper, by re-examining the inherent correlations between quality scores and the sequencing process, we propose a novel lossless quality score compressor based on adaptive coding order (ACO). The main objective of ACO is to traverse the quality scores adaptively along the most correlated trajectory, as determined by the sequencing process. Combined with adaptive arithmetic coding and context modeling, ACO achieves state-of-the-art quality score compression performance with moderate complexity. Conclusions: This performance makes ACO a candidate tool for quality score compression. ACO has been employed by AVS (the Audio Video coding Standard Workgroup of China) and is freely available at https://github.com/Yoniming/code.
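To illustrate why traversal order matters for quality score compression, the following sketch measures the ideal code length of an adaptive order-1 context model (the cost an adaptive arithmetic coder would approach) under two fixed orders: read by read and cycle by cycle. This is a simplified stand-in, not ACO itself; the two candidate orders, the add-one counts, and the toy data are assumptions made for illustration.

```python
import random
from collections import defaultdict
from math import log2


def adaptive_code_length(symbols, alphabet_size):
    """Ideal code length in bits of an adaptive order-1 context model.

    Each context (the previous symbol) keeps add-one frequency counts that are
    updated after coding each symbol, as an adaptive arithmetic coder would.
    """
    counts = defaultdict(lambda: [1] * alphabet_size)
    totals = defaultdict(lambda: alphabet_size)
    prev, bits = 0, 0.0
    for s in symbols:
        bits -= log2(counts[prev][s] / totals[prev])
        counts[prev][s] += 1
        totals[prev] += 1
        prev = s
    return bits


def traverse(matrix, order):
    """Flatten a reads x cycles quality matrix read by read or cycle by cycle."""
    if order == "read":
        return [q for row in matrix for q in row]
    if order == "cycle":
        return [row[j] for j in range(len(matrix[0])) for row in matrix]
    raise ValueError(f"unknown order: {order}")


def best_order(matrix, alphabet_size):
    """Pick the traversal with the smaller estimated code length."""
    costs = {o: adaptive_code_length(traverse(matrix, o), alphabet_size) for o in ("read", "cycle")}
    return min(costs, key=costs.get), costs


# Toy data (not real qualities): every read shares one per-cycle quality profile.
rng = random.Random(0)
profile = [rng.randrange(8) for _ in range(200)]
matrix = [list(profile) for _ in range(20)]
order, costs = best_order(matrix, alphabet_size=8)
print(order, {k: round(v) for k, v in costs.items()})
```

In this toy matrix the scores are strongly correlated along each cycle, so the cycle-by-cycle order costs far fewer bits; real quality data are less extreme, which is why an adaptive choice of trajectory is worth making.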


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 22
Author(s):  
Anna Pavlova ◽  
Vera Belova ◽  
Robert Afasizhev ◽  
Irina Bulusheva ◽  
Denis Rebrikov ◽  
...  

During the sequencing process, problems can occur with any device, including the MGISEQ-2000 (DNBSEQ-G400) platform. We encountered a power outage that caused a temporary shutdown of a sequencer in the middle of a run. Because barcode reading on the MGISEQ-2000 takes place at the end of the run, the raw data already collected could not be demultiplexed and were therefore unusable. We decided to use up the remaining reagents in the same cartridge, together with the same DNB-loaded flow cell, and started a new run in a shortened custom mode. We worked out how the MGISEQ-2000 converts preliminary data in .cal format into .fastq files and wrote a script named “Runcer-Necromancer” that merges .fastq files based on an analysis of their headers (available online: https://github.com/genomecenter/runcer-necromancer). Read merging proved possible because the MGISEQ-2000 flow cell has a patterned structure and each DNB has fixed coordinates on it, regardless of the flow cell's position on the stage. We demonstrated the correctness of the merged data by comparing sample analysis results with .fastq files previously obtained for the same samples. Thus, we confirmed that it is possible to restart the device and recover both parts of an interrupted run.
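The merging idea can be sketched as joining two FASTQ files on a shared read identifier. The sketch assumes that each read ID encodes the DNB's fixed flow-cell coordinates identically in both runs and that the resumed run continues the cycles where the first stopped, so sequence and quality strings are simply concatenated. The IDs and helper names are hypothetical; this is not the Runcer-Necromancer script, which parses the actual MGISEQ-2000 headers.

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read_id, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, seq, qual) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        read_id = header[1:].split()[0]  # drop '@' and any comment field
        yield read_id, seq, qual


def merge_runs(first: Iterable[str], second: Iterable[str]) -> Iterator[Record]:
    """Append cycles from the resumed run to reads with the same (coordinate-based) ID."""
    tail: Dict[str, Tuple[str, str]] = {rid: (s, q) for rid, s, q in parse_fastq(second)}
    for rid, seq, qual in parse_fastq(first):
        if rid in tail:  # reads missing from either part are dropped
            seq2, qual2 = tail[rid]
            yield rid, seq + seq2, qual + qual2


def format_fastq(records: Iterable[Record]) -> str:
    return "".join(f"@{rid}\n{seq}\n+\n{qual}\n" for rid, seq, qual in records)


# Hypothetical read IDs; record order differs between the two parts on purpose.
part1 = "@dnb_0001\nACGT\n+\nIIII\n@dnb_0002\nTTGA\n+\nHHHH\n".splitlines()
part2 = "@dnb_0002\nCC\n+\n##\n@dnb_0001\nGA\n+\nII\n".splitlines()
print(format_fastq(merge_runs(part1, part2)), end="")
```

Keying on the ID rather than on record order is what makes the join robust, mirroring the point that each DNB keeps invariable coordinates on the patterned flow cell.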


2020 ◽  
Vol 11 (4) ◽  
pp. 7436-7441
Author(s):  
Naman Jain ◽  
Lily Priyadarshini ◽  
Kritika Sharma ◽  
Iyappan S ◽  
Jaganathan M K ◽  
...  

Polyhydroxybutyrates (PHBs) are biodegradable polymers synthesised and stored as cytoplasmic inclusions in various bacteria. They have a wide variety of applications in the biomedical, food, agricultural, and pharmaceutical industries and are used as vehicles for controlled-release drug delivery. PHB-producing microorganisms were isolated from dump soil, screened by fluorescence microscopy at 490 nm, and characterised by 16S rRNA sequencing. Process parameters (temperature, pH, and carbon and nitrogen sources) were optimised for maximum PHB production. The isolate showed maximum PHB accumulation (0.07 mg/mL) after 72 hours of incubation at 35 °C, and the maximum concentration at pH 7 was 0.055 mg/mL. FT-IR characterisation of the PHB showed a band at 3426 cm⁻¹, attributed to C–H methylene and methyl groups, and HPLC gave a peak retention time of 12.39 min. D-glucose was the best carbon source, yielding 0.0319 mg/mL PHB, and medium supplemented with peptone as the nitrogen source gave the maximum cellular PHB accumulation of 0.0723 mg/mL. Thus, the isolate shows potential for PHB production and further exploitation.


Author(s):  
Shubham Chandak ◽  
Kedar Tatwawadi ◽  
Srivatsan Sridhar ◽  
Tsachy Weissman

Abstract Motivation Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second-generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data are inherently noisy, lossy compression has the potential to significantly reduce space requirements without adversely impacting the performance of downstream applications. Results We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide a 35–50% further reduction in the compressed size of raw data over the state-of-the-art lossless compressor, with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required to reach a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications. Availability and implementation The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation. Supplementary information Supplementary data are available at Bioinformatics online.
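The core idea of error-bounded lossy compression can be shown with a uniform quantizer for integer raw-signal samples: every reconstructed value stays within a chosen maximum error, and the quantized stream has lower entropy, so a downstream entropy coder needs fewer bits. This is a minimal illustration on synthetic data, not either of the time-series compressors evaluated in the paper; the random-walk signal and the error bounds are assumptions.

```python
import random
from collections import Counter
from math import log2


def quantize(signal, max_err):
    """Map integer samples to bin indices; bins are 2*max_err+1 wide."""
    step = 2 * max_err + 1
    return [(x + max_err) // step for x in signal]


def dequantize(codes, max_err):
    """Reconstruct each sample as its bin centre, within +/- max_err of the original."""
    step = 2 * max_err + 1
    return [q * step for q in codes]


def empirical_entropy(values):
    """Zeroth-order entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


# Toy "raw signal": a noisy random walk of integer ADC-like values (not real data).
rng = random.Random(1)
signal, level = [], 500
for _ in range(5000):
    level += rng.randint(-3, 3)
    signal.append(level + rng.randint(-5, 5))

for max_err in (0, 2, 5):
    codes = quantize(signal, max_err)
    recon = dequantize(codes, max_err)
    worst = max(abs(a - b) for a, b in zip(signal, recon))
    print(max_err, worst, round(empirical_entropy(codes), 2))
```

A max_err of 0 reproduces the lossless case; larger bounds trade reconstruction fidelity for smaller entropy, which is the same size-versus-accuracy tradeoff the study measures with basecalling and consensus accuracy.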


Author(s):  
Dhaivat Joshi ◽  
Shunfu Mao ◽  
Sreeram Kannan ◽  
Suhas Diggavi

Abstract Motivation Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology, and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Properly utilizing the noise and error characteristics inherent in the sequencing process can play a vital role in constructing a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner. Results We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and from 82.6% to 90% in two real datasets. Availability and implementation https://github.com/joshidhaivat/QAlign.git. Supplementary information Supplementary data are available at Bioinformatics online.
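The conversion step described above can be sketched as sliding a k-mer window along a read, looking up a current value for each k-mer, and quantizing it into a small number of levels. The k-mer table here is a hypothetical toy model, not a real nanopore pore model, and the uniform binning is an assumption; QAlign's actual model and quantization are in its repository.

```python
from itertools import product


def toy_kmer_model(k=3):
    """Hypothetical k-mer -> mean current table; NOT a real nanopore pore model."""
    weight = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
    return {
        "".join(kmer): sum(weight[b] * (i + 1) for i, b in enumerate(kmer))
        for kmer in product("ACGT", repeat=k)
    }


def discretize(read, model, k, n_levels):
    """Convert a nucleotide read into a string of quantized current levels (n_levels <= 10)."""
    lo, hi = min(model.values()), max(model.values())
    width = (hi - lo) / n_levels
    levels = []
    for i in range(len(read) - k + 1):
        current = model[read[i:i + k]]
        # Uniform bins; the maximum value is clamped into the top level.
        levels.append(min(int((current - lo) / width), n_levels - 1))
    return "".join(str(level) for level in levels)


model = toy_kmer_model(3)
print(discretize("ACGTTGCA", model, 3, 4))
```

Under this toy model, distinct 3-mers with similar currents (for example ACA and CAA) collapse to the same level, so an aligner working on the level strings treats them as matches, which is the intuition behind letting the representation absorb the sequencer's typical confusions.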


2020 ◽  
Author(s):  
Anna Pavlova ◽  
Vera Belova ◽  
Robert Afasizhev ◽  
Irina Bulusheva ◽  
Denis Rebrikov ◽  
...  

Abstract During the sequencing process, problems can occur with any device, including the MGISEQ-2000 (DNBSEQ-G400) platform. We encountered a power outage that caused a temporary shutdown of a sequencer in the middle of a run. Because barcode reading on the MGISEQ-2000 takes place at the end of the run, the raw data already collected could not be demultiplexed and were therefore unusable. We decided to use up the remaining reagents in the same cartridge, together with the same DNB-loaded flow cell, and started a new run in a shortened custom mode. We worked out how the MGISEQ-2000 converts preliminary data in .cal format into .fastq files and wrote a script named “Runcer-Necromancer” that merges .fastq files based on an analysis of their headers (available online: https://github.com/genomecenter/runcer-necromancer). Read merging proved possible because the MGISEQ-2000 flow cell has a patterned structure and each DNB has fixed coordinates on it, regardless of the flow cell's position on the stage. We demonstrated the correctness of the merged data by comparing sample analysis results with .fastq files previously obtained for the same samples. Thus, we confirmed that it is possible to restart the device and recover both parts of an interrupted run.


2020 ◽  
Author(s):  
Shubham Chandak ◽  
Kedar Tatwawadi ◽  
Srivatsan Sridhar ◽  
Tsachy Weissman

Abstract Motivation Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second-generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data are inherently noisy, lossy compression has the potential to significantly reduce space requirements without adversely impacting the performance of downstream applications. Results We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide a 35–50% further reduction in the compressed size of raw data over the state-of-the-art lossless compressor, with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required to reach a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications. Availability The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 840 ◽  
pp. 162-170
Author(s):  
Ganies Riza Aristya ◽  
Fauzana Putri ◽  
Rina Sri Kasiamdari ◽  
Arni Musthofa

Sugarcane (Saccharum officinarum L.) is an agricultural commodity with great diversity and high economic value. In Indonesia, this diversity is evidenced by the large number of cultivars grown. Sugarcane diversity at the molecular level can be assessed using DNA barcodes, one of which is matK. The purpose of the study was to identify and characterize matK and to reconstruct a phylogenetic tree to determine the phylogeny of 24 Indonesian sugarcane cultivars. matK was amplified by PCR with the primers matK F (5’-ATGATTAATTAAGAGTAAGAGGAT-3’) and matK R (5’-AATGCAAAAATTCGAAGGGT-3’). The matK gene was successfully amplified, yielding a 1531 bp product. The amplicons were sequenced, and the nucleotide sequences were compared with those in the GenBank database. The samples showed 98.87%–99.44% similarity to matK of Saccharum officinarum, a Saccharum hybrid cultivar, and Saccharum spontaneum. The reconstructed phylogenetic tree placed all samples in the same clade with zero genetic distance, and all reference sequences from NCBI were also located in that clade. The analysis of genetic variation found no haplotype diversity among the samples.
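The two matK primers quoted above can be checked quickly for length, GC content, and a rough melting temperature. The sketch uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is usually quoted only for short oligonucleotides, so for these 20 to 24 nt primers it is a coarse guide rather than a design-grade estimate; the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm ~ 2(A+T) + 4(G+C) degrees C; a rough guide for short oligos."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


primers = {
    "matK F": "ATGATTAATTAAGAGTAAGAGGAT",
    "matK R": "AATGCAAAAATTCGAAGGGT",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, Tm~{wallace_tm(seq)} C, "
          f"revcomp {reverse_complement(seq)}")
```

The reverse complement of matK R is the sequence expected at the 3′ end of the amplicon on the forward strand, which is a convenient check when locating primer sites in the 1531 bp product.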


2020 ◽  
Author(s):  
Joshua Harrison ◽  
W. John Calder ◽  
Bryan N. Shuman ◽  
C. Alex Buerkle

To characterize microbiomes and other ecological assemblages, ecologists routinely sequence and compare loci that differ among focal taxa. Counts of these sequences convey information regarding the occurrence and relative abundances of taxa, but provide no direct measure of their absolute abundances, due to the technical limitations of the sequencing process. The relative abundances in compositional data are inherently constrained and difficult to interpret. The incorporation of internal standards (ISDs; colloquially referred to as “spike-ins”) into DNA pools can ameliorate the problems posed by relative abundance data and allow absolute abundances to be approximated. Unfortunately, many laboratory and sampling biases cause ISDs to underperform or fail. Here, we discuss how careful deployment of ISDs can avoid these complications and be an integral component of well-designed studies seeking to characterize ecological assemblages via sequencing of DNA.
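The way an internal standard lets absolute abundances be approximated can be sketched with a single spike-in: a known number of ISD copies is added, the reads assigned to the ISD give a copies-per-read ratio, and every taxon's read count is scaled by that ratio. The sample counts and the 10,000-copy spike are hypothetical, and the equal-efficiency assumption is exactly what the laboratory and sampling biases mentioned in the text can break.

```python
def absolute_abundances(counts, isd_name, isd_copies_added):
    """Scale read counts to template copies using one internal standard (spike-in).

    Assumes every template is extracted, amplified and sequenced with the same
    efficiency as the ISD; laboratory and sampling biases can violate this.
    """
    isd_reads = counts.get(isd_name, 0)
    if isd_reads == 0:
        raise ValueError("internal standard not detected; absolute abundances cannot be estimated")
    copies_per_read = isd_copies_added / isd_reads
    return {taxon: n * copies_per_read for taxon, n in counts.items() if taxon != isd_name}


# Hypothetical sample: 10,000 ISD copies were added before library preparation.
reads = {"ISD": 500, "taxon_A": 1500, "taxon_B": 250}
print(absolute_abundances(reads, "ISD", 10_000))
```

Relative abundances alone would only say that taxon_A has six times as many reads as taxon_B; the spike-in converts those reads into approximate copy numbers that can be compared across samples.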

