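Genomic Library Construction, Resequencing, and SNP Identification Based on Whole-Genome Sequences of Indonesian Soybean Genotypes (Pembentukan Pustaka Genom, Resekuensing, dan Identifikasi SNP Berdasarkan Sekuen Genom Total Genotipe Kedelai Indonesia)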

2016 ◽  
Vol 11 (1) ◽  
pp. 7 ◽  
Author(s):  
I Made Tasma ◽  
Dani Satyawan ◽  
Habib Rijzaani

Resequencing of the soybean genome facilitates the discovery of SNP markers useful for supporting national soybean breeding programs. The objectives of the present study were to construct soybean genomic libraries, to resequence the whole genomes of five Indonesian soybean genotypes, and to identify SNPs based on the resequencing data. The study consisted of genomic library construction and quality analysis, whole-genome resequencing of five soybean genotypes, and genome-wide SNP identification based on alignment of the resequencing data to the reference sequence, Williams 82. The five Indonesian soybean genotypes were Tambora, Grobogan, B3293, Malabar, and Davros. The soybean genomic libraries were successfully constructed with a size of 400 bp and concentrations ranging from 21.2 to 64.5 ng/μl. Resequencing of the libraries yielded 50.1 × 10^9 bp of total genomic sequence. The quality of the genomic libraries and sequence data was high, as indicated by a Q score of 88.6% and a low sequencing error rate of only 0.97%. Bioinformatic analysis identified a total of 2,597,286 SNPs, 257,598 insertions, and 202,157 deletions. Of the total SNPs identified, only 95,207 SNPs (2.15%) were located within exons. Among those, 49,926 SNPs caused missense mutations and 1,535 SNPs caused nonsense mutations. Once verified, the SNPs from this study will be very useful for developing a genome-wide soybean SNP chip to accelerate soybean breeding programs.
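The split of variants into SNPs, insertions, and deletions reported above can be illustrated with a minimal sketch. It assumes VCF-style REF/ALT allele strings and classifies each variant by comparing allele lengths; the function name `classify_variant` and the toy records are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. a multi-nucleotide substitution


# Hypothetical toy records: (chromosome, position, REF, ALT).
records = [
    ("Gm01", 1045, "A", "G"),
    ("Gm01", 2310, "T", "TCA"),
    ("Gm02", 877, "GAT", "G"),
    ("Gm02", 9120, "C", "T"),
]

counts = Counter(classify_variant(ref, alt) for _, _, ref, alt in records)
print(dict(counts))
```

A real variant caller also applies quality and depth filters before counting; this sketch only shows the classification step.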

2016 ◽  
Author(s):  
Douglas W. Bjelland ◽  
Uday Lingala ◽  
Piyush Patel ◽  
Matt Jones ◽  
Matthew C. Keller

Identical by descent (IBD) segments are used to understand a number of fundamental issues in genetics. IBD segments are typically detected using long stretches of identical alleles between haplotypes in whole-genome SNP data. Phase or SNP call errors in genomic data can degrade the accuracy of IBD detection and lead to false positive calls, false negative calls, and under- or overextension of true IBD segments. Furthermore, the number of comparisons increases quadratically with sample size, requiring high computational efficiency. We developed a new IBD segment detection program, FISHR (Find IBD Shared Haplotypes Rapidly), in an attempt to accurately detect IBD segments and to better estimate their endpoints using an algorithm that is fast enough to be deployed on very large whole-genome SNP datasets. We compared the performance of FISHR to three leading IBD segment detection programs: GERMLINE, refinedIBD, and HaploScore. Using simulated and real genomic sequence data, we show that FISHR is slightly more accurate than the other programs at detecting long (greater than 3 cM) IBD segments but slightly less accurate than refinedIBD at detecting short (1 cM) IBD segments. Moreover, FISHR outperforms the other programs in determining the true endpoints of IBD segments, which is important for several reasons. FISHR takes two to four times longer than GERMLINE to run, whereas both GERMLINE and FISHR were orders of magnitude faster than refinedIBD and HaploScore. Overall, FISHR provides accurate IBD detection in unrelated individuals and is computationally efficient enough to be used on large SNP datasets of more than 20,000 individuals.
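The core idea of detecting IBD candidates as long stretches of identical alleles, and the quadratic growth in pairwise comparisons, can be sketched as follows. This is a naive illustration under simplifying assumptions (error-free, phased 0/1 haplotypes; length measured in markers rather than cM); it is not the FISHR or GERMLINE algorithm, and the names `shared_runs` and `n_pairs` are hypothetical.

```python
def shared_runs(hap_a, hap_b, min_len):
    """Return (start, end) index pairs, end exclusive, of allele-matching runs
    that span at least min_len consecutive markers."""
    runs = []
    start = None
    n = min(len(hap_a), len(hap_b))
    for i in range(n):
        if hap_a[i] == hap_b[i]:
            if start is None:
                start = i  # a new matching run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n - start >= min_len:
        runs.append((start, n))
    return runs


def n_pairs(n_samples):
    """Number of unordered sample pairs, which grows quadratically."""
    return n_samples * (n_samples - 1) // 2


hap_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
hap_b = [0, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(shared_runs(hap_a, hap_b, min_len=4))
print(n_pairs(20_000))
```

A single mismatch ends a run in this sketch, which is why phase and genotype errors split or truncate true segments unless the detection method tolerates them.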


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Pierpaolo Maisano Delser ◽  
Eppie R. Jones ◽  
Anahit Hovhannisyan ◽  
Lara Cassidy ◽  
Ron Pinhasi ◽  
...  

Over the last few years, genome-wide data for a large number of ancient human samples have been collected. Whilst datasets of captured SNPs have been collated, high coverage shotgun genomes (which are relatively few but allow certain types of analyses not possible with ascertained captured SNPs) have to be reprocessed by individual groups from raw reads. This task is computationally intensive. Here, we release a dataset including 35 whole-genome sequenced samples, previously published and distributed worldwide, together with the genetic pipeline used to process them. The dataset contains 72,041,355 sites called across 19 ancient and 16 modern individuals and includes sequence data from four previously published ancient samples which we sequenced to higher coverage (10–18x). Such a resource will allow researchers to analyse their new samples with the same genetic pipeline and directly compare them to the reference dataset without re-processing published samples. Moreover, this dataset can be easily expanded to increase the sample distribution both across time and space.


2020 ◽  
Author(s):  
Zalak Shah ◽  
Myo T Naung ◽  
Kara A Moser ◽  
Matthew Adams ◽  
Andrea G Buchwald ◽  
...  

Individuals acquire immunity to clinical malaria after repeated Plasmodium falciparum infections. This immunity to disease is thought to reflect the acquisition of a repertoire of responses to multiple alleles in diverse parasite antigens. In previous studies, we identified polymorphic sites within individual antigens that are associated with parasite immune evasion by examining antigen allele dynamics in individuals followed longitudinally. Here we expand this approach by analyzing genome-wide polymorphisms using whole genome sequence data from 140 parasite isolates representing malaria cases from a longitudinal study in Malawi and identify 25 genes that encode likely targets of naturally acquired immunity and that should be further characterized for their potential as vaccine candidates.


2020 ◽  
Author(s):  
Pierpaolo Maisano Delser ◽  
Eppie R. Jones ◽  
Anahit Hovhannisyan ◽  
Lara Cassidy ◽  
Ron Pinhasi ◽  
...  

Over the last few years, genome-wide data for a large number of ancient human samples have been collected. Whilst datasets of captured SNPs have been collated, high coverage shotgun genomes (which are relatively few but allow certain types of analyses not possible with ascertained captured SNPs) have to be reprocessed by individual groups from raw reads. This task is computationally intensive. Here, we release a dataset including 34 whole-genome sequenced samples, previously published and distributed worldwide, together with the genetic pipeline used to process them. The dataset contains 73,435,604 sites called across 18 ancient and 16 modern individuals and includes sequence data from four previously published ancient samples which we sequenced to higher coverage (10–18x). Such a resource will allow researchers to analyse their new samples with the same genetic pipeline and directly compare them to the reference dataset without re-processing published samples. Moreover, this dataset can be easily expanded to increase the sample distribution both across time and space.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Alejandra Vergara-Lope ◽  
M. Reza Jabalameli ◽  
Clare Horscroft ◽  
Sarah Ennis ◽  
Andrew Collins ◽  
...  

Quantification of linkage disequilibrium (LD) patterns in the human genome is essential for genome-wide association studies, selection signature mapping and studies of recombination. Whole genome sequence (WGS) data provides optimal source data for this quantification as it is free from biases introduced by the design of array genotyping platforms. The Malécot-Morton model of LD allows the creation of a cumulative map for each chromosome, analogous to an LD form of a linkage map. Here we report LD maps generated from WGS data for a large population of European ancestry, as well as populations of Baganda, Ethiopian and Zulu ancestry. We achieve high average genetic marker densities of 2.3–4.6/kb. These maps show good agreement with prior, low resolution maps and are consistent between populations. Files are provided in BED format to allow researchers to readily utilise this resource.
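As a small illustration of what quantifying LD between two markers means, the sketch below computes the pairwise r² statistic from phased haplotypes. Note the substitution: r² is a widely used pairwise LD measure and is shown here only for orientation; it is not the Malécot-Morton model used to build the LD maps described above. The function name `r_squared` and the toy haplotypes are hypothetical.

```python
def r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    haplotypes: list of (allele_at_snp_a, allele_at_snp_b) tuples coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared(complete_ld), r_squared(no_ld))
```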


2017 ◽  
Vol 49 (3) ◽  
pp. 141-150 ◽  
Author(s):  
A. M. Carroll ◽  
R. Cheng ◽  
E. S. R. Collie-Duguid ◽  
C. Meharg ◽  
M. E. Scholz ◽  
...  

Muscle fiber cross-sectional area (CSA) and proportion of different fiber types are important determinants of muscle function and overall metabolism. Genetic variation plays a substantial role in phenotypic variation of these traits; however, the underlying genes remain poorly understood. This study aimed to map quantitative trait loci (QTL) affecting differences in soleus muscle fiber traits between the LG/J and SM/J mouse strains. Fiber number, CSA, and proportion of oxidative type I fibers were assessed in the soleus of 334 genotyped female and male mice of the F34 generation of advanced intercross lines (AIL) derived from the LG/J and SM/J strains. To increase the QTL detection power, these data were combined with 94 soleus samples from the F2 intercross of the same strains. The transcriptome of the soleus muscle of LG/J and SM/J females was analyzed by microarray. Genome-wide association analysis mapped four QTL (genome-wide P < 0.05) affecting the properties of muscle fibers to chromosomes 2, 3, 4, and 11. The 1.5-LOD QTL support intervals ranged between 2.36 and 4.67 Mb. On the basis of the genomic sequence information and functional and transcriptome data, we identified candidate genes for each of these QTL. The combination of analyses in F2 and F34 AIL populations with transcriptome and genomic sequence data in the parental strains is an effective strategy for refining QTL and nominating candidate genes.
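A 1.5-LOD support interval, as mentioned above, is the region around a QTL peak in which the LOD score stays within 1.5 units of its maximum. The sketch below shows one simple way to compute it from a LOD profile; the function name `lod_support_interval` and the toy profile are hypothetical, and some tools additionally widen the interval to the nearest flanking markers, which this sketch does not do.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, right) positions bounding the contiguous region around
    the LOD peak where the score stays at or above (peak LOD - drop)."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1   # extend leftwards while still above the threshold
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1  # extend rightwards while still above the threshold
    return positions[left], positions[right]


# Hypothetical LOD profile along one chromosome (positions in Mb).
positions = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
lods = [2.0, 4.1, 5.5, 4.6, 3.2, 1.0]
start, end = lod_support_interval(positions, lods)
print(start, end, end - start)
```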


Author(s):  
Martin Steinegger ◽  
Steven L Salzberg

Metagenomic sequencing allows researchers to investigate organisms sampled from their native environments by sequencing their DNA directly, and then quantifying the abundance and taxonomic composition of the organisms thus captured. However, these types of analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here we describe Conterminator, an efficient method to detect and remove incorrectly labelled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination in 114,035 sequences and 2767 species in the NCBI Reference Sequence Database (RefSeq), 2,161,746 sequences and 6795 species in the GenBank database, and 14,132 protein sequences in the NR non-redundant protein database. Conterminator uncovers contamination in sequences spanning the whole range from draft genomes to “complete” model organism genomes. Our method, which scales linearly with input size, was able to process 3.3 terabytes of genomic sequence data in 12 days on a single 32-core compute node. We believe that Conterminator can become an important tool to ensure the quality of reference databases with particular importance for downstream metagenomic analyses. Source code (GPLv3): https://github.com/martin-steinegger/conterminator


Author(s):  
Atal Saha ◽  
Anastasia Andersson ◽  
Sara Kurland ◽  
Naomi Keehnen ◽  
Verena Esther Kutschera ◽  
...  

The sympatric existence of genetically distinct populations of the same species remains a puzzle in ecology. Coexisting salmonid fish populations are known from over 100 freshwater lakes. Most studies of sympatric populations have used limited numbers of genetic markers, making it unclear if genetic divergence involves only certain parts of the genome. We return to the first reported case of salmonid sympatry, initially detected through contrasting homozygosity at a single allozyme locus (lactate dehydrogenase, LDH-A1) in brown trout in the small Lakes Bunnersjöarna, central Sweden. We use DNA from samples collected in the 1970s and a 96-SNP Fluidigm array to verify the existence of the coexisting demes. We then apply whole-genome resequencing of pooled DNA to explore genome-wide diversity within and between these demes; strong genetic divergence is observed with genome-wide FST = 0.13. Nucleotide diversity is estimated at 0.0013 in Deme I but only 0.0005 in Deme II. Individual whole-genome resequencing of two individuals per deme suggests considerably higher inbreeding in Deme II vs. Deme I. Comparing with similar data from other lakes, we find that the genome-wide divergence between the demes is similar to that between reproductively isolated populations. We located two genes for LDH-A and found divergence between the demes in a regulatory section of one of the genes, but we could not find a perfect fit between allozyme and sequence data. Our data demonstrate genome-wide divergence governed by genetic drift and diversifying selection, confirming reproductive isolation between the sympatric demes.
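Nucleotide diversity, reported above for each deme, is the average number of pairwise differences per site among sampled sequences. The sketch below computes it for a small set of aligned sequences. It assumes individual, equal-length sequences without missing data, whereas the study estimated diversity from pooled resequencing data, which requires a different estimator; the function name `nucleotide_diversity` and the toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical aligned sequences from one population.
seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTAT"]
print(nucleotide_diversity(seqs))
```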

