Straintables: An application that extracts sequences from genome assemblies and generates dissimilarity matrices

2021 ◽  
Author(s):  
Gabriel Araujo ◽  
Richard Francis ◽  
Cristina Ferreira ◽  
Alba Rangel

Background and Objectives: The dissimilarity matrix (DM) is an important component of phylogenetic analysis, and many software packages exist to build and display DMs. However, because the usual input for this type of software is a set of sequences in FASTA format, extracting and aligning each set of sequences to produce a large number of matrices can be laborious. Additionally, existing software does not facilitate the comparison of similarity clusters across several DMs built for the same group of individuals from different genomic regions. To meet our need for such a tool, we designed Straintables to extract sequences of specific genomic regions from a group of intraspecies genome assemblies and to use the extracted sequences to build dissimilarity matrices. Methods: A Python module with executable scripts was developed for a study on genetic diversity across strains of Toxoplasma gondii; it also serves as a general-purpose system for DM calculation and visualization in preliminary phylogenetic studies. For automatic extraction of region sequences from genome assemblies, we assembled a system that designs virtual primers from reference sequences located through genomic annotations, then matches those primers against genome files using regex patterns. Extracted sequences are then aligned with Clustal Omega and compared to generate matrices. Results: Using this software saves the user from manually preparing and aligning the sequences, a process that can be laborious when a large number of assemblies or regions are involved. The automatic sequence extraction can be checked against BLAST results using the extracted sequences as queries; correct results were observed for same-species pools of various organisms.
The package also contains a matrix visualization tool focused on cluster visualization, capable of drawing matrices to image files with custom settings, and provides methods for reordering matrices to facilitate the comparison of clustering patterns across two or more matrices. Conclusion: Straintables may replace and extend the functionality of existing matrix-oriented phylogenetic software, featuring automatic region extraction from genome assemblies and enhanced matrix visualization that emphasizes cluster identification. The module is open source, available on GitHub (https://github.com/Gab0/straintables) under an MIT license and also as a PyPI package.
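The extraction and comparison steps described in this abstract can be illustrated with a minimal Python sketch. It is not the Straintables implementation: the function names, the primer handling, and the gap treatment are all assumptions made for illustration, and the sequences are assumed to be already aligned (the abstract uses Clustal Omega for that step, which is not invoked here).

```python
import re

# Hypothetical helper names; none of these come from the Straintables code base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(genome, forward_primer, reverse_primer):
    """Find the shortest stretch that starts at the forward primer and ends
    at the reverse primer's binding site on the same strand.

    Returns the matched sequence (primers included) or None if not found.
    """
    pattern = re.compile(
        re.escape(forward_primer)
        + r"[ACGTN]*?"  # non-greedy: stop at the first reverse primer site
        + re.escape(reverse_complement(reverse_primer))
    )
    match = pattern.search(genome)
    return match.group(0) if match else None


def dissimilarity(a, b):
    """Fraction of differing positions between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def dissimilarity_matrix(aligned):
    """Build a square matrix (list of lists) from a {name: aligned_seq} dict."""
    names = list(aligned)
    return [[dissimilarity(aligned[r], aligned[c]) for c in names] for r in names]
```

Under these assumptions, one matrix would be built per genomic region by calling `extract_region` on each assembly, aligning the results externally, and passing the aligned sequences to `dissimilarity_matrix`.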

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Julen Mendieta-Esteban ◽  
Marco Di Stefano ◽  
David Castillo ◽  
Irene Farabella ◽  
Marc A Marti-Renom

Abstract Chromosome conformation capture (3C) technologies measure the interaction frequency between pairs of chromatin regions within the nucleus in a cell or a population of cells. Some of these 3C technologies retrieve interactions involving non-contiguous sets of loci, resulting in sparse interaction matrices. One such 3C technology is Promoter Capture Hi-C (pcHi-C), which is tailored to probe only interactions involving gene promoters. As such, pcHi-C provides sparse interaction matrices that are suitable for characterizing short- and long-range enhancer–promoter interactions. Here, we introduce a new method to reconstruct the chromatin structural (3D) organization from sparse 3C-based datasets such as pcHi-C. Our method allows for data normalization, detection of significant interactions and reconstruction of the full 3D organization of the genomic region despite the data sparseness. Specifically, it builds, with as little as 2–3% of the data from the matrix, reliable 3D models of similar accuracy to those based on dense interaction matrices. Furthermore, the method is sensitive enough to detect cell-type-specific 3D organizational features such as the formation of different networks of active gene communities.


2012 ◽  
Vol 78 (7) ◽  
pp. 2435-2442 ◽  
Author(s):  
Marie Foulongne-Oriol ◽  
Anne Rodier ◽  
Jean-Michel Savoie

ABSTRACT Dry bubble, caused by Lecanicillium fungicola, is one of the most detrimental diseases affecting button mushroom cultivation. In a previous study, we demonstrated that breeding for resistance to this pathogen is quite challenging due to its quantitative inheritance. A second-generation hybrid progeny derived from an intervarietal cross between a wild strain and a commercial cultivar was characterized for L. fungicola resistance under artificial inoculation in three independent experiments. Analysis of quantitative trait loci (QTL) was used to determine the locations, numbers, and effects of genomic regions associated with dry-bubble resistance. Four traits related to resistance were analyzed. Two to four QTL were detected per trait, depending on the experiment. Two genomic regions, on linkage group X (LGX) and LGVIII, were consistently detected in the three experiments. The genomic region on LGX was detected for three of the four variables studied. The total phenotypic variance accounted for by all QTL ranged from 19.3% to 42.1% over all traits in all experiments. For most of the QTL, the favorable allele for resistance came from the wild parent, but for some QTL, the allele that contributed to a higher level of resistance was carried by the cultivar. Comparative mapping with QTL for yield-related traits revealed five colocations between resistance and yield component loci, suggesting that the resistance results from both genetic factors and fitness expression. The consequences for mushroom breeding programs are discussed.


2021 ◽  
Vol 30 (1) ◽  
pp. 95-103
Author(s):  
Mohammad Shamimul Alam ◽  
Israt Jahan ◽  
Sadniman Rahman ◽  
Hawa Jahan ◽  
Kaniz Fatema

Tilapia is a hardy fish that can survive in water bodies polluted with heavy metals. Metal resistance is conferred by higher expression of the metallothionein gene (mt) in many organisms. The level, timing and tissue specificity of gene expression are regulated through transcription factor binding sites (TFBS), which may be present upstream, downstream, or even in the introns of a gene. So, as a candidate regulatory region, the 5′ upstream sequence of the mt gene in three tilapia species, Oreochromis aureus, O. niloticus and O. mossambicus, was studied. The targeted region was PCR-amplified and then sequenced using a pair of custom-designed primers. A total of only 2.7% variation was found in the sequenced genomic region among the three species. Metal-related TFBS were predicted from these sequences. A total of twenty-eight TFBS were found in O. aureus and twenty-nine in O. mossambicus and O. niloticus. The number of metal-related TFBS predicted in the targeted sequence was significantly higher than that found in other randomly selected genomic regions of the same size from the O. niloticus genome. Thus, the results suggest the presence of putative regulatory elements in the targeted upstream region, which might have an important role in the regulation of mt gene function. Dhaka Univ. J. Biol. Sci. 30(1): 95-103, 2021 (January)


Genetics ◽  
1994 ◽  
Vol 137 (4) ◽  
pp. 987-997 ◽  
Author(s):  
S G Clark ◽  
X Lu ◽  
H R Horvitz

Abstract The Caenorhabditis elegans locus lin-15 negatively regulates an intercellular signaling process that induces formation of the hermaphrodite vulva. The lin-15 locus controls two separate genetic activities. Mutants that lack both activities have multiple, ectopic pseudo-vulvae resulting from the overproduction of vulval cells, whereas mutants defective in only one lin-15 activity appear wild-type. lin-15 acts non-cell-autonomously to prevent the activation of a receptor tyrosine kinase/ras signaling pathway. We report here the molecular characterization of the lin-15 locus. The two lin-15 activities are encoded by contiguous genomic regions and by two distinct, non-overlapping transcripts that may be processed from a single mRNA precursor by trans-splicing. Based on the DNA sequence, the 719- and 1,440-amino acid lin-15 proteins are not similar to each other or to known proteins. lin-15 multivulva mutants, which are defective in both lin-15 activities, contain deletions and insertions that affect the lin-15 genomic region.


Development ◽  
1999 ◽  
Vol 126 (3) ◽  
pp. 577-586 ◽  
Author(s):  
H. Muller ◽  
R. Samanta ◽  
E. Wieschaus

Wingless signaling plays a central role during epidermal patterning in Drosophila. We have analyzed zygotic requirements for Wingless signaling in the embryonic ectoderm by generating synthetic deficiencies that uncover more than 99% of the genome. We found no genes required for initial wingless expression, other than previously identified segmentation genes. In contrast, maintenance of wingless expression shows a high degree of zygotic transcriptional requirements. Besides known genes, we have identified at least two additional genomic regions containing new genes involved in Wingless maintenance. We also assayed for the zygotic requirements for Wingless response and found that no single genomic region was required for the cytoplasmic accumulation of Armadillo in the receiving cells. Surprisingly, embryos homozygously deleted for the candidate Wingless receptor, Dfrizzled2, showed a normal Wingless response. However, the Armadillo response to Wingless was strongly reduced in double mutants of both known members of the frizzled family in Drosophila, frizzled and Dfrizzled2. Based on their expression pattern during embryogenesis, different Frizzled receptors may play unique but overlapping roles in development. In particular, we suggest that Frizzled and Dfrizzled2 are both required for Wingless autoregulation, but might be dispensable for late Engrailed maintenance. While Wingless signaling in embryos mutant for frizzled and Dfrizzled2 is affected, Wingless protein is still internalized into cells adjacent to wingless-expressing cells. Incorporation of Wingless protein may therefore involve cell surface molecules in addition to the genetically defined signaling receptors of the frizzled family.


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 1026 ◽  
Author(s):  
K. N. S. Usha Kiranmayee ◽  
C. Tom Hash ◽  
S. Sivasubramani ◽  
P. Ramu ◽  
Bhanu Prakash Amindala ◽  
...  

This study was conducted to dissect the genetic basis and to explore the candidate genes underlying one of the important genomic regions on the long arm (L) of SBI-10, governing the complex stay-green trait contributing to post-flowering drought-tolerance in sorghum. A fine-mapping population was developed from an introgression line cross, RSG04008-6 (stay-green) × J2614-11 (moderately senescent). The fine-mapping population with 1894 F2 was genotyped with eight SSRs, and a set of 152 recombinants was identified, advanced to the F4 generation, field evaluated with three replications over 2 seasons, and genotyped with the GBS approach. A high-resolution linkage map was developed for SBI-10L using 260 genotyping-by-sequencing single nucleotide polymorphisms (GBS-SNPs). Using the best linear unbiased predictions (BLUPs) of the percent green leaf area (%GL) traits and the GBS-based SNPs, we identified seven quantitative trait loci (QTL) clusters and a single gene, mostly involved in drought-tolerance, for each QTL cluster, viz., AP2/ERF transcription factor family (Sobic.010G202700), NBS-LRR protein (Sobic.010G205600), ankyrin-repeat protein (Sobic.010G205800), senescence-associated protein (Sobic.010G270300), WD40 (Sobic.010G205900), CPK1 adapter protein (Sobic.010G264400), LEA2 protein (Sobic.010G259200) and an expressed protein (Sobic.010G201100). The target genomic region was thus delimited from 15 Mb to 8 genes co-localized with QTL clusters, and validated using quantitative real-time (qRT)–PCR.


2014 ◽  
Vol 281 (1783) ◽  
pp. 20140012 ◽  
Author(s):  
Devon E. Pearse ◽  
Michael R. Miller ◽  
Alicia Abadía-Cardoso ◽  
John Carlos Garza

Rapid adaptation to novel environments may drive changes in genomic regions through natural selection. Such changes may be population-specific or, alternatively, may involve parallel evolution of the same genomic region in multiple populations, if that region contains genes or co-adapted gene complexes affecting the selected trait(s). Both quantitative and population genetic approaches have identified associations between specific genomic regions and the anadromous (steelhead) and resident (rainbow trout) life-history strategies of Oncorhynchus mykiss. Here, we use genotype data from 95 single nucleotide polymorphisms and show that the distribution of variation in a large region of one chromosome, Omy5, is strongly associated with life-history differentiation in multiple above-barrier populations of rainbow trout and their anadromous steelhead ancestors. The associated loci are in strong linkage disequilibrium, suggesting the presence of a chromosomal inversion or other rearrangement limiting recombination. These results provide the first evidence of a common genomic basis for life-history variation in O. mykiss in a geographically diverse set of populations and extend our knowledge of the heritable basis of rapid adaptation of complex traits in novel habitats.
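As a rough illustration of what "strong linkage disequilibrium" between two SNPs means, the sketch below computes a common proxy for LD from unphased genotypes: the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of one allele). This is an assumption chosen for illustration; the abstract does not state which LD measure the authors used, and the function name is hypothetical.

```python
def ld_r_squared(genotypes_a, genotypes_b):
    """Squared Pearson correlation between genotype dosages at two loci.

    Each list holds one value per individual (0, 1 or 2 copies of an allele).
    Values near 1 indicate strong LD; values near 0 indicate independence.
    """
    n = len(genotypes_a)
    if n == 0 or n != len(genotypes_b):
        raise ValueError("need two non-empty genotype lists of equal length")
    mean_a = sum(genotypes_a) / n
    mean_b = sum(genotypes_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(genotypes_a, genotypes_b))
    var_a = sum((a - mean_a) ** 2 for a in genotypes_a)
    var_b = sum((b - mean_b) ** 2 for b in genotypes_b)
    if var_a == 0 or var_b == 0:
        # Monomorphic locus: r^2 is undefined; this sketch reports 0.0.
        return 0.0
    return cov * cov / (var_a * var_b)
```

Applied to every pair of loci in a chromosomal region, a block of consistently high values would match the pattern described in this abstract.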


2019 ◽  
Vol 37 (5) ◽  
pp. 1407-1419 ◽  
Author(s):  
Markus G Stetter ◽  
Mireia Vidal-Villarejo ◽  
Karl J Schmid

Abstract Thousands of plants have been selected as crops; yet, only a few are fully domesticated. The lack of adaptation to agroecological environments of many crop plants with few characteristic domestication traits potentially has genetic causes. Here, we investigate the incomplete domestication of an ancient grain from the Americas, amaranth. Although three grain amaranth species have been cultivated as crops for millennia, all three lack key domestication traits. We sequenced 121 crop and wild individuals to investigate the genomic signature of repeated incomplete adaptation. Our analysis shows that grain amaranth has been domesticated three times from a single wild ancestor. One trait that has been selected during domestication in all three grain species is the seed color, which changed from dark seeds to white seeds. We were able to map the genetic control of the seed color adaptation to two genomic regions on chromosomes 3 and 9, employing three independent mapping populations. Within the locus on chromosome 9, we identify an MYB-like transcription factor gene, a known regulator of seed color variation in other plant species. We identify a soft selective sweep in this genomic region in one of the crop species but not in the other two species. The demographic analysis of wild and domesticated amaranths revealed a population bottleneck predating the domestication of grain amaranth. Our results indicate that a reduced level of ancestral genetic variation did not prevent the selection of traits with a simple genetic architecture but may have limited the adaptation of complex domestication traits.


2008 ◽  
Vol 76 (10) ◽  
pp. 4581-4591 ◽  
Author(s):  
Tal Zusman ◽  
Elena Degtyar ◽  
Gil Segal

ABSTRACT Legionella pneumophila is an intracellular pathogen that has been shown to utilize the Icm/Dot type IV secretion system for pathogenesis. This system was shown to be composed of Icm/Dot complex components, accessory proteins, and a large number of translocated substrates. In this study, comparison of the icmQ regulatory regions from many Legionella species revealed a conserved regulatory sequence that includes the icmQ −10 promoter element. Mutagenesis of this conserved regulatory element indicated that each of the nucleotides in it affects the level of expression of the icmQ gene but not in a uniform fashion. A genomic analysis discovered that four additional genes in L. pneumophila contain this conserved regulatory sequence, which was found to function similarly in these genes as well. Examination of these four genes indicated that they are dispensable for intracellular growth, but two of them were found to encode new Icm/Dot translocated substrates (IDTS). Comparison of the genomic regions encoding these two IDTS among the four available L. pneumophila genomic sequences indicated that one of these genes is located in a hypervariable genomic region, which was shown before to contain an IDTS-encoding gene. Translocation analysis that was performed for nine proteins encoded from this hypervariable genomic region indicated that six of them are new IDTS which are translocated into host cells in an Icm/Dot-dependent manner. Furthermore, a bioinformatic analysis indicated that additional L. pneumophila genomic regions that contain several neighboring IDTS-encoding genes are hypervariable in gene content.


Author(s):  
Duchwan Ryu ◽  
Hongyan Xu ◽  
Varghese George ◽  
Shaoyong Su ◽  
Xiaoling Wang ◽  
...  

Abstract Differential methylation of regulatory elements is critical in epigenetic research and can be statistically tested. We developed a new statistical test, the generalized integrated functional test (GIFT), that tests for regional differences in methylation based on the methylation percent at each CpG site within a genomic region. The GIFT uses estimated subject-specific profiles with smoothing methods, specifically wavelet smoothing, and calculates an ANOVA-like test to compare the average profiles of groups. In this way, possibly correlated CpG sites within the regulatory region are compared all together. Simulations and analyses of data obtained from patients with chronic lymphocytic leukemia indicate that GIFT has good statistical properties and is able to identify promising genomic regions. Further, GIFT is likely to work with multiple different types of experiments since different smoothing methods can be used to estimate the profiles of data without noise. Matlab code for GIFT and sample data are available at
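The general idea of smoothing each subject's profile and then comparing group averages with an ANOVA-like ratio can be sketched in Python as follows. This is not the published GIFT statistic: a simple moving average is swapped in for wavelet smoothing, the ratio below is a plain between-group over within-group sum of squares accumulated across CpG sites, and all function names are hypothetical.

```python
def moving_average(profile, window=3):
    """Smooth a methylation profile with a centered moving average.

    Stand-in for wavelet smoothing; edge positions use a shorter window.
    """
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def mean_profile(profiles):
    """Site-by-site average of several equal-length profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def anova_like_statistic(groups, window=3):
    """Ratio of between-group to within-group variation of smoothed profiles.

    groups: list of groups; each group is a list of per-subject profiles
    (methylation percent per CpG site, same sites for every subject).
    """
    smoothed = [[moving_average(p, window) for p in group] for group in groups]
    everyone = [p for group in smoothed for p in group]
    k, n = len(smoothed), len(everyone)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more subjects than groups")
    grand = mean_profile(everyone)
    between = 0.0
    within = 0.0
    for group in smoothed:
        group_mean = mean_profile(group)
        # Squared distance of the group mean profile from the grand mean profile
        between += len(group) * sum((m - g) ** 2 for m, g in zip(group_mean, grand))
        for profile in group:
            within += sum((x - m) ** 2 for x, m in zip(profile, group_mean))
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))
```

Because the whole profile enters the statistic at once, neighbouring CpG sites are compared together rather than one site at a time, which is the property the abstract emphasizes. A reference distribution for the ratio (for example by permuting group labels) would still be needed to obtain a p-value; that step is left out of this sketch.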

