genomic region
Recently Published Documents





Viruses ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 146
Angelo Pavesi ◽  
Fabio Romerio

Gene overprinting occurs when point mutations within a genomic region with an existing coding sequence create a new one in another reading frame. This process is quite frequent in viral genomes either to maximize the amount of information that they encode or in response to strong selective pressure. The most frequent scenario involves two different reading frames in the same DNA strand (sense overlap). Much less frequent are cases of overlapping genes that are encoded on opposite DNA strands (antisense overlap). One such example is the antisense ORF, asp, in the minus strand of the HIV-1 genome, overlapping the env gene. The asp gene is highly conserved in pandemic HIV-1 strains of group M, and it is absent in non-pandemic HIV-1 groups, HIV-2, and lentiviruses infecting non-human primates, suggesting that the ~190-amino acid protein that is expressed from this gene (ASP) may play a role in virus spread. While the function of ASP in the virus life cycle remains to be elucidated, mounting evidence from several research groups indicates that ASP is expressed in vivo. Two alternative hypotheses could explain the origin of the asp ORF. On one hand, asp may have originally been present in the ancestor of contemporary lentiviruses, and subsequently lost in all descendants except for most HIV-1 strains of group M due to selective advantage. Alternatively, the asp ORF may have originated very recently with the emergence of group M HIV-1 strains from SIVcpz. Here, we used a combination of computational and statistical approaches to study the genomic region of env in primate lentiviruses to shed light on the origin, structure, and sequence evolution of the asp ORF.
The results emerging from our studies support the hypothesis of a recent de novo addition of the antisense ORF to the HIV-1 genome through a process that entailed progressive removal of existing internal stop codons from SIV strains to HIV-1 strains of group M, and fine tuning of the codon sequence in env that reduced the chances of new stop codons occurring in asp. Altogether, the study supports the notion that the HIV-1 asp gene encodes an accessory protein, providing a selective advantage to the virus.
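
The stop-codon argument above can be illustrated with a small scan of the antisense strand. The following is a minimal sketch, not the authors' pipeline: the function names and the toy sequence are hypothetical, and it simply counts stop codons in the three reading frames of the reverse complement of a sense-strand sequence.

```python
# Illustrative only: helper names and the toy sequence are assumptions,
# not taken from the study and not real HIV-1 data.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stop_codons_per_frame(seq: str) -> list:
    """Count stop codons in each of the three forward reading frames."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


def antisense_stop_counts(sense_seq: str) -> list:
    """Count stop codons in the three frames of the opposite strand."""
    return stop_codons_per_frame(reverse_complement(sense_seq))


if __name__ == "__main__":
    toy = "TTATCACTA"  # sense strand without stops in any frame
    print(stop_codons_per_frame(toy))
    print(antisense_stop_counts(toy))
```

A frame with zero internal stop codons over a long stretch is a candidate open reading frame; comparing such counts across SIV and HIV-1 sequences is the kind of evidence the abstract refers to.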

2022 ◽  
Vol 23 (1) ◽  
Andrea Hita ◽  
Gilles Brocart ◽  
Ana Fernandez ◽  
Marc Rehmsmeier ◽  
Anna Alemany ◽  

Background: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. Results: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. Conclusions: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at

2022 ◽  
Vol 12 (1) ◽  
Shin-ya Nishio ◽  
Shin-ichi Usami

The STRC gene, located on chromosome 15q15.3, is one of the genetic causes of autosomal recessive mild-to-moderate sensorineural hearing loss. One of the unique characteristics of STRC-associated hearing loss is the high prevalence of long deletions or copy number variations observed on chromosome 15q15.3. Further, the deletion of chromosome 15q15.3 from STRC to CATSPER2 is also known to be a genetic cause of deafness infertility syndrome (DIS), which is associated with not only hearing loss but also male infertility, as CATSPER2 plays crucial roles in sperm motility. Thus, information regarding the deletion range for each patient is important to the provision of appropriate genetic counselling for hearing loss and male infertility. In the present study, we performed next-generation sequencing (NGS) analysis for 9956 Japanese hearing loss patients and analyzed copy number variations in the STRC gene based on NGS read depth data. In addition, we performed Multiplex Ligation-dependent Probe Amplification analysis to determine the deletion range including the PPIP5K1, CKMT1B, STRC and CATSPER2 genomic region to estimate the prevalence of the STRC-CATSPER deletion, which is causative for DIS among the STRC-associated hearing loss patients. As a result, we identified 276 cases with STRC-associated hearing loss. The prevalence of STRC-associated hearing loss in Japanese hearing loss patients was 2.77% (276/9956). In addition, 77.1% of cases with STRC homozygous deletions carried a two-copy loss of the entire CKMT1B-STRC-CATSPER2 gene region. This information will be useful for the provision of more appropriate genetic counselling regarding hearing loss and male infertility for patients with an STRC deletion.
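
The abstract does not describe how read depth was converted to copy number, so the following is only a generic sketch under stated assumptions: a diploid baseline, a hypothetical reference depth taken from copy-neutral samples, and invented function names. It also reproduces the reported prevalence figure from the counts given in the text.

```python
# Generic read-depth normalisation; NOT the authors' method.
# Function names and the example depths are hypothetical.

def estimate_copy_number(sample_depth: float, reference_depth: float,
                         ploidy: int = 2) -> int:
    """Estimate integer copy number from depth relative to a copy-neutral reference."""
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    return round(ploidy * sample_depth / reference_depth)


def prevalence_percent(cases: int, cohort: int) -> float:
    """Prevalence of cases in a cohort, as a percentage."""
    return 100.0 * cases / cohort


if __name__ == "__main__":
    # Counts taken from the abstract: 276 cases among 9956 patients.
    print(round(prevalence_percent(276, 9956), 2))
    # Made-up depths: about half the reference depth suggests a one-copy loss,
    # near-zero depth suggests a two-copy (homozygous) loss.
    print(estimate_copy_number(52.0, 100.0))
    print(estimate_copy_number(0.0, 100.0))
```

In practice, depth-based calls over a segmentally duplicated region such as 15q15.3 are noisy, which is consistent with the study confirming the deletion range with a second method (MLPA).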

2022 ◽  
Vol 12 (1) ◽  
Vera Belova ◽  
Anna Pavlova ◽  
Robert Afasizhev ◽  
Viktoriya Moskalenko ◽  
Margarita Korzhanova ◽  

Human exome sequencing is a classical method used in most medical genetic applications. The leaders in the field are the manufacturers of enrichment kits based on hybridization of cRNA or cDNA biotinylated probes specific for a genomic region of interest. Recently, the platforms manufactured by the Chinese company MGI Tech have become widespread in Europe and Asia. The reliability and quality of the obtained data are already beyond any doubt. However, only a few kits compatible with these sequencers can be used for such specific tasks as exome sequencing. We developed our own solution for library pre-capture pooling and exome enrichment with Agilent probes. In this work, using a set of the standard benchmark samples from the Platinum Genome collection, we demonstrate that the qualitative and quantitative parameters of our protocol, which we called “RSMU_exome”, exceed those of the MGI Tech kit. Our protocol allows for identifying more SNVs and indels, generates fewer PCR duplicates, enables pooling of more samples in a single enrichment procedure, and requires less raw data to obtain results comparable with MGI Tech's protocol. The cost of our protocol is also lower than that of MGI Tech's solution.

2022 ◽  
Albano Pinto ◽  
Catarina Cunha ◽  
Raquel Chaves ◽  
Matthew ER Butchbach ◽  
Filomena Adega

Transposable elements (TEs) are interspersed repetitive DNA sequences with the ability to mobilize in the genome. The recent development of improved tools for evaluating TE-derived sequences in genomic studies has drawn increasing attention to the contribution of TEs to human development and disease. Spinal muscular atrophy (SMA) is an autosomal recessive motor neuron disease that is caused by deletions or mutations in the Survival Motor Neuron 1 (SMN1) gene. The SMN2 gene is a nearly perfect duplication of SMN1. Both genes (collectively known as SMN1/SMN2) are highly enriched in TEs. A comprehensive analysis of TE insertions in the SMN1/2 loci of SMA carriers, patients and healthy/control individuals was completed to understand TE dynamics in SMN1/2 and to try to establish a link between these elements and SMA. We found an Alu insertion in the promoter region and one L1 element in the 3’UTR that likely play an important role as an alternative promoter and as an alternative terminator to the gene, respectively. Additionally, the several Alu repeats inserted in the genes’ introns influence splicing, giving rise to alternative splicing events that cause RNA circularization and the birth of new alternative exons. These Alu repeats present throughout the genes are also prone to recombination events that can lead to SMN1 exon deletions that ultimately lead to SMA. The many good and bad implications associated with the presence of TEs inside SMN1/2 make this genomic region ideal for understanding the implications of TEs on genomic evolution as well as on human genomic disease.

Daria Martchenko ◽  
Aaron Shafer

Genomic approaches to the study of population demography rely on accurate SNP calling and, by proxy, the site frequency spectrum (SFS). Two main questions for the design of such studies remain poorly investigated: do summary statistics from reduced genomic sequencing reflect those of the whole genome, and how do sequencing strategies and derived summary statistics impact demographic inferences? To address those questions, we applied the ddRAD sequencing approach to 254 individuals and the whole genome resequencing approach to 35 mountain goat (Oreamnos americanus) individuals across the species range with a known demographic history. We identified SNPs with 5 different variant callers and used ANGSD to estimate the genotype likelihoods (GLs). We tested combinations of SNP filtering by linkage disequilibrium (LD), minor allele frequency (MAF) and the genomic region. We compared the resulting suite of summary statistics reflective of the SFS and quantified the relationship to demographic inferences by estimating the contemporary effective population size (Ne), isolation-by-distance and population structure, FST, and explicit modelling of the demographic history with ∂a∂i. Filtering had a larger effect than sequencing strategy, with the former strongly influencing summary statistics. Estimates of contemporary Ne and isolation-by-distance patterns were largely robust to the choice of sequencing, pipeline, and filtering. Despite the high variance in summary statistics, whole genome and reduced representation approaches were overall similar in supporting a glacial-induced vicariance and low Ne in mountain goats. We discuss why whole genome resequencing data is preferable, and reiterate our support for the use of GLs, in part because it limits user-determined filters.
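
For readers unfamiliar with the site frequency spectrum, the sketch below shows one simple way to compute a folded SFS from hard-called diploid genotypes. It is an illustration under stated assumptions (no missing data, genotypes coded as 0, 1 or 2 alternate alleles, a hypothetical function name), and it is not the genotype-likelihood approach the abstract recommends.

```python
# Illustrative folded SFS from hard genotype calls.
# Assumptions: diploid individuals, no missing data, genotypes coded as
# the number of alternate alleles (0, 1 or 2). Function name is hypothetical.

def folded_sfs(genotypes):
    """Return the folded SFS as a list indexed by minor allele count.

    genotypes: list of sites; each site is a list with one alternate-allele
    count per individual.
    """
    if not genotypes:
        return []
    n_chromosomes = 2 * len(genotypes[0])
    sfs = [0] * (n_chromosomes // 2 + 1)
    for site in genotypes:
        if 2 * len(site) != n_chromosomes:
            raise ValueError("all sites must have the same number of individuals")
        alt_count = sum(site)
        # Folding: use the rarer of the two alleles, whichever it is.
        minor_count = min(alt_count, n_chromosomes - alt_count)
        sfs[minor_count] += 1
    return sfs


if __name__ == "__main__":
    toy_sites = [
        [0, 0, 1],  # one alternate allele among six chromosomes
        [2, 2, 1],  # five alternate alleles, so minor allele count is one
        [1, 1, 0],  # minor allele count of two
        [0, 0, 0],  # monomorphic site
    ]
    print(folded_sfs(toy_sites))
```

Filtering by minor allele frequency removes sites from the low-count bins of this spectrum, which is one way to see why filtering choices can shift SFS-based summary statistics as strongly as the abstract reports.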

2022 ◽  
Vol 12 (1) ◽  
Walter W. Wolfsberger ◽  
Nikole M. Ayala ◽  
Stephanie O. Castro-Marquez ◽  
Valerie M. Irizarry-Negron ◽  
Antoliy Potapchuk ◽  

Since the first Spanish settlers brought horses to America centuries ago, several local varieties and breeds have been established in the New World. These were generally a consequence of the admixture of the different breeds arriving from Europe. In some instances, local horses have been selectively bred for specific traits, such as appearance, endurance, strength, and gait. We looked at the genetics of two breeds, the Puerto Rican Non-Purebred (PRNPB) (also known as the “Criollo”) horses and the Puerto Rican Paso Fino (PRPF), from the Caribbean Island of Puerto Rico. While it is reasonable to assume that there was a historic connection between the two, the genetic link between them has never been established. In our study, we started by looking at the genetic ancestry and diversity of current Puerto Rican horse populations using a 668 bp fragment of the mitochondrial DNA D-loop (HVR1) in 200 horses from 27 locations on the island. We then genotyped all 200 horses in our sample for the “gait-keeper” DMRT3 mutant allele previously associated with the paso gait especially cherished in this island breed. We also genotyped a subset of 24 samples with the Illumina Neogen Equine Community genome-wide array (65,000 SNPs). This data was further combined with the publicly available PRPF genomes from other studies. Our analyses show an undeniable genetic connection between the two varieties in Puerto Rico, consistent with the hypothesis that PRNPB horses represent the descendants of the original genetic pool, a mix of horses imported from the Iberian Peninsula and elsewhere in Europe. Some of the original founders of the PRNPB population must have carried the “gait-keeper” DMRT3 allele upon arrival on the island. From this admixture, the desired traits were selected by the local people over the span of centuries.
We propose that the frequency of the mutant “gait-keeper” allele originally increased in the local horses due to the selection for the smooth ride and other characters, long before the PRPF breed was established. To support this hypothesis, we demonstrate that PRNPB horses, and not the purebred PRPF, carry a signature of selection in the genomic region containing the DMRT3 locus to this day. The lack of a detectable signature of selection associated with DMRT3 in the PRPF would be expected if this native breed was originally derived from the genetic pool of PRNPB horses established earlier and most of the founders already had the mutant allele. Consequently, selection specific to PRPF later focused on alleles in other genes (including CHRM5, CYP2E1, MYH7, SRSF1, PAM, PRN and others) that have not been previously associated with the prized paso gait phenotype in Puerto Rico or anywhere else.

Paul Cheng ◽  
Robert C. Wirka ◽  
Lee Shoa Clarke ◽  
Quanyi Zhao ◽  
Ramendra Kundu ◽  

Background: Smooth muscle cells (SMC) transition into a number of different phenotypes during atherosclerosis, including those that resemble fibroblasts and chondrocytes, and make up the majority of cells in the atherosclerotic plaque. To better understand the epigenetic and transcriptional mechanisms that mediate these cell state changes, and how they relate to risk for coronary artery disease (CAD), we have investigated the causality and function of transcription factors (TFs) at genome-wide associated loci. Methods: We employed CRISPR-Cas9 genome and epigenome editing to identify the causal gene and cell(s) for a complex CAD GWAS signal at 2q22.3. Subsequently, single-cell epigenetic and transcriptomic profiling in murine models and human coronary artery smooth muscle cells were employed to understand the cellular and molecular mechanism by which this CAD risk gene exerts its function. Results: CRISPR-Cas9 genome and epigenome editing showed that the complex CAD genetic signals within a genomic region at 2q22.3 lie within smooth muscle long-distance enhancers for ZEB2, a TF extensively studied in the context of epithelial mesenchymal transition (EMT) in development and cancer. ZEB2 regulates SMC phenotypic transition through chromatin remodeling that obviates accessibility and disrupts both Notch and TGFβ signaling, thus altering the epigenetic trajectory of SMC transitions. SMC-specific loss of ZEB2 resulted in an inability of transitioning SMCs to turn off contractile programming and take on a fibroblast-like phenotype, but accelerated the formation of chondromyocytes, mirroring features of high-risk atherosclerotic plaques in human coronary arteries. Conclusions: These studies identify ZEB2 as a new CAD GWAS gene that affects features of plaque vulnerability through direct effects on the epigenome, providing a new therapeutic approach to target vascular disease.

2022 ◽  
Paula Silva ◽  
Byron Evers ◽  
Alexandria Kieffaber ◽  
Xu Wang ◽  
Richard Brown ◽  

Barley yellow dwarf (BYD) is one of the major viral diseases of cereals. Phenotyping BYD in wheat is extremely challenging due to similarities to other biotic and abiotic stresses. Breeding for resistance is additionally challenging as the wheat primary germplasm pool lacks genetic resistance, with most of the few resistance genes named to date originating from a wild relative species. The objectives of this study were to i) evaluate the use of high-throughput phenotyping (HTP) from unmanned aerial systems to improve BYD assessment and selection, ii) identify genomic regions associated with BYD resistance, and iii) evaluate the ability of genomic prediction models to predict BYD resistance. Up to 107 wheat lines were phenotyped during each of five field seasons under both insecticide treated and untreated plots. Across all seasons, BYD severity was lower with the insecticide treatment and plant height (PTHTM) and grain yield (GY) showed increased values relative to untreated entries. Only 9.2% of the lines were positive for the presence of the translocated segment carrying resistance gene Bdv2 on chromosome 7DL. Despite the low frequency, this region was identified through association mapping. Furthermore, we mapped a potentially novel genomic region for resistance on chromosome 5AS. Given the variable heritability of the trait (0.211 to 0.806), we obtained relatively good predictive ability for BYD severity, ranging between 0.06 and 0.26. Including Bdv2 in the predictive model had a large effect for predicting BYD but almost no effect for PTHTM and GY. This study was the first attempt to characterize BYD using field-HTP and apply GS to predict the disease severity. These methods have the potential to improve BYD characterization, and identifying new sources of resistance will be crucial for delivering BYD resistant germplasm.

Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 70
Juying Han ◽  
Brian Ritchey ◽  
Emmanuel Opoku ◽  
Jonathan D. Smith

A mouse strain intercross between Apoe−/− AKR/J and DBA/2J mice identified three replicated atherosclerosis quantitative trait loci (QTLs). Our objective was to fine map mouse atherosclerosis modifier genes within a genomic region known to affect lesion development in apoE-deficient (Apoe−/−) mice. We dissected the Ath28 QTL on the distal end of chromosome 2 by breeding a panel of congenic strains and measuring aortic root lesion area in 16-week-old male and female mice fed regular laboratory diets. The parental congenic strain contained ~9.65 Mb of AKR/J DNA from chromosome 2 on the DBA/2J genetic background, which had lesions 55% and 47% smaller than female and male DBA/2J mice, respectively (p < 0.001). Seven additional congenic lines identified three separate regions associated with the lesion area, named Ath28.1, Ath28.2, and Ath28.3, where the AKR/J alleles were atherosclerosis-protective for two regions and atherosclerosis-promoting for the other region. These results were replicated in both sexes, and in combined analysis after adjusting for sex. The congenic lines did not greatly impact total and HDL cholesterol levels or body weight. Bioinformatic analyses identified all coding and non-coding genes in the Ath28.1 sub-region, as well as strain sequence differences that may be impactful. Even within a <10 Mb region of the mouse genome, evidence supports the presence of at least three atherosclerosis modifier genes that differ between the AKR/J and DBA/2J mouse strains, supporting the polygenic nature of atherosclerosis susceptibility.
