open reading frame length
Recently Published Documents

Abstract
Background: Pseudogenes are non-functional copies of protein-coding genes that typically follow a different molecular evolutionary path from functional genes. Including pseudogene sequences in DNA barcoding and metabarcoding analyses can lead to misleading results. None of the most widely used bioinformatic pipelines for processing marker gene (metabarcode) high-throughput sequencing data specifically accounts for pseudogenes in protein-coding marker genes. The purpose of this study is to develop a method to screen for nuclear mitochondrial DNA segments (nuMTs) in large COI datasets. We do this by (1) describing gene and nuMT characteristics from an artificial COI barcode dataset, (2) showing the impact of two different pseudogene removal methods on perturbed community datasets with simulated nuMTs, and (3) incorporating a pseudogene filtering step into a bioinformatic pipeline that can be used to process Illumina paired-end COI metabarcode sequences. Open reading frame length and sequence bit scores from profile hidden Markov model (HMM) analysis were used to detect pseudogenes.
Results: Our simulations showed that it was more difficult to identify nuMTs from shorter amplicon sequences, such as those typically used in metabarcoding, than from the full-length DNA barcodes used to construct barcode libraries. It was also more difficult to identify nuMTs in datasets with a high percentage of nuMTs. Existing bioinformatic pipelines used to process metabarcode sequences already remove some nuMTs, especially in the rare sequence removal step, but adding a pseudogene filtering step can remove up to 5% of sequences even when other filtering steps are in place.
Conclusions: Open reading frame length filtering, alone or combined with profile hidden Markov model analysis, can be used to effectively screen out apparent pseudogenes from large datasets. There is more to learn about COI nuMTs, such as their frequency in DNA barcoding and metabarcoding studies, their taxonomic distribution, and their evolution. Thus, we encourage the submission of verified COI nuMTs to public databases to facilitate future studies.
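Open reading frame length filtering can be illustrated with a minimal sketch. It assumes forward-oriented amplicons and the invertebrate mitochondrial genetic code (NCBI table 5, stops TAA and TAG); it measures the longest stop-free stretch of codons in the three forward frames and flags sequences below a Tukey-style lower fence (Q1 - 1.5 x IQR) of the dataset's ORF lengths. The function names and the fence rule are hypothetical illustrations, not necessarily the authors' exact implementation or cutoff.

```python
import statistics

# Assumption: invertebrate mitochondrial genetic code (NCBI table 5), in which
# TGA codes for tryptophan, so only TAA and TAG terminate translation.
# For vertebrate mitochondria (table 2), AGA and AGG would also be stops.
MITO_STOPS = {"TAA", "TAG"}


def longest_orf_nt(seq, stops=MITO_STOPS):
    """Length in nucleotides of the longest stop-free run of codons across
    the three forward reading frames. No start codon is required, because
    amplicons are fragments of the gene."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stops:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 3
                best = max(best, run)
    return best


def flag_short_orfs(seqs, k=1.5):
    """Flag sequences whose longest ORF falls below the lower fence
    Q1 - k * IQR of all ORF lengths in the dataset (hypothetical cutoff)."""
    lengths = {sid: longest_orf_nt(s) for sid, s in seqs.items()}
    q1, _, q3 = statistics.quantiles(lengths.values(), n=4)
    fence = q1 - k * (q3 - q1)
    flagged = {sid for sid, n in lengths.items() if n < fence}
    return flagged, lengths


if __name__ == "__main__":
    # Toy data: nine stop-free "barcodes" and one sequence with stops in every frame.
    reads = {f"otu{i}": "GCT" * 20 for i in range(9)}
    reads["otu_pseudo"] = "TAAG" * 15
    flagged, lengths = flag_short_orfs(reads)
    print(flagged)  # {'otu_pseudo'}
```

Because real COI fragments usually contain stops in the two incorrect frames, the longest ORF tends to track the true frame, so a frameshift or internal stop in a nuMT shortens it.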

Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets

DOI: 10.1101/2021.01.24.427982 (2021)

Author(s): T. M. Porter, M. Hajibabaei

Keyword(s): Markov Model, Hidden Markov Model, DNA Barcoding, Hidden Markov, DNA Barcode, Open Reading Frame, Protein Coding, Reading Frame, Frame Length, Open Reading Frame Length

Abstract
Background: Pseudogenes are non-functional copies of protein-coding genes that typically follow a different molecular evolutionary path from functional genes. Including pseudogene sequences in DNA barcoding and metabarcoding analyses can lead to misleading results. None of the most widely used bioinformatic pipelines for processing marker gene (metabarcode) high-throughput sequencing data specifically accounts for pseudogenes in protein-coding marker genes. The purpose of this study is to develop a method to screen for obvious pseudogenes in large COI metabarcode datasets. We do this by (1) describing gene and pseudogene characteristics from a simulated DNA barcode dataset, (2) showing the impact of two different pseudogene removal methods on mock metabarcode datasets with simulated pseudogenes, and (3) incorporating a pseudogene filtering step into a bioinformatic pipeline that can be used to process Illumina paired-end COI metabarcode sequences. Open reading frame length and sequence bit scores from profile hidden Markov model (HMM) analysis were used to detect pseudogenes.
Results: Our simulations showed that it was more difficult to identify pseudogenes from shorter amplicon sequences, such as those typically used in metabarcoding (∼300 bp), than from the full-length DNA barcodes used to construct barcode libraries (∼650 bp). It was also more difficult to identify pseudogenes in datasets with a high percentage of pseudogene sequences. We show that existing bioinformatic pipelines used to process metabarcode sequences already remove some apparent pseudogenes, especially in the rare sequence removal step, but adding a pseudogene filtering step can remove more.
Conclusions: Combining open reading frame length with profile hidden Markov model analysis can effectively screen out obvious pseudogenes from large datasets. There is more to learn about COI pseudogenes, such as their frequency in DNA barcoding and metabarcoding studies, their taxonomic distribution, and their evolution. Thus, we encourage the submission of verified COI pseudogenes to public databases to facilitate future studies.

Characterization of Porcine Endogenous Retrovirus Clones from the NIH Miniature Pig BAC Library

Journal of Biomedicine and Biotechnology, Vol 2012, pp. 1-10 (2012)

DOI: 10.1155/2012/482568

Cited By ~ 4

Author(s): Seong-Lan Yu, Woo-Young Jung, Kie-Chul Jung, In-Cheol Cho, Hyun-Tae Lim, ...

Keyword(s): BAC Library, Endogenous Retrovirus, Endogenous Retroviruses, Miniature Pig, Reading Frame, Porcine Endogenous Retrovirus, Open Reading Frame Length, Porcine Endogenous Retroviruses, Integration Sites

Pigs have been considered as donors for xenotransplantation to replace human organs and tissues. However, porcine endogenous retroviruses (PERVs) might transmit new infectious diseases to humans during xenotransplantation. To investigate PERV integration sites, 45 PERV-positive BAC clones, including 12 PERV-A, 16 PERV-B, and 17 PERV-C clones, were identified from the NIH miniature pig BAC library. Analysis of 12 selected full-length PERV sequences, including the long terminal repeat (LTR) region, identified open reading frames of the expected length, an indication of active PERV, in all five PERV-C clones and in one of the four PERV-B clones. Premature stop codons were observed in only three PERV-A clones. In addition, eleven PERV integration sites were mapped using a 5000-rad IMpRH panel. The map locations of the PERV-C clones have not been reported previously, so they are novel PERV clones identified in this study. The results could provide basic information for eliminating site-specific PERVs when selecting pigs for xenotransplantation.
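Checking a clone for an ORF of the expected length, or for a premature stop codon, can be sketched as a scan for the first in-frame stop. This is a minimal illustration assuming the standard genetic code (stops TAA, TAG, TGA) and a coding sequence that begins at the gene's start codon; `orf_length`, `has_premature_stop`, and the expected length in the example are hypothetical, not values from the study.

```python
# Assumption: PERV genes are read with the standard genetic code.
STANDARD_STOPS = {"TAA", "TAG", "TGA"}


def orf_length(cds, stops=STANDARD_STOPS):
    """Nucleotides from the first base of `cds` through the first in-frame
    stop codon (inclusive), or None if no in-frame stop is found.
    `cds` is assumed to start at the gene's start codon."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in stops:
            return i + 3
    return None


def has_premature_stop(cds, expected_len):
    """True if translation would terminate before the expected ORF length."""
    n = orf_length(cds)
    return n is not None and n < expected_len


if __name__ == "__main__":
    intact = "ATG" + "GCT" * 8 + "TAA"                      # stop at the expected position
    truncated = "ATG" + "GCT" * 3 + "TGA" + "GCT" * 4 + "TAA"  # early in-frame TGA
    print(orf_length(intact), has_premature_stop(intact, 30))        # 30 False
    print(orf_length(truncated), has_premature_stop(truncated, 30))  # 15 True
```

A sequence with no in-frame stop returns None and is not called premature; in practice such a case would more likely indicate an incomplete or misassembled sequence.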

Role of mRNA Stability during Genome-wide Adaptation of Lactococcus lactis to Carbon Starvation

Journal of Biological Chemistry, Vol 280 (43), pp. 36380-36385 (2005)

DOI: 10.1074/jbc.m506006200

Cited By ~ 57

Author(s): Emma Redon, Pascal Loubière, Muriel Cocaign-Bousquet

Keyword(s): Lactococcus lactis, mRNA Stability, Formal Method, Growth Conditions, Carbon Starvation, Reading Frame, mRNA Pool, Open Reading Frame Length, Wide Range, The Stability

The stability of mRNA was investigated for the first time at the genomic scale during carbon starvation adaptation of Lactococcus lactis IL1403. In exponential phase, mRNA half-lives were positively correlated with open reading frame length. A polypurine sequence, AGGAG, was identified as a putative 5′-stabilizer, and inverted repeat sequences as a 3′-destabilizer. These original findings suggested that multiple pathways of mRNA degradation coexist: internal cleavage, endonuclease cleavage initiated at the 5′-end, and exonuclease attack at the 3′-end. During carbon starvation adaptation, mRNA stability increased globally, but gene-specific mechanisms were involved, allowing a wide range of stabilization factors between genes and differential kinetic evolution. A formal method quantifying the relative influences of transcription and degradation on control of the mRNA pool was developed and applied in L. lactis. Gene expression was mostly controlled by altered transcription before carbon source exhaustion, while the influence of mRNA stability increased during the starvation phase. This study highlighted that modulating stability in response to adverse growth conditions can govern gene regulation to the same extent as transcription in bacteria.
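The abstract does not spell out the formal method, so the following is a sketch under an assumed formulation rather than the authors' confirmed equations. With first-order decay, d[m]/dt = V_T - k_D[m] and k_D = ln 2 / t½, so at steady state [m] = V_T · t½ / ln 2. Taking log-ratios between two conditions gives ln(m2/m1) = ln(V_T2/V_T1) + ln(t½2/t½1); dividing by ln(m2/m1) yields ρ_T + ρ_D = 1, where ρ_D = ln(t½2/t½1) / ln(m2/m1) is the share of the change in mRNA level attributable to stability. The code estimates a half-life from a decay time course (least-squares slope of ln level versus time) and then computes both coefficients; the function names are hypothetical.

```python
import math


def half_life(times, levels):
    """Half-life from a decay time course, assuming first-order decay after
    transcription arrest: fit ln(level) = a - k * t by least squares and
    return ln(2) / k."""
    logs = [math.log(x) for x in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    k = -sxy / sxx  # decay rate constant (positive for a decaying signal)
    return math.log(2) / k


def regulation_coefficients(m1, m2, t_half1, t_half2):
    """Return (rho_T, rho_D) between conditions 1 and 2 under the
    steady-state assumption; undefined when m2 == m1."""
    log_m = math.log(m2 / m1)
    rho_d = math.log(t_half2 / t_half1) / log_m
    return 1.0 - rho_d, rho_d


if __name__ == "__main__":
    times = [0, 2, 4, 6]  # minutes after transcription arrest (toy data)
    levels = [100 * 0.5 ** (t / 3) for t in times]
    print(round(half_life(times, levels), 6))          # 3.0
    print(regulation_coefficients(1.0, 4.0, 2.0, 4.0))  # (0.5, 0.5)
```

Under this formulation, ρ_D near 1 means the change in mRNA level is driven mainly by stability, ρ_D near 0 means it is driven mainly by transcription, and values outside [0, 1] indicate that the two effects act in opposite directions.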