gene prediction
Recently Published Documents


TOTAL DOCUMENTS

643
(FIVE YEARS 280)

H-INDEX

47
(FIVE YEARS 8)

2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Manyun Guo ◽  
Yucheng Ma ◽  
Wanyuan Liu ◽  
Zuyi Yuan

Abstract Nucleocapsid protein (NC) in the group-specific antigen (gag) of retroviruses is essential to the interactions of most retroviral gag proteins with RNAs. A computational method to predict NCs would benefit subsequent structural analysis and functional studies. However, no computational method to predict the exact locations of NCs in retroviruses has been proposed yet, and the wide length variation of NCs adds to the difficulty. In this paper, a computational method to identify NCs in retroviruses is proposed. All available retrovirus sequences with NC annotations were collected from NCBI. Models based on random forest (RF) and weighted support vector machine (WSVM) were built to predict the initiation and termination sites of NCs. Factor analysis scales of generalized amino acid information, along with a position weight matrix, were used to generate the feature space. Homology-based gene prediction methods were also compared and integrated to improve predictive performance. Predicted candidate initiation and termination sites were then combined and screened according to their intervals, decision values and alignment scores. All available gag sequences without NC annotations were scanned with the model to detect putative NCs. The geometric means of sensitivity and specificity for the prediction of initiation and termination sites under fivefold cross-validation are 0.9900 and 0.9548, respectively. Of all the collected retrovirus sequences with NC annotations, 90.91% were predicted entirely correctly by the model combining WSVM, RF and simple alignment. The composite model performs better than the single-method models. The model detected 235 putative NCs in unannotated gags. Our prediction method performs well on NC recognition and could also be extended to other gene prediction problems, especially those whose training samples vary widely in length.
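The geometric mean of sensitivity and specificity reported above can be illustrated with a minimal sketch. The function names and the confusion-matrix counts in the usage note are hypothetical and are not taken from the paper; the sketch only shows how the metric is conventionally computed.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites that were recovered
    specificity = tn / (tn + fp)  # fraction of non-sites that were rejected
    return sensitivity, specificity


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity, specificity = sensitivity_specificity(tp, fn, tn, fp)
    return math.sqrt(sensitivity * specificity)
```

For example, `g_mean(90, 10, 80, 20)` combines a sensitivity of 0.9 and a specificity of 0.8. Unlike plain accuracy, the geometric mean drops sharply when either class is predicted poorly, which matters when true sites are far rarer than non-sites.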


2022 ◽  
Author(s):  
Marwa Helmy ◽  
Eman Eldaydamony ◽  
Nagham Mekky ◽  
Mohammed Elmogy ◽  
Hassan Soliman

Abstract Identifying genes related to Parkinson's disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. In recent years, many studies have proposed techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most of the PD techniques identify only protein-coding genes and discard long non-coding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system to identify protein and lncRNA genes related to PD that can aid early diagnosis. First, we preprocessed the genes into DNA FASTA sequences from the UCSC genome browser and removed redundancies. Second, we extracted significant features from the DNA FASTA sequences using five numerical mapping techniques with the Fourier transform and the PyFeat method, with AdaBoost as the feature selection technique. Finally, the features were fed to a gradient boosted decision tree (GBDT) to classify the tested cases. Seven performance metrics were used to evaluate the proposed system. It achieved an average accuracy (ACC) of 78.1%, an area under the curve (AUC) of 84.9%, an area under the precision-recall curve (AUPR) of 85.0%, an F1-score of 78.2%, a Matthews correlation coefficient (MCC) of 0.564, a sensitivity (SEN) of 79.1%, and a specificity (SPC) of 77.1%. The experiments demonstrate promising results compared with other systems. The top-ranked predicted protein and lncRNA genes were verified against the literature.
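The abstract does not name its five numerical mappings, so the following is only a sketch of the general idea under a stated assumption: it uses binary indicator sequences (one per nucleotide) and a discrete Fourier transform to measure period-3 signal strength, a classic feature of coding DNA. All function names are hypothetical, and PyFeat, AdaBoost and GBDT are not reproduced here.

```python
import cmath


def indicator(seq, base):
    """Binary indicator mapping: 1.0 where seq has `base`, else 0.0."""
    return [1.0 if b == base else 0.0 for b in seq.upper()]


def dft_power(signal, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def spectrum_power(seq, k):
    """Total power at frequency index k, summed over the four bases."""
    return sum(dft_power(indicator(seq, b), k) for b in "ACGT")


def period3_feature(seq):
    """Ratio of power at period 3 to the mean power over k = 1..n//2."""
    n = len(seq) - len(seq) % 3   # trim so that n/3 is an exact DFT index
    if n < 6:
        raise ValueError("sequence too short for a period-3 feature")
    seq = seq[:n]
    peak = spectrum_power(seq, n // 3)
    half = n // 2
    mean = sum(spectrum_power(seq, k) for k in range(1, half + 1)) / half
    return peak / mean if mean > 0 else 0.0
```

A sequence with strong codon periodicity yields a large ratio, while a sequence with period 2 yields a ratio near zero. In a pipeline like the one described, values of this kind would be one entry in a larger feature vector passed to feature selection and a classifier.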


BIOCELL ◽  
2022 ◽  
Vol 46 (4) ◽  
pp. 941-949
Author(s):  
HUI-YING JIN ◽  
MAO-SONG PEI ◽  
DA-LONG GUO

GigaScience ◽  
2022 ◽  
Vol 11 (1) ◽  
Author(s):  
Youngik Yang ◽  
Ji Yong Yoo ◽  
Sang Ho Baek ◽  
Ha Yeun Song ◽  
Seonmi Jo ◽  
...  

Abstract Background: The shuttles hoppfish (mudskipper), Periophthalmus modestus, is one of the mudskippers, the largest group of amphibious teleost fishes, which are uniquely adapted to live on mudflats. Because mudskippers can survive on land for extended periods by breathing through their skin and through the lining of the mouth and throat, they have been evaluated as a model for the evolutionary sea-land transition of Devonian protoamphibians, ancestors of all present tetrapods. Results: A total of 39.6, 80.2, 52.9, and 33.3 Gb of Illumina, Pacific Biosciences, 10X linked, and Hi-C data, respectively, were assembled into 1,419 scaffolds with an N50 length of 33 Mb and a BUSCO score of 96.6%. The assembly covered 117% of the estimated genome size (729 Mb) and included 23 pseudo-chromosomes anchored by a Hi-C contact map, which corresponded to the top 23 longest scaffolds above 20 Mb and close to the estimated one. Of the genome, 43.8% consisted of various repetitive elements such as DNAs, tandem repeats, long interspersed nuclear elements, and simple repeats. Ab initio and homology-based gene prediction identified 30,505 genes, of which 94% had homology to the 14 Actinopterygii transcriptomes, and 89% and 85% to Pfam families and InterPro domains, respectively. Comparative genomics with 15 Actinopterygii species identified 59,448 gene families, of which 12% were found only in P. modestus. Conclusions: We present the first high-quality genome assembly and gene annotation of the shuttles hoppfish. It will provide a valuable resource for further studies on sea-land transition, bimodal respiration, nitrogen excretion, osmoregulation, thermoregulation, vision, and mechanoreception.
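The N50 length quoted for the assembly is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly. A minimal sketch follows; the function name and the toy lengths in the usage note are illustrative only and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

For the toy lengths `[2, 3, 4, 5, 6, 7, 8, 9, 10]` the total is 54, and the three longest sequences (10, 9, 8) reach 27, so the N50 is 8.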


Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 52
Author(s):  
Ashley G. Yow ◽  
Hamed Bostan ◽  
Raúl Castanera ◽  
Valentino Ruggieri ◽  
Molla F. Mengist ◽  
...  

Pineapple (Ananas comosus (L.) Merr.) is the second most important tropical fruit crop globally, and ‘MD2’ is the most important cultivated variety. A high-quality genome is important for molecular-based breeding, but available pineapple genomes still have some quality limitations. Here, PacBio and Hi-C data were used to develop a new high-quality MD2 assembly and gene prediction. Compared to the previous MD2 assembly, major improvements included a 26.6-fold increase in contig N50 length, phased chromosomes, and >6000 new genes. The new MD2 assembly also included 161.6 Mb additional sequences and >3000 extra genes compared to the F153 genome. Over 48% of the predicted genes harbored potential deleterious mutations, indicating that the high level of heterozygosity in this species contributes to maintaining functional alleles. The genome was used to characterize the FAR1-RELATED SEQUENCE (FRS) genes that were expanded in pineapple and rice. Transposed and dispersed duplications contributed to expanding the numbers of these genes in the pineapple lineage. Several AcFRS genes were differentially expressed among tissue-types and stages of flower development, suggesting that their expansion contributed to evolving specialized functions in reproductive tissues. The new MD2 assembly will serve as a new reference for genetic and genomic studies in pineapple.


2021 ◽  
Vol 7 (12) ◽  
Author(s):  
Sebastian Cristian Treitli ◽  
Priscila Peña-Diaz ◽  
Paweł Hałakuc ◽  
Anna Karnkowska ◽  
Vladimír Hampl

Monocercomonoides exilis is considered the first known eukaryote to completely lack mitochondria. This conclusion is based primarily on a genomic and transcriptomic study which failed to identify any mitochondrial hallmark proteins. However, the available genome assembly has limited contiguity and around 1.5% of the genome sequence is represented by unknown bases. To improve the contiguity, we re-sequenced the genome and transcriptome of M. exilis using Oxford Nanopore Technology (ONT). The resulting draft genome is assembled in 101 contigs with an N50 value of 1.38 Mbp, almost 20 times higher than that of the previously published assembly. Using a newly generated ONT transcriptome, we further improve the gene prediction and add high-quality untranslated region (UTR) annotations, in which we identify two putative polyadenylation signals present in the 3′UTR regions and characterise the Kozak sequence in the 5′UTR regions. All these improvements are reflected by higher BUSCO genome completeness values. Despite an overall more complete genome assembly without missing bases and a better gene prediction, we still failed to identify any mitochondrial hallmark genes, further supporting the hypothesis that the mitochondrion is absent.


2021 ◽  
Author(s):  
Yu-fei Lin ◽  
Wei-An Liu ◽  
Yu-Ching Liu ◽  
Hsin-Han Lee ◽  
Yen-Ju Lin ◽  
...  

The ability to correlate the functional relationship between microbial communities and their environment is critical to understanding microbial ecology. There is emerging knowledge on the island biogeography of microbes, but how island characteristics influence the functions of microbial communities remains elusive. Here, we explored soil mycobiomes from nine islands adjacent to Taiwan using ITS2 amplicon sequencing. Geographical distance and island size were positively correlated with dissimilarity in mycobiomes. We identified 56 zero-radius operational taxonomic units (zOTUs) that were ubiquitously present across all islands, and as few as five Mortierella zOTUs dominated more than half of the mycobiomes. Correlation network analyses revealed that seven of the 45 hub species were part of the ubiquitous zOTUs belonging to Mortierella, Trichoderma, Aspergillus, Clonostachys and Staphylotrichum. We sequenced and annotated the genomes of seven Mortierella isolates, and comparative predictions of KEGG orthologues using the PICRUSt2 database updated with the new genomes increased sequence read coverage by 62.9% at the genus level. In addition, genes associated with carbohydrate and lipid metabolism were differentially abundant between islands, a difference that remained undetected with the original database. Predicted functional pathways were similar across islands despite their geographical separation and differences in differentially abundant genes and composition. Our approach demonstrates that incorporating genomic data from key taxa can improve functional gene prediction results and can be readily applied to investigate other niches of interest.


2021 ◽  
Vol 12 ◽  
Author(s):  
Lu Zhang ◽  
FengXin Chen ◽  
Zhan Zeng ◽  
Mengjiao Xu ◽  
Fangfang Sun ◽  
...  

Metagenomics is a new approach to studying microorganisms obtained from a specific environment by functional gene screening or sequencing analysis. Metagenomics studies focus on microbial diversity, community composition, genetic and evolutionary relationships, functional activities, and interactions and relationships with the environment. Sequencing technologies have evolved from shotgun sequencing to high-throughput next-generation sequencing (NGS) and third-generation sequencing (TGS). NGS and TGS have shown the advantage of rapid detection of pathogenic microorganisms. With the help of new algorithms, we can better perform taxonomic profiling and gene prediction of microbial species. Functional metagenomics helps to screen for new bioactive substances and new functional genes from microorganisms and microbial metabolites. In this article, the basic steps, classification, and applications of metagenomics are reviewed.


2021 ◽  
Author(s):  
Enrique González-Tortuero ◽  
Revathy Krishnamurthi ◽  
Heather E. Allison ◽  
Ian B. Goodhead ◽  
Chloe E. James

The number of newly available viral genomes and metagenomes has increased exponentially since the development of high-throughput sequencing platforms and genome analysis tools. Bioinformatic annotation pipelines are largely based on open reading frame (ORF) calling software, which identifies genes independently of the taxonomic background of the sequence. Although ORF-calling programs provide rapid genome annotation, they can misidentify ORFs and start codons, errors that might be perpetuated and propagated over time. This study evaluated the performance of multiple ORF-calling programs for viral genome annotation against the complete RefSeq viral database. Program outputs varied when considering the viral nucleic acid type versus the viral host. According to the number of ORFs, Prodigal and Metaprodigal were the most accurate programs for DNA viruses, while FragGeneScan and Prodigal generated the most accurate outputs for RNA viruses. Similarly, Prodigal outperformed the benchmark for viruses infecting prokaryotes, and GLIMMER and GeneMarkS produced the most accurate annotations for viruses infecting eukaryotes. When the coordinates of the ORFs were considered, Prodigal scored high for all scenarios except RNA viruses, where GeneMarkS generated the most reliable results. Overall, the quality of the coordinates predicted for RNA viruses was poorer than for DNA viruses, suggesting the need for improved ORF-calling programs to deal with RNA viruses. Moreover, none of the ORF-calling programs reached 90% accuracy for annotation of DNA viruses. Any automatic annotation can still be improved by manual curation, especially when the presence of ORFs is validated with wet-lab experiments. However, our evaluation of the current ORF-calling programs is expected to be useful for the improvement of viral genome annotation pipelines and highlights the need for more expression data to improve the rigor of reference genomes.
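To make the notion of ORF calling concrete, here is a deliberately naive sketch that scans both strands for ATG-to-stop stretches. It is not how Prodigal, GLIMMER, GeneMarkS or FragGeneScan work (those tools use statistical sequence models); the function names, the standard-stop-codon assumption and the `min_codons` threshold are choices made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}           # standard stop codons (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) of ATG..stop ORFs in the three frames of one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))   # end is exclusive
                start = None
    return found


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    result = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    result += [("-", n - e, n - s) for s, e in orfs_in_strand(rc, min_codons)]
    return result
```

Even this toy version shows where start-codon errors of the kind described above can arise: it always takes the first in-frame ATG, although the true start may be a later ATG or an alternative start codon.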


2021 ◽  
Vol 12 ◽  
Author(s):  
Hongqiang Zhang ◽  
Dingqian Liu ◽  
Shichao Zhu ◽  
Fanshun Wang ◽  
Xiaoning Sun ◽  
...  

Objectives: Patients with bicuspid aortic valve (BAV) are at increased risk for ascending aortic dilation (AAD). Our study aimed to systematically analyze the expression profile and mechanism of circulating plasma exosomal microRNAs (miRNAs) related to BAV and AAD. Methods: We isolated plasma exosomes from BAV patients (n=19), BAV patients with AAD (BAVAD, n=26), and healthy tricuspid aortic valve individuals with low cardiovascular risk (TAVnon, n=16). We applied a small RNA sequencing approach to identify the specific plasma exosomal miRNAs associated with BAV (n=8) and BAVAD (n=10) patients compared with healthy TAVnon (n=6) individuals. The candidate differentially expressed (DE) miRNAs were selected and validated by RT-qPCR in the remaining samples. GO and KEGG pathway enrichment analyses were performed to illustrate the functions of target genes. Western blot analysis and luciferase reporter assays were conducted in human aortic vascular smooth muscle cells (VSMCs) to verify the results of target gene prediction in vitro. Results: The expression levels of three up-regulated (miR-151a-3p, miR-423-5p, and miR-361-3p) and two down-regulated (miR-16-5p and miR-15a-5p) exosomal miRNAs were significantly altered in BAV disease. Additionally, miR-423-5p could be functionally involved in the occurrence and development of BAV and its complication BAVAD by regulating TGF-β signaling. miR-423-5p could target SMAD2 and decrease the protein levels of SMAD2 and P-SMAD2. Conclusion: Plasma exosomal miR-423-5p regulated TGF-β signaling by targeting SMAD2, thus exerting functions in the occurrence and development of BAV disease and its complication bicuspid aortopathy.

