sequence data
Recently Published Documents





2024 ◽  
Vol 84 ◽  
M. F. R. Dias ◽  
F. L. L. Oliveira ◽  
V. S. Pontes ◽  
M. L. Silva

In recent years, the development of high-throughput technologies for obtaining sequence data has enabled the in silico analysis of protein data. However, when it comes to viral polyprotein interaction studies, there is a gap in the representation of those proteins, given their size and length. To prepare for studies using state-of-the-art techniques such as machine learning, a good representation of such proteins is a must. We present an alternative solution to this problem: a fragmentation and modeling protocol that prepares those polyproteins in the form of peptide fragments. The procedure is carried out by several scripts, combined in a workflow we call PolyPRep, a tool written in Python and available on GitHub. This software is freely available only for noncommercial users.

Raveendra Gudodagi ◽  
Rayapur Venkata Siva Reddy ◽  
Mohammed Riyaz Ahmed

Owing to the substantial volume of human genome sequence data files (from 30 to 200 GB exposed), genomic data compression has received considerable traction, and storage costs are one of the major problems faced by genomics laboratories. This involves a modern technology of data compression that reduces not only the storage but also the reliability of the operation. There have been few attempts to solve this problem independently of both hardware and software. A systematic analysis of associations between genes provides techniques for recognizing operative connections among genes and their respective yields, as well as insights into essential biological events that are most important for understanding health and disease phenotypes. This research proposes a reliable and efficient deep learning system that learns embedded projections to combine gene interactions and gene expression for prediction, and compares deep embeddings to strong baselines. In this paper we perform data processing operations and predict gene function, along with gene ontology reconstruction, and predict gene interactions. The three major steps of genomic data compression are extraction, storage, and retrieval of the data. Hence, we propose a deep learning approach based on computational optimization techniques that will be efficient in all three stages of data compression.

2022 ◽  
Vol 12 ◽  
Lidia De los Ríos-Pérez ◽  
Tom Druet ◽  
Tom Goldammer ◽  
Dörte Wittenburg

Pikeperch (Sander lucioperca) has emerged as a high-value species for the aquaculture industry. However, its farming techniques are at an early stage and its production is often performed without a selective breeding program, potentially leading to high levels of inbreeding. In this study, we identified and characterized autozygosity based on genome-wide runs of homozygosity (ROH) in a sample of parental and offspring individuals, determined effective population size (Ne), and assessed relatedness among parental individuals. Means of 2,235 ± 526 and 1,841 ± 363 ROH segments per individual, resulting in mean inbreeding coefficients of 0.33 ± 0.06 and 0.25 ± 0.06, were estimated for the progeny and parents, respectively. Ne was about 12 until four generations ago and at most 106 for 63 generations in the past, with varying genetic relatedness amongst the parents. This study shows the importance of genomic information when family relationships are unknown and the need for selective breeding programs for reproductive management decisions in the aquaculture industry.
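As a rough illustration of how an ROH-based inbreeding coefficient can be obtained, the sketch below scans one individual's genotypes for runs of consecutive homozygous SNPs and divides the total run length by the genome length (the commonly used F_ROH ratio). This is not the method of the study, which the abstract does not detail: the genotype coding, the `min_snps` threshold, the function names and the toy data are all assumptions made for the example.

```python
from typing import List, Tuple


def find_roh(genotypes: List[int], positions: List[int],
             min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) of each run of consecutive homozygous SNPs.

    Assumed coding: 0 or 2 = homozygous, 1 = heterozygous.
    Runs shorter than `min_snps` SNPs are ignored (hypothetical threshold).
    """
    runs = []
    start = None  # index of the first SNP in the current run
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # A heterozygous SNP closes the current run, if any.
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs


def f_roh(runs: List[Tuple[int, int]], genome_length_bp: int) -> float:
    """Fraction of the genome covered by ROH segments."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Toy data: 11 SNPs on a hypothetical 2,000 bp genome.
genotypes = [0, 0, 2, 2, 1, 0, 1, 2, 2, 2, 0]
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
runs = find_roh(genotypes, positions, min_snps=3)
inbreeding = f_roh(runs, genome_length_bp=2000)
```

Real analyses typically add constraints on minimum segment length, SNP density and tolerated heterozygous calls, or use model-based approaches instead of a simple scan.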

2022 ◽  
Michael Sennett ◽  
Douglas Theobald

Ancestral sequence reconstruction (ASR) has become widely used to analyze the properties of ancient biomolecules and to elucidate the mechanisms of molecular evolution. By recapitulating the structural, mechanistic, and functional changes of proteins during their evolution, ASR has been able to address many fundamental and challenging evolutionary questions where more traditional methods have failed. Despite the tangible successes of ASR, the accuracy of its reconstructions is currently unknown, because it is generally impossible to compare resurrected proteins to the true ancient ancestors that are now extinct. Which evolutionary models are the best for ASR? How accurate are the resulting inferences? Here we answer these questions by applying cross-validation (CV) to sets of aligned extant sequences. To assess the adequacy of a chosen evolutionary model for predicting extant sequence data, our column-wise CV method iteratively cross-validates each column in an alignment. Unlike other phylogenetic model selection criteria, this method does not require bias correction and does not make restrictive assumptions commonly violated by phylogenetic data. We find that column-wise CV generally provides a more conservative criterion than the AIC by preferring less complex models. To validate ASR methods, we also apply cross-validation to each sequence in an alignment by reconstructing the extant sequences using ASR methodology, a method we term extant sequence reconstruction (ESR). We can thus quantify the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability of the sequence, is indeed a good estimate of the fraction of the sequence that is correct when the evolutionary model is accurate or overparameterized. 
However, the average probability is a poor measure for comparing reconstructions, because more accurate phylogenetic models typically result in reconstructions with lower average probabilities. In contrast, the entropy of the reconstructed distribution is a reliable indicator of the quality of a reconstruction, as the entropy provides an accurate estimate of the log-probability of the true sequence. Both column-wise CV and ESR are useful methods to validate evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences.
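The two quality measures discussed above can be written down compactly. The sketch below assumes that a reconstruction is given as one posterior distribution per alignment site and that sites are treated as independent; the function names and the toy posteriors are hypothetical. Under those assumptions, the average probability is the mean of the per-site maximum posterior (the expected fraction of correct residues in the most probable sequence), and the negative entropy is the expected log-probability of the true sequence.

```python
import math
from typing import Dict, List

# One posterior distribution per site: residue -> probability (assumed format).
SitePosterior = Dict[str, float]


def map_sequence(posteriors: List[SitePosterior]) -> str:
    """Most probable residue at each site (ties go to the first key)."""
    return "".join(max(p, key=p.get) for p in posteriors)


def average_probability(posteriors: List[SitePosterior]) -> float:
    """Mean posterior probability of the most probable residue per site."""
    return sum(max(p.values()) for p in posteriors) / len(posteriors)


def entropy(posteriors: List[SitePosterior]) -> float:
    """Entropy (in nats) of the reconstructed distribution, summed over sites."""
    return -sum(q * math.log(q) for p in posteriors for q in p.values() if q > 0)


def log_probability(seq: str, posteriors: List[SitePosterior]) -> float:
    """Log-probability of a given sequence under the reconstruction.

    Raises KeyError if a residue has no entry at its site.
    """
    return sum(math.log(p[a]) for a, p in zip(seq, posteriors))


# Toy reconstruction with three sites.
posteriors = [
    {"A": 0.9, "G": 0.1},
    {"L": 0.5, "I": 0.5},
    {"K": 1.0},
]
best = map_sequence(posteriors)
avg_p = average_probability(posteriors)
h = entropy(posteriors)
lp_best = log_probability(best, posteriors)
```

In an extant sequence reconstruction setting, `log_probability` would be evaluated on the known true sequence and compared with `-entropy(posteriors)`.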

2022 ◽  
Shruthy Priya Prakash ◽  
Vaidheki Chandrasekar ◽  
Selvi Subramanian ◽  
Rahamatthunnisha Ummar

Banana, being a major food crop around the world, attracts various research interests in crop improvement. In banana, complete genome sequences of Musa acuminata and Musa balbisiana are available. However, the mitochondrial genome has not been sequenced or assembled. Mitochondrial (mt) genes play an important role in flower and seed development and in cytoplasmic male sterility. Unraveling the banana mt genome architecture will be a foundation for understanding the inheritance of traits and their evolution. In this study, the complete banana mt genome is assembled from the whole genome sequence data of Musa acuminata subsp. malaccensis DH-Pahang. The mt genome sequence acquired by this approach was 409,574 bp and contains 54 genes: 25 coding for respiratory complex proteins, 15 for ribosomal proteins, 12 tRNA genes and two ribosomal RNA genes. Except for atpB, rps11 and rps19, the genes are present in multiple copies. The copy number is 12 in tRNA genes. In addition, nearly 25% tandem repeats are also present. These mt proteins are identical to the mt proteins present in the other members of the AA genome and share 98% sequence similarity with M. balbisiana. C to U RNA editing is profoundly higher (87 vs 13%) in transcripts of M. balbisiana (BB) compared to M. acuminata (AA). The banana AA mitochondrial genome is tightly packed with 233 genes, with fewer rearrangements and just 5.3% chloroplast DNA in it. The maintenance of a high copy number of functional mt genes suggests that they have a crucial role in the evolution of banana.

2022 ◽  
Vol 15 (1) ◽  
Artur Trzebny ◽  
Justyna Liberska ◽  
Anna Slodkowicz-Kowalska ◽  
Miroslawa Dabert

Background: Microsporidia is a large group of eukaryotic obligate intracellular spore-forming parasites, of which 17 species can cause microsporidiosis in humans. Most human-infecting microsporidians belong to the genera Enterocytozoon and Encephalitozoon. To date, only five microsporidian species, including Encephalitozoon-like, have been found in hard ticks (Ixodidae) using microscopic methods, but no sequence data are available for them. Furthermore, no widespread screening for microsporidian-infected ticks based on DNA analysis has been carried out to date. Thus, in this study, we applied a recently developed DNA metabarcoding method for efficient microsporidian DNA identification to assess the role of ticks as potential vectors of microsporidian species causing diseases in humans. Methods: In total, 1070 (493 juvenile and 577 adult) unfed host-seeking Ixodes ricinus ticks collected at urban parks in the city of Poznan, Poland, and 94 engorged tick females fed on dogs and cats were screened for microsporidian DNA. Microsporidians were detected by PCR amplification and sequencing of the hypervariable V5 region of the 18S rRNA gene (18S profiling) using the microsporidian-specific primer set. Tick species were identified morphologically and confirmed by amplification and sequencing of the shortened fragment of the cytochrome c oxidase subunit I gene (mini-COI). Results: All collected ticks were unambiguously assigned to I. ricinus. Potentially zoonotic Encephalitozoon intestinalis was identified in three fed ticks (3.2%) collected from three different dogs. In eight unfed host-seeking ticks (0.8%), including three males (1.1%), two females (0.7%) and three nymphs (0.7%), a new microsporidian sequence representing a species belonging to the genus Endoreticulatus was identified. Conclusions: The lack of zoonotic microsporidians in host-seeking ticks suggests that I. ricinus is not involved in transmission of human-infecting microsporidians. 
Moreover, a very low occurrence of the other microsporidian species in both fed and host-seeking ticks implies that mechanisms exist to defend ticks against infection with these parasites.

2022 ◽  
Vol 101 (1) ◽  
Ambikabai Raghavanpillai Sivu ◽  
Nediyaparambu Sukumaran Pradeep ◽  
Alagramam Govindasamy Pandurangan ◽  
Mayank D. Dwivedi ◽  
Arun K. Pandey

2022 ◽  
Vol 14 (2) ◽  
pp. 387
Yeonjin Lee ◽  
Myoung-Hwan Ahn ◽  
Su Jeong Lee

Early warning of severe weather caused by intense convective weather systems is challenging. To support such activities, meteorological satellites with high temporal and spatial resolution have been utilized for monitoring instability trends along with water vapor variation. The current study proposes a retrieval algorithm based on an artificial neural network (ANN) model to quickly and efficiently derive total precipitable water (TPW) and convective available potential energy (CAPE) from Korea’s second geostationary satellite imagery measurements (GEO-KOMPSAT-2A/Advanced Meteorological Imager (AMI)). To overcome the limitations of the traditional static (ST) learning method, such as exhaustive learning, impracticality, and a poor match to sequence data, we applied an ANN model with incremental (INC) learning. The INC ANN uses a dynamic dataset and, when new samples emerge, begins with the existing weight information transferred from a previously learned model. To prevent sudden changes in the distribution of learning data, this method uses a sliding window of a fixed size that moves along the data. Through an empirical test, the update cycle and the window size of the model are set to one day and ten days, respectively. For the preparation of learning datasets, nine infrared brightness temperatures of AMI, six dual channel differences, temporal and geographic information, and a satellite zenith angle are used as input variables, and the TPW and CAPE from ECMWF model reanalysis (ERA5) data are used as the corresponding target values over clear-sky conditions in the Northeast Asia region for about one year. In accuracy tests against radiosonde observations for one year, the INC ANN results demonstrate improved performance (the errors of TPW and CAPE decreased by approximately 26% and 26% for bias and about 13% and 12% for RMSE, respectively) when compared to ST learning. 
Evaluation results using ERA5 data also reveal more stable error statistics over time and overall reduced error distribution compared with ST ANN.
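The sliding-window incremental scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-variable linear model trained by gradient descent stands in for the ANN, there is one toy sample per day, and the function names, learning rate and epoch count are hypothetical. Only the ten-day window and the one-day update cycle come from the abstract.

```python
from typing import Iterator, List, Tuple


def sliding_windows(n_days: int, window: int = 10,
                    step: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (first_day, end_day_exclusive) for each fixed-size training window."""
    for end in range(window, n_days + 1, step):
        yield end - window, end


def fit(xs: List[float], ys: List[float], w: float = 0.0, b: float = 0.0,
        lr: float = 0.01, epochs: int = 200) -> Tuple[float, float]:
    """Gradient descent on y = w*x + b, warm-started from the given weights."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs: List[float], ys: List[float], w: float, b: float) -> float:
    """Mean squared error of the linear model on the given samples."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: one sample per day for 30 days, generated from y = 2x + 1.
xs = [d / 10 for d in range(30)]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0  # initial weights; afterwards always carried over
for start, end in sliding_windows(len(xs), window=10, step=1):
    # Each daily update trains on the latest ten days only and starts
    # from the weights of the previous update rather than from scratch.
    w, b = fit(xs[start:end], ys[start:end], w, b)

final_error = mse(xs[-10:], ys[-10:], w, b)
```

A static learner would instead call `fit` once on the whole dataset; the incremental loop keeps each update cheap and lets the model follow gradual changes in the data distribution.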

Carmelo Andujar ◽  
Paula Arribas ◽  
Heriberto López ◽  
Yurena Arjona ◽  
Antonio Pérez-Delgado ◽  

Most of our understanding of island diversity comes from the study of aboveground systems, while the patterns and processes of diversification and community assembly for belowground biotas remain poorly understood. Here we take advantage of a relatively young and dynamic oceanic island to advance our understanding of eco-evolutionary processes driving community assembly within soil mesofauna. Using whole organism community DNA (wocDNA) metabarcoding and the recently developed metaMATE pipeline, we have generated spatially explicit and reliable haplotype-level DNA sequence data for soil mesofaunal assemblages sampled across the four main habitats within the island of Tenerife. Community ecological and metaphylogeographic analyses have been performed at multiple levels of genetic similarity, from haplotypes to species and supraspecific groupings. Broadly consistent patterns of local-scale species richness across different insular habitats have been found, whereas local insular richness is lower than in continental settings. Our results reveal an important role for niche conservatism as a driver of insular community assembly of soil mesofauna, with only limited evidence for habitat shifts promoting diversification. Furthermore, support is found for a fundamental role of habitat in the assembly of soil mesofauna, where habitat specialism is mainly due to colonisation and the establishment of preadapted species. Hierarchical patterns of distance decay at the community level and metaphylogeographical analyses support a pattern of geographic structuring over limited spatial scales, from the level of haplotypes through to species and lineages, as expected for taxa with strong dispersal limitations. Our results demonstrate the potential for wocDNA metabarcoding to advance our understanding of biodiversity.

Christina L. Elling ◽  
Melissa A. Scholes ◽  
Sven-Olrik Streubel ◽  
Eric D. Larson ◽  
Todd M. Wine ◽  

Otitis media (OM) is a leading cause of childhood hearing loss. Variants in FUT2, which encodes alpha-(1,2)-fucosyltransferase, were identified to increase susceptibility to OM, potentially through shifts in the middle ear (ME) or nasopharyngeal (NP) microbiotas as mediated by transcriptional changes. Greater knowledge of differences in relative abundance of otopathogens in carriers of pathogenic variants can help determine risk for OM in patients. In order to determine the downstream effects of FUT2 variation, we examined gene expression in relation to carriage of a common pathogenic FUT2 c.461G>A (p.Trp154*) variant using RNA-sequence data from saliva samples from 28 patients with OM. Differential gene expression was also examined in bulk mRNA and single-cell RNA-sequence data from wildtype mouse ME mucosa after inoculation with non-typeable Haemophilus influenzae (NTHi). In addition, microbiotas were profiled from ME and NP samples of 65 OM patients using 16S rRNA gene sequencing. In human carriers of the FUT2 variant, FN1, KMT2D, MUC16 and NBPF20 were downregulated while MTAP was upregulated. Post-infectious expression in the mouse ME recapitulated these transcriptional differences, with the exception of Fn1 upregulation after NTHi-inoculation. In the NP, Candidate Division TM7 was associated with wildtype genotype (FDR-adj-p=0.009). Overall, the FUT2 c.461G>A variant was associated with transcriptional changes in processes related to response to infection and with increased load of potential otopathogens in the ME and decreased commensals in the NP. These findings provide increased understanding of how FUT2 variants influence gene transcription and the mucosal microbiota, and thus contribute to the pathology of OM.
