Design and development of learning model for compression and processing of deoxyribonucleic acid genome sequence

Author(s):  
Raveendra Gudodagi ◽  
Rayapur Venkata Siva Reddy ◽  
Mohammed Riyaz Ahmed

Owing to the substantial volume of human genome sequence data files (ranging from 30 to 200 GB), genomic data compression has received considerable traction, and storage costs are one of the major problems faced by genomics laboratories. This calls for modern data compression technology that reduces storage requirements without compromising the reliability of operation. There have been few attempts to solve this problem independently of both hardware and software. A systematic analysis of associations between genes provides techniques for recognizing operative connections among genes and their respective products, as well as insights into essential biological events that are most important for understanding health and disease phenotypes. This research proposes a reliable and efficient deep learning system for learning embedded projections that combine gene interactions and gene expression for prediction, and compares the deep embeddings to strong baselines. In this paper we perform data processing operations, predict gene function, reconstruct the gene ontology, and predict gene interactions. The three major steps of genomic data compression are extraction of data, storage of data, and retrieval of the data. Hence, we propose a deep learning approach based on computational optimization techniques that is efficient in all three stages of data compression.
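The abstract does not include an implementation. As a hedged illustration of the embedding idea it describes (combining gene-interaction features and gene-expression profiles through learned projections for gene-function prediction), the sketch below uses PyTorch; the module names, layer sizes, and feature dimensions are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class GeneEmbeddingNet(nn.Module):
    """Toy model: project expression profiles and interaction-graph features
    into a shared embedding, then predict gene-function labels.
    All sizes are illustrative assumptions, not the paper's architecture."""
    def __init__(self, n_expr_features, n_interaction_features, embed_dim=64, n_functions=10):
        super().__init__()
        self.expr_proj = nn.Sequential(nn.Linear(n_expr_features, embed_dim), nn.ReLU())
        self.inter_proj = nn.Sequential(nn.Linear(n_interaction_features, embed_dim), nn.ReLU())
        self.head = nn.Linear(2 * embed_dim, n_functions)

    def forward(self, expr, inter):
        # Concatenate the two learned projections and score each function class.
        z = torch.cat([self.expr_proj(expr), self.inter_proj(inter)], dim=-1)
        return self.head(z)

# Usage with random stand-in data (8 genes, 100 expression features, 50 interaction features).
model = GeneEmbeddingNet(n_expr_features=100, n_interaction_features=50)
logits = model(torch.randn(8, 100), torch.randn(8, 50))
print(logits.shape)  # torch.Size([8, 10])
```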

2018 ◽  
Vol 49 (1) ◽  
pp. 433-456 ◽  
Author(s):  
Annabel C. Beichman ◽  
Emilia Huerta-Sanchez ◽  
Kirk E. Lohmueller

Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
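Many of the inference methods surveyed in such reviews summarize genomic variation with the site frequency spectrum (SFS). As a small, hedged illustration (not tied to any particular method discussed), the snippet below tallies a folded SFS from a toy genotype matrix.

```python
import numpy as np

def folded_sfs(genotypes):
    """Folded site frequency spectrum from a (sites x haplotypes) 0/1 matrix.
    Illustrative only; real pipelines parse VCFs and handle missing data."""
    n = genotypes.shape[1]                      # number of haplotypes sampled
    counts = genotypes.sum(axis=1)              # derived-allele count per site
    minor = np.minimum(counts, n - counts)      # fold: minor-allele count
    sfs = np.bincount(minor, minlength=n // 2 + 1)
    return sfs[1:]                              # drop the monomorphic class

rng = np.random.default_rng(0)
haps = (rng.random((1000, 20)) < 0.1).astype(int)   # toy data: 1000 sites, 20 haplotypes
print(folded_sfs(haps))
```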


2019 ◽  
Vol 2 (3) ◽  
pp. 39
Author(s):  
Simon Meyer Lauritsen ◽  
Mads Ellersgaard Kalør ◽  
Emil Lund Kongsgaard ◽  
Bo Thiesson

Background: Sepsis is a clinical condition involving an extreme inflammatory response to an infection, and is associated with high morbidity and mortality. Without intervention, this response can progress to septic shock, organ failure and death. Every hour that treatment is delayed, mortality increases. Early identification of sepsis is therefore important for a positive outcome. Methods: We constructed predictive models for sepsis detection and performed a register-based cohort study on patients from four Danish municipalities. We used event sequences of raw electronic health record (EHR) data from 2013 to 2017, where each event consists of three elements: a timestamp, an event category (e.g. medication code), and a value. In total, we consider 25,622 positive (SIRS criteria) sequences and 25,622 negative sequences with a total of 112 million events distributed across 64 different hospital units. The number of potential predictor variables in raw EHR data easily exceeds 10,000 and can be challenging for predictive modeling due to this large volume of sparse, heterogeneous events. Traditional approaches have dealt with this complexity by curating a limited number of variables of importance; a labor-intensive process that may discard a vast majority of the information. In contrast, we consider a deep learning system constructed as a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Importantly, our system learns representations of the key factors and interactions from the raw event sequence data itself. Results: Our model predicts sepsis with an AUROC score of 0.8678 at 11 hours before actual treatment was started, outperforming all currently deployed approaches. At other prediction times, the model yields the following AUROC scores: 15 min: 0.9058, 3 hours: 0.8803, 24 hours: 0.8073. Conclusion: We have presented a novel approach for early detection of sepsis that has more true positives and fewer false negatives than existing alarm systems, without introducing domain knowledge into the model. Importantly, the model does not require changes in the daily workflow of healthcare professionals at hospitals, as it is based on data that are routinely captured in the EHR. This also enables real-time prediction, as healthcare professionals enter the raw events in the EHR.
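The CNN-plus-LSTM pattern described above can be sketched as follows; this is a minimal PyTorch illustration in which the vocabulary size, embedding dimension, filter count, and kernel width are arbitrary assumptions, not the study's configuration.

```python
import torch
import torch.nn as nn

class EventCNNLSTM(nn.Module):
    """Toy CNN+LSTM over embedded EHR event sequences.
    Dimensions are illustrative assumptions, not the paper's settings."""
    def __init__(self, n_event_codes=10000, embed_dim=32, conv_channels=64, lstm_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_event_codes, embed_dim)
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True)
        self.out = nn.Linear(lstm_hidden, 1)  # one sepsis logit per sequence

    def forward(self, event_ids):
        x = self.embed(event_ids)             # (batch, time, embed_dim)
        x = self.conv(x.transpose(1, 2))      # Conv1d expects (batch, channels, time)
        x = torch.relu(x).transpose(1, 2)     # back to (batch, time, channels)
        _, (h, _) = self.lstm(x)              # h: (num_layers, batch, hidden)
        return self.out(h[-1])

model = EventCNNLSTM()
logits = model(torch.randint(0, 10000, (4, 200)))  # 4 sequences of 200 events
print(torch.sigmoid(logits).shape)                 # torch.Size([4, 1])
```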


2019 ◽  
Vol 14 (2) ◽  
pp. 123-129
Author(s):  
Muhammad Sardaraz ◽  
Muhammad Tahir

Background: Biological sequence data have increased at a rapid rate due to advancements in sequencing technologies and a reduction in the cost of sequencing. The huge increase in these data presents significant challenges to researchers. In addition to meaningful analysis, data storage is also a challenge: the increase in data production is outpacing storage capacity. Data compression is used to reduce the size of data and thus reduces storage requirements as well as transmission cost over the internet. Objective: This article presents a novel compression algorithm (FCompress) for Next Generation Sequencing (NGS) data in FASTQ format. Method: The proposed algorithm uses bit manipulation and dictionary-based compression for base compression. Headers are compressed with reference-based compression, whereas quality scores are compressed with Huffman coding. Results: The proposed algorithm is validated with experimental results on real datasets. The results are compared with both general-purpose and specialized compression programs. Conclusion: The proposed algorithm produces a better compression ratio in a time comparable to that of other algorithms.
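The details of FCompress are not reproduced in the abstract. As a hedged illustration of one ingredient it mentions, bit-level packing of bases, the snippet below stores A/C/G/T in 2 bits each; the function names are hypothetical and this is not the FCompress algorithm itself.

```python
# Illustrative 2-bit packing of DNA bases; NOT the FCompress algorithm.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}

def pack_bases(seq):
    """Pack an ACGT string into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a partial final chunk
        out.append(byte)
    return bytes(out)

def unpack_bases(data, n_bases):
    """Inverse of pack_bases for a sequence of known length."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])

seq = "ACGTACGTGG"
packed = pack_bases(seq)
print(len(seq), "bases ->", len(packed), "bytes;", unpack_bases(packed, len(seq)) == seq)
```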


2021 ◽  
Author(s):  
Dmitrii Kriukov ◽  
Nikita Koritskiy ◽  
Igor Kozlovskii ◽  
Mark Zaretckii ◽  
Mariia Bazarevich ◽  
...  

The increasing interest in chromatin conformation inside the nucleus and the availability of genome-wide experimental data make it possible to develop computational methods that can increase the quality of the data and thus overcome the limitations of high experimental costs. Here we develop a deep-learning approach for increasing Hi-C data resolution by incorporating additional information about the genome sequence. In this approach, we utilize two different deep-learning algorithms: an image-to-image model, which enhances Hi-C resolution by itself, and a sequence-to-image model, which uses additional information about the underlying genome sequence for further resolution improvement. Both models are combined with a simple head model that provides a more accurate enhancement of the initial low-resolution Hi-C data. The code is freely available in a GitHub repository: https://github.com/koritsky/DL2021 HI-C
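The authors' repository contains the actual models; purely as a conceptual sketch of the "head" that fuses the two branches, the snippet below combines an enhanced contact map and a sequence-predicted map with a small convolutional head. Channel counts and kernel sizes are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Toy head fusing an enhanced Hi-C map (image-to-image branch) with a map
    predicted from sequence (sequence-to-image branch). Sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, hic_enhanced, seq_predicted):
        # Stack the two single-channel contact maps and predict the refined map.
        x = torch.cat([hic_enhanced, seq_predicted], dim=1)
        return self.fuse(x)

head = FusionHead()
refined = head(torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
print(refined.shape)  # torch.Size([1, 1, 128, 128])
```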


2020 ◽  
pp. bjophthalmol-2020-317825
Author(s):  
Yonghao Li ◽  
Weibo Feng ◽  
Xiujuan Zhao ◽  
Bingqian Liu ◽  
Yan Zhang ◽  
...  

Background/aims: To apply deep learning technology to develop an artificial intelligence (AI) system that can identify vision-threatening conditions in high myopia patients based on optical coherence tomography (OCT) macular images. Methods: In this cross-sectional, prospective study, a total of 5505 qualified OCT macular images obtained from 1048 high myopia patients admitted to Zhongshan Ophthalmic Centre (ZOC) from 2012 to 2017 were selected for the development of the AI system. The independent test dataset included 412 images obtained from 91 high myopia patients recruited at ZOC from January 2019 to May 2019. We adopted the InceptionResnetV2 architecture to train four independent convolutional neural network (CNN) models to identify the following four vision-threatening conditions in high myopia: retinoschisis, macular hole, retinal detachment and pathological myopic choroidal neovascularisation. Focal Loss was used to address class imbalance, and optimal operating thresholds were determined according to the Youden Index. Results: In the independent test dataset, the areas under the receiver operating characteristic curves were high for all conditions (0.961 to 0.999). Our AI system achieved sensitivities equal to or even better than those of retina specialists as well as high specificities (greater than 90%). Moreover, our AI system provided a transparent and interpretable diagnosis with heatmaps. Conclusions: We used OCT macular images for the development of CNN models to identify vision-threatening conditions in high myopia patients. Our models achieved reliable sensitivities and high specificities comparable to those of retina specialists and may be applied for large-scale high myopia screening and patient follow-up.
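The abstract notes that operating thresholds were chosen by the Youden Index (J = sensitivity + specificity − 1). A minimal sketch of that selection step, using scikit-learn and random stand-in scores rather than the study's data, is shown below.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    j = tpr - fpr                      # equivalent to sensitivity + specificity - 1
    return thresholds[np.argmax(j)]

# Toy example: random scores standing in for CNN output probabilities.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = np.clip(labels * 0.3 + rng.random(200) * 0.7, 0, 1)
print("Operating threshold:", youden_threshold(labels, scores))
```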


Author(s):  
Amnon Koren ◽  
Dashiell J Massey ◽  
Alexa N Bracci

Motivation: Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results: We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and implementation: TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information: Supplementary data are available at Bioinformatics online.
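TIGER itself is available at the repository above; the snippet below is only a schematic illustration of the coverage-based idea the abstract describes: bin read depth along a chromosome, normalize to the mean, and smooth, so that relatively over-represented (early-replicating) regions stand out. Bin size, smoothing window, and the toy data are assumptions.

```python
import numpy as np

def coverage_profile(read_positions, chrom_length, bin_size=100_000):
    """Schematic coverage-based replication-timing profile.
    NOT the TIGER implementation; see the authors' repository for that."""
    bins = np.arange(0, chrom_length + bin_size, bin_size)
    depth, _ = np.histogram(read_positions, bins=bins)
    ratio = depth / depth.mean()                    # relative copy number per bin
    kernel = np.ones(5) / 5                         # simple moving-average smoothing
    return np.convolve(ratio, kernel, mode="same")  # higher values ~ earlier replication

# Toy data: reads concentrated near the chromosome start mimic early-replicating loci.
rng = np.random.default_rng(1)
reads = rng.triangular(0, 0, 10_000_000, size=50_000)
print(coverage_profile(reads, 10_000_000)[:5])
```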


2020 ◽  
Vol 101 ◽  
pp. 209
Author(s):  
R. Baskaran ◽  
B. Ajay Rajasekaran ◽  
V. Rajinikanth
Keyword(s):  

Endoscopy ◽  
2020 ◽  
Author(s):  
Alanna Ebigbo ◽  
Robert Mendel ◽  
Tobias Rückert ◽  
Laurin Schuster ◽  
Andreas Probst ◽  
...  

Background and aims: The accurate differentiation between T1a and T1b Barrett's cancer has both therapeutic and prognostic implications but is challenging even for experienced physicians. We trained an artificial intelligence (AI) system based on deep artificial neural networks (deep learning) to differentiate between T1a and T1b Barrett's cancer on white-light images. Methods: Endoscopic images from three tertiary care centres in Germany were collected retrospectively. A deep learning system was trained and tested using the principles of cross-validation. A total of 230 white-light endoscopic images (108 T1a and 122 T1b) were evaluated with the AI system. For comparison, the images were also classified by experts specialized in endoscopic diagnosis and treatment of Barrett's cancer. Results: The sensitivity, specificity, F1 score and accuracy of the AI system in differentiating between T1a and T1b cancer lesions were 0.77, 0.64, 0.73 and 0.71, respectively. There was no statistically significant difference between the performance of the AI system and that of the human experts, whose sensitivity, specificity, F1 score and accuracy were 0.63, 0.78, 0.67 and 0.70, respectively. Conclusion: This pilot study demonstrates the first multicentre application of an AI-based system for predicting submucosal invasion in endoscopic images of Barrett's cancer. The AI system scored on par with international experts in the field, but more work is necessary to improve the system and apply it to video sequences and in a real-life setting. Nevertheless, the correct prediction of submucosal invasion in Barrett's cancer remains challenging for both experts and AI.
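For readers less familiar with the reported metrics, the snippet below shows how sensitivity, specificity, and F1 are derived from binary confusion-matrix counts; the counts used are illustrative stand-ins, not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Illustrative counts only (not taken from the paper).
print(binary_metrics(tp=80, fp=30, tn=78, fn=42))
```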

