A Survey of Autoencoder Algorithms to Pave the Diagnosis of Rare Diseases

2021 ◽  
Vol 22 (19) ◽  
pp. 10891
Author(s):  
David Pratella ◽  
Samira Ait-El-Mkadem Saadi ◽  
Sylvie Bannwarth ◽  
Véronique Paquis-Fluckinger ◽  
Silvia Bottini

Rare diseases (RDs) concern a broad range of disorders and can result from various origins. For a long time, the scientific community was unaware of RDs. Impressive progress has already been made for certain RDs; however, due to the lack of sufficient knowledge, many patients are not diagnosed. Nowadays, advances in high-throughput sequencing technologies, such as whole-genome sequencing, single-cell sequencing, and others, have boosted the understanding of RDs. To extract biological meaning from the data generated by these methods, different analysis techniques have been proposed, including machine learning algorithms. These methods have recently proven to be valuable in the medical field. Among such approaches, unsupervised learning methods via neural networks, including autoencoders (AEs) and variational autoencoders (VAEs), have shown promising performance with applications on various types of data and in different contexts, from cancer to healthy patient tissues. In this review, we discuss how AEs and VAEs have been used in biomedical settings. Specifically, we discuss their current applications and the improvements achieved in the diagnosis and survival of patients. We focus on the applications in the field of RDs, and we discuss how the employment of AEs and VAEs would enhance RD understanding and diagnosis.

2021 ◽  
Author(s):  
Yuan Peng ◽  
Azadeh Nassirian ◽  
Najia Ahmadi ◽  
Martin Sedlmayr ◽  
Franziska Bathelt

High-throughput sequencing technologies have facilitated an outburst in biological knowledge over the past decades and thus enable improvements in personalized medicine. To support (international) medical research that combines genomic and clinical patient data, standardization and harmonization of these data sources are highly desirable. Given the increasing importance of genomic data, we have created a semantic mapping from raw genomic data to both FHIR (Fast Healthcare Interoperability Resources) and OMOP (Observational Medical Outcomes Partnership) CDM (Common Data Model) and analyzed the data coverage of both models. For this, we calculated the mapping score for different data categories and the relative data coverage in both FHIR and OMOP CDM. Our results show that patients' genomic data can be mapped to OMOP CDM directly from a VCF (Variant Call Format) file with a coverage of slightly over 50%. However, using FHIR as an intermediate representation does not lead to further information loss, as the data already stored in FHIR can be further transformed into OMOP CDM format with almost 100% success. Our findings are in favor of extending OMOP CDM with patient genomic data using ETL to enable researchers to apply different analysis methods, including machine learning algorithms, to genomic data.
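The abstract does not define its mapping score, so the following is only a minimal sketch of one plausible reading of "relative data coverage": the fraction of source fields that have a target in the destination model. The eight keys are the fixed VCF columns; the target names and the choice of which fields map are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def coverage(mapping: Dict[str, Optional[str]]) -> float:
    """Return the fraction of source fields that have a target field."""
    if not mapping:
        return 0.0
    mapped = sum(1 for target in mapping.values() if target is not None)
    return mapped / len(mapping)


# Keys: the eight fixed VCF columns.
# Values: HYPOTHETICAL target names (None means "no target found").
# This table is an illustration, not the mapping reported in the paper.
vcf_to_target = {
    "CHROM": "chromosome",
    "POS": "position",
    "ID": None,
    "REF": "reference_allele",
    "ALT": "alternate_allele",
    "QUAL": None,
    "FILTER": None,
    "INFO": None,
}

print(f"relative coverage: {coverage(vcf_to_target):.0%}")
```

A per-category score could be obtained the same way by calling `coverage` on one such table per data category.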


2020 ◽  
Vol 39 (4) ◽  
pp. 5905-5914
Author(s):  
Chen Gong

Most research on stressors is in the medical field, and there are few analyses of athletes’ stressors, so existing work cannot provide a reference for analyzing them. Based on this, this study uses machine learning algorithms to analyze the sources of stress that athletes face in the stadium. The data are obtained mainly through questionnaire surveys and interviews and are used as experimental data after passing the test. In order to improve the performance of the algorithm, this paper combines the known K-Means algorithm with the layering algorithm to form a new improved layered (hierarchical) K-Means algorithm. This paper also analyzes the performance of the improved layered K-Means algorithm through experimental comparison and compares the clustering results. In addition, an analysis system corresponding to the algorithm is constructed based on the actual situation, the algorithm is applied in practice, and a user preference model is constructed. Finally, this article helps athletes identify stressors and find ways to reduce them through personalized recommendations. The research shows that the algorithm of this study is reliable, has certain practical effects, and can provide a theoretical reference for subsequent related research.
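The abstract does not describe how the layering step is combined with K-Means, so the sketch below shows only the plain K-Means baseline (Lloyd's algorithm) that the improved method builds on. The two-dimensional "questionnaire scores" and the choice of initial centroids are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(points: Sequence[Point], initial_centroids: Sequence[Point],
            max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain K-Means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids: List[Point] = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# HYPOTHETICAL scores on two stress dimensions for six athletes.
scores = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = k_means(scores, initial_centroids=[scores[0], scores[3]])
print(labels)
```

Plain K-Means is sensitive to the initial centroids, which is one common motivation for seeding it from a hierarchical pass; whether that is what the paper does is not stated in the abstract.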


Author(s):  
Pramila Arulanthu ◽  
Eswaran Perumal

Medical data contain an enormous quantity of information, and these data require effective classification for accurate prediction. Predicting medical conditions is an extremely difficult task, and Chronic Kidney Disease (CKD) is one of the major diseases that are hard to predict. Medical experts do not all have the same awareness and skill in addressing their patients' problems, and many may reach poor results when diagnosing disease. Sometimes patients may lose their lives. According to the Global Burden of Disease (GBD-2015) study, death by CKD was ranked 17th among the causes of death globally, and 27th in the GBD-2010 report. Death by CKD constituted 2.9% of all deaths between 2010 and 2013 among people aged 15 to 69. According to the World Health Organization (WHO-2005) report, 58 million people died from CKD. Hence, this article presents a state-of-the-art review of CKD classification and prediction. Normally, advanced data mining techniques, fuzzy methods, and machine learning algorithms are used for medical data classification and disease diagnosis. This study reviews and summarizes many previously presented classification techniques and disease diagnosis methods. The main intention of this review is to point out and address some of the issues and complications of the existing methods. It also attempts to discuss the limitations and accuracy levels of the existing CKD classification and disease diagnosis methods.


2020 ◽  
Vol 110 (1) ◽  
pp. 106-120 ◽  
Author(s):  
Avijit Roy ◽  
Andrew L. Stone ◽  
Gabriel Otero-Colina ◽  
Gang Wei ◽  
Ronald H. Brlansky ◽  
...  

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.


Viruses ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1424
Author(s):  
Lia W. Liefting ◽  
David W. Waite ◽  
Jeremy R. Thompson

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.


Author(s):  
Stella C. Yuan ◽  
Eric Malekos ◽  
Melissa T. R. Hawkins

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short Tandem Repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they traditionally suffered from a host of issues including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but the incorporation of museum specimen derived DNA suffers from significant fragmentation and exogenous DNA contamination. Combatting these issues requires extra measures of stringency in the lab and during data analysis, yet there have not been any high-throughput sequencing studies evaluating microsatellite allelic dropout from museum specimen extracted DNA. In this study, we evaluate genotyping errors derived from mammalian museum skin DNA extracts for previously characterized microsatellites across PCR replicates utilizing high-throughput sequencing. We found it useful to classify samples based on DNA concentration, which determined the rate by which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci were dependent on sample quantity, with high concentration museum specimens performing as well and recovering quality metrics nearly as high as the frozen tissue sample. Based on our results, we provide a set of best practices for quality assurance and incorporation of reliable genotypes from museum specimens.


2021 ◽  
Vol 41 (1) ◽  
Author(s):  
Mineto Ota ◽  
Keishi Fujio

Recent innovation in high-throughput sequencing technologies has drastically empowered scientific research. Consequently, it is now possible to capture comprehensive profiles of samples at multiple levels, including genome, epigenome, and transcriptome, at a time. Applying this kind of rich information to clinical settings is of great social significance. For some traits such as cardiovascular diseases, attempts to apply omics datasets in clinical practice for the prediction of disease risk have already shown promising results, although such attempts are still under way for immune-mediated diseases. Multiple studies have tried to predict treatment response in immune-mediated diseases using genomic, transcriptomic, or clinical information, showing various possible indicators. For better prediction of treatment response or disease outcome in immune-mediated diseases, combining multi-layer information may increase the power. In addition, in order to efficiently pick up meaningful information from the massive data, high-quality annotation of genomic functions is also crucial. In this review, we discuss the achievements so far and the future direction of the multi-omics approach to immune-mediated diseases.


Author(s):  
Hannah Bolinger ◽  
David Tran ◽  
Kenneth Harary ◽  
George C. Paoli ◽  
Giselle Guron ◽  
...  

Traditional microbiological testing methods are slow, and many molecular-based techniques rely on culture-based enrichment to overcome low limits of detection. Recent advancements in sequencing technologies may make it possible to utilize machine learning (ML) to identify patterns in microbiome data to potentially predict the presence or absence of pathogens. In this study, 299 poultry rinsate samples from various points in the processing chain were analyzed to determine if microbiota could inform about a sample’s risk for containing Salmonella. Samples were culture confirmed as Salmonella-positive or -negative following modified USDA MLG protocols. The culture confirmation result was used as a reference to compare with 16S sequencing data. Pre-chill samples tested positive (71/82) at a higher frequency than post-chill samples (30/217) and contained greater microbial diversity. Due to their larger sample size, post-chill samples were analyzed more deeply. Analysis of variance (ANOVA) identified a significant effect of chilling on the number of genera (p<0.001), but analysis of similarities (ANOSIM) failed to provide evidence for microbial dissimilarity between pre- and post-chill samples (p=0.001, R=0.443). Various ML models were trained using post-chill samples to predict if a sample contained Salmonella based on the samples’ microbiota pre-enrichment. The optimal model was a Random Forest-based model with a performance as follows: accuracy (88%), sensitivity (85%), specificity (90%). While the algorithms described in this paper are prototypes, these risk-based algorithms demonstrate the potential and need for further studies to provide insight alongside diagnostic tests. Combining risk-based information with diagnostic tools can help poultry processors make informed decisions to help identify and prevent the spread of Salmonella. These data add to the growing body of literature exploring novel ways to utilize microbiome data for predictive food safety.
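For readers less familiar with the three reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix, and how the positivity rates follow from the counts given in the abstract (71/82 and 30/217). The confusion-matrix counts are hypothetical and are not taken from the study; they were chosen only so that the resulting figures land near the reported percentages.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    Assumes at least one true positive class member and one true negative
    class member, so that no denominator is zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # all correct calls
        "sensitivity": tp / (tp + fn),      # positives correctly flagged
        "specificity": tn / (tn + fp),      # negatives correctly cleared
    }


# Positivity rates from the counts stated in the abstract.
pre_chill_rate = 71 / 82
post_chill_rate = 30 / 217

# HYPOTHETICAL held-out test counts, for illustration only.
metrics = classification_metrics(tp=17, fp=4, tn=36, fn=3)

print(f"pre-chill positive:  {pre_chill_rate:.1%}")
print(f"post-chill positive: {post_chill_rate:.1%}")
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because only about one in seven post-chill samples was positive, accuracy alone would be a weak summary, which is presumably why sensitivity and specificity are reported alongside it.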


Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 429 ◽  
Author(s):  
Daniela Barros-Silva ◽  
C. Marques ◽  
Rui Henrique ◽  
Carmen Jerónimo

DNA methylation is an epigenetic modification that plays a pivotal role in regulating gene expression and, consequently, influences a wide variety of biological processes and diseases. The advances in next-generation sequencing technologies allow for genome-wide profiling of methyl marks at both single-nucleotide and single-cell resolution. These profiling approaches vary in many aspects, such as DNA input, resolution, coverage, and bioinformatics analysis. Thus, the selection of the most feasible method according to the project’s purpose requires in-depth knowledge of those techniques. Currently, high-throughput sequencing techniques are intensively used in epigenomics profiling, which ultimately aims to find novel biomarkers for detection, diagnosis, prognosis, and prediction of response to therapy, as well as to discover new targets for personalized treatments. Here, we present, in brief, a portrayal of the evolution of next-generation sequencing methodologies for profiling DNA methylation, highlighting their potential for translational medicine and presenting significant findings in several diseases.


2019 ◽  
Author(s):  
Reneth Millas ◽  
Mary Espina ◽  
CM Sabbir Ahmed ◽  
Angelina Bernardini ◽  
Ekundayo Adeleke ◽  
...  

One of the most important tools in genetic improvement is mutagenesis, which induces genetic and phenotypic variation for trait improvement and the discovery of novel genes. The JTN-5203 (MG V) mutant population was generated using ethyl methane sulfonate (EMS)-induced mutagenesis and was used to detect induced mutations in the FAD2-1A and FAD2-1B genes using a reverse genetics approach. An optimum concentration of EMS was used to treat 15,000 bulk JTN-5203 seeds, producing an M2 population of 1,820. DNA was extracted, normalized, and pooled from these individuals. Specific primers were designed from the FAD2-1A and FAD2-1B genes, which are involved in the fatty acid biosynthesis pathway, for further analysis using next-generation sequencing. High-throughput mutation discovery through a TILLING-by-Sequencing approach was used to detect novel allelic variations in this population. Several mutations and allelic variations with high impacts were detected for FAD2-1A and FAD2-1B. These include GC to AT transition mutations in FAD2-1A (20%) and FAD2-1B (69%). The mutation density for this population is estimated to be about 1/136 kb. Through mutagenesis and high-throughput sequencing technologies, novel alleles underlying the mutations observed in mutants with reduced polyunsaturated fatty acids will be identified, and these mutants can be further used in breeding soybean lines with an improved fatty acid profile, thereby developing heart-healthy soybeans.
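The abstract does not say how the 1/136 kb figure was computed. A common simplified estimate in TILLING studies divides the total number of base pairs screened (amplicon length times individuals screened) by the number of mutations found; the sketch below assumes that formula and uses hypothetical input numbers, not the study's. It also shows a check for the GC to AT transitions (G→A or C→T) that EMS typically induces.

```python
def kb_per_mutation(amplicon_bp: int, individuals: int, mutations: int) -> float:
    """Simplified density estimate: kilobases screened per mutation found.

    ASSUMPTION: total screened sequence = amplicon_bp * individuals, with no
    correction for unreadable amplicon ends or overlapping amplicons.
    """
    if mutations <= 0:
        raise ValueError("at least one mutation is needed for an estimate")
    return amplicon_bp * individuals / mutations / 1000


def is_gc_to_at_transition(ref: str, alt: str) -> bool:
    """True for G->A or C->T, the two strand views of a GC-to-AT transition."""
    return (ref.upper(), alt.upper()) in {("G", "A"), ("C", "T")}


# HYPOTHETICAL inputs, for illustration only (not the study's numbers).
density = kb_per_mutation(amplicon_bp=1500, individuals=1000, mutations=10)
print(f"about 1 mutation per {density:.0f} kb")

# HYPOTHETICAL variant calls as (reference base, alternate base) pairs.
calls = [("G", "A"), ("C", "T"), ("A", "G"), ("G", "T")]
share = sum(is_gc_to_at_transition(r, a) for r, a in calls) / len(calls)
print(f"GC-to-AT transitions: {share:.0%}")
```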

