genomic data analysis
Recently Published Documents


Total documents: 80 (five years: 37)
H-index: 10 (five years: 3)

2021
Author(s): Arif Ozgun Harmanci, Miran Kim, Su Wang, Wentao Li, Yongsoo Song, ...

As DNA sequencing data become available for personal use, genomic privacy is becoming a major challenge. Nevertheless, outsourced high-throughput genomic data analysis is performed with pipelines that tend to overlook these challenges. Results: We present a client-server outsourcing framework for genotype imputation, an important step in genomic data analyses. Genotype data are encrypted by the client, and the encrypted data are processed by a server that never observes the data in plaintext. The cloud-based framework can benefit from virtually unlimited computational resources while providing provable confidentiality. Availability: The server is publicly available at https://www.secureomics.org/OpenImpute. Users can test and use the imputation server anonymously, without registration.
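The abstract does not say which encryption scheme the server uses. As a rough illustration of how a server can compute on genotypes it never sees in plaintext, the sketch below swaps in textbook Paillier encryption (an additively homomorphic scheme) with deliberately tiny, insecure keys, and assumes a hypothetical linear imputation model in which a missing genotype's dosage is a weighted sum of encrypted neighboring genotypes plus a bias. The weights, bias, fixed-point scale, and all function names are illustrative assumptions, not part of the described service.

```python
import math
import secrets

# ASSUMPTION: textbook Paillier with tiny primes, for illustration only (NOT secure).
P, Q = 7907, 7919
SCALE = 100  # hypothetical fixed-point scale for real-valued model weights


def keygen(p=P, q=Q):
    """Return the public modulus n and the secret key (lambda, mu), using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m under public key n (negative values wrap modulo n)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """Client only: recover the plaintext and map it back to a signed integer."""
    lam, mu = sk
    n2 = n * n
    m = ((pow(c, lam, n2) - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def server_impute(n, enc_genotypes, weights, bias):
    """Server: encrypted weighted sum of genotypes plus bias, never decrypting anything."""
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(enc_genotypes, weights):
        # Enc(a) * Enc(b) = Enc(a + b) and Enc(a)^k = Enc(k * a)
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc


# Client side: encrypt genotypes (0/1/2 alternate-allele counts) at neighboring variants.
n, sk = keygen()
genotypes = [0, 1, 2, 1]
enc_genotypes = [encrypt(n, g) for g in genotypes]

# Hypothetical model held by the server, scaled to integers.
weights = [round(w * SCALE) for w in (0.25, -0.5, 0.75, 0.1)]
bias = round(0.2 * SCALE)

enc_dosage = server_impute(n, enc_genotypes, weights, bias)
dosage = decrypt(n, sk, enc_dosage) / SCALE  # only the key holder can do this
print(dosage)  # 1.3 for these toy inputs
```

Only the client holds the secret key; the server multiplies and exponentiates ciphertexts and returns an encrypted result. Practical systems use far larger keys and schemes that support richer arithmetic; the key size here offers no security.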


Author(s): Bum-Sup Jang, Ji-Hyun Chang, Seung Hyuck Jeon, Myung Geun Song, Kyung-Hun Lee, ...

2021, Vol 22 (16), pp. 9101
Author(s): Godfred O. Sabbih, Michael K. Danquah

Neuroblastoma (NB) is a neuroectodermal embryonic cancer that originates from primordial neural crest cells and is among the pediatric cancers with high mortality rates. NB is categorized into high-, intermediate-, and low-risk cases. A significant proportion of high-risk patients who achieve remission have minimal residual disease (MRD), which causes relapse. Although a wide range of advanced treatment options exists for NB, the disease is still characterized by a high relapse rate, which reduces the chance of survival. Disialoganglioside (GD2) is a lipo-ganglioside containing a fatty acid derivative of sphingosine that is coupled to a monosaccharide and a sialic acid. Among pediatric solid tumors, NB tumor cells are known to express GD2; hence, GD2 is a unique antigen for detecting and analyzing subclinical NB MRD, with implications for determining treatment response. This article discusses GD2 expression in NB MRD, analytical assays for GD2 detection and quantification, and computational approaches for GD2 characterization based on high-throughput image processing and genomic data analysis.


2021
Author(s): Md. Shahadat Hossain, A. Q. M. Sala Uddin Pathan, Md. Nur Islam, Mahafujul Islam Quadery Tonmoy, Mahmudul Islam Rakib, ...

Genomic data analysis is a fundamental tool for monitoring pathogen evolution and infectious disease outbreaks. Using bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and to predict its impending mutation rate. Analysis of 259,044 SARS-CoV-2 isolates identified 3,334,545 mutations (14.01 mutations per isolate), suggesting a high mutation rate. Strains from India showed the highest number of mutations (48), followed by strains from Scotland, the USA, the Netherlands, Norway, and France with up to 36 mutations. Besides the most prominent mutations (D614G, F106F, P314L, and UTR:C241T), we identified L93L, A222V, A199A, V30L, and A220V among the 10 most frequent mutations. The multi-nucleotide mutations GGG>AAC, CC>TT, TG>CA, and AT>TA also appeared among the 20 most frequent mutations in our analysis. Future mutation-rate analysis predicts increases of 17%, 7%, and 3% in C>T, A>G, and A>T mutations, respectively. Conversely, decreases of 7%, 7%, and 6% are estimated for T>C, G>A, and G>T mutations, respectively. T>G/A, C>G/A, and A>T/C mutations are not anticipated in the future. Since SARS-CoV-2 is evolving continuously, our findings will facilitate the tracking of mutations and help map the progression of COVID-19 intensity worldwide.
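The abstract reports substitution classes such as C>T and multi-nucleotide changes such as GGG>AAC. A minimal sketch of how such classes could be tallied, assuming nucleotide-level calls written as reference allele, position, and alternate allele, optionally prefixed by a region label as in "UTR:C241T". The input format, the example calls, and the function names are hypothetical; this is not the study's pipeline, which also used deep learning for prediction.

```python
import re
from collections import Counter

# ASSUMED input format: REF + position + ALT, optionally prefixed by "LABEL:".
# Amino-acid notations such as "D614G" do not match and are skipped.
NUC_CALL = re.compile(r"^(?:\w+:)?([ACGT]+)(\d+)([ACGT]+)$")


def substitution_classes(calls):
    """Count substitution classes such as C>T or GGG>AAC across mutation calls."""
    counts = Counter()
    for call in calls:
        match = NUC_CALL.match(call)
        if match is None:
            continue
        ref, _position, alt = match.groups()
        if ref != alt:  # ignore degenerate records where REF equals ALT
            counts[f"{ref}>{alt}"] += 1
    return counts


def class_fractions(counts):
    """Convert class counts into fractions of all counted substitutions."""
    total = sum(counts.values())
    return {cls: k / total for cls, k in counts.items()} if total else {}


# Illustrative calls pooled from several isolates (hypothetical input).
calls = ["UTR:C241T", "C3037T", "C14408T", "A23403G", "GGG28881AAC", "C241T", "D614G"]
counts = substitution_classes(calls)
fractions = class_fractions(counts)
print(counts.most_common())
```

Comparing such fractions between time windows is one simple way to see which substitution classes are becoming more or less common; the study's forecasts came from its own deep-learning model.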


2021, Vol 12
Author(s): Juanying Xie, Mingzhao Wang, Shengquan Xu, Zhao Huang, Philip W. Grant

To tackle the challenges that genomic data pose for analysis, namely tens of thousands of dimensions, a small number of examples, and imbalanced classes, this paper proposes an unsupervised feature selection technique based on standard deviation and cosine similarity. We refer to this idea as SCFS (Standard deviation and Cosine similarity based Feature Selection). It defines the discernibility and the independence of a feature to assess, respectively, its ability to distinguish between classes and its redundancy with other features. A 2-dimensional space is constructed with discernibility as the x-axis and independence as the y-axis to represent all features, so that features in the upper-right corner have both comparatively high discernibility and high independence. The importance of a feature is defined as the product of its discernibility and its independence (i.e., the area of the rectangle enclosed by the feature's coordinate lines and the axes). The features in the upper-right corner are the most important and comprise the optimal feature subset. Three feature selection algorithms are derived from SCFS, each using a different cosine-similarity-based definition of independence: SCEFS (Standard deviation and Exponent Cosine similarity based Feature Selection), SCRFS (Standard deviation and Reciprocal Cosine similarity based Feature Selection), and SCAFS (Standard deviation and Anti-Cosine similarity based Feature Selection). KNN and SVM classifiers are built on the optimal feature subsets detected by each of these algorithms. Experimental results on 18 cancer genomic datasets demonstrate that the proposed unsupervised feature selection algorithms SCEFS, SCRFS, and SCAFS can detect stable biomarkers with strong classification capability, which shows the strength of the proposed idea.
Functional analysis of these biomarkers shows that the occurrence of cancer is closely related to the regulation level of the biomarker genes. This finding will benefit cancer pathology research, drug development, early diagnosis, treatment, and prevention.
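The abstract defines feature importance as discernibility times independence but does not give the exact independence formulas used by SCEFS, SCRFS, or SCAFS. The NumPy sketch below therefore makes explicit assumptions: discernibility is each feature's standard deviation across samples, and independence is a hypothetical anti-cosine score, one minus the largest absolute cosine similarity to any more discernible feature, with the most discernible feature given the largest independence found among the others. It ranks features by the product and keeps the top k; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def scfs_sketch(X, k):
    """Rank features of X (samples x features) by discernibility * independence.

    ASSUMPTIONS (not from the paper): discernibility = per-feature std;
    independence = 1 - max |cosine| to any more discernible feature.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    disc = X.std(axis=0)  # discernibility: spread of each feature across samples

    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero features
    cos = (X.T @ X) / np.outer(norms, norms)  # feature-by-feature cosine similarity

    order = np.argsort(-disc)  # most discernible feature first
    indep = np.empty(n_features)
    for rank in range(1, n_features):
        i = order[rank]
        more_discernible = order[:rank]
        indep[i] = 1.0 - np.max(np.abs(cos[i, more_discernible]))  # hypothetical anti-cosine
    # The top feature has no more-discernible peer; give it the largest observed independence.
    indep[order[0]] = indep[order[1:]].max() if n_features > 1 else 1.0

    importance = disc * indep  # "area" of the feature's rectangle
    return np.argsort(-importance)[:k], disc, indep


# Toy data: column 1 is exactly twice column 0 (redundant); column 3 is constant.
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 4.0, 1.0, 1.0],
              [3.0, 6.0, 0.0, 1.0],
              [4.0, 8.0, 1.0, 1.0]])
selected, disc, indep = scfs_sketch(X, k=2)
print(selected)
```

In this toy matrix, feature 0 is collinear with the more discernible feature 1, so its independence is near zero and the sketch should keep features 1 and 2 instead. Swapping the anti-cosine line for an exponential or reciprocal transform of the cosine would mimic the other two variants in spirit, but their exact forms are not given in the abstract.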


2021, Vol 17 (5), pp. e1008977
Author(s): Amir Bahmani, Kyle Ferriter, Vandhana Krishnan, Arash Alavi, Amir Alavi, ...

Genomic data analysis across multiple cloud platforms is an ongoing challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on various cloud platforms. We demonstrate its utility through common queries of genomic variants across BigQuery on Google Cloud Platform (GCP), Athena on Amazon Web Services (AWS), Apache Presto, and MySQL. Compared with single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays, and the risks of security breaches and privacy violations.
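Swarm's own interfaces are not described in the abstract. The sketch below only illustrates the general "minimal data motion" idea behind federated queries: each site runs an aggregate query locally and returns a small summary, and only those summaries are combined. In-memory SQLite databases stand in for the real backends (BigQuery, Athena, Presto, MySQL); the table schema, variant identifiers, and function names are hypothetical.

```python
import sqlite3

# Hypothetical per-site schema: one row per (sample, variant), alt_count in {0, 1, 2}.
SCHEMA = "CREATE TABLE genotypes (sample TEXT, variant TEXT, alt_count INTEGER)"
LOCAL_QUERY = "SELECT SUM(alt_count), COUNT(*) FROM genotypes WHERE variant = ?"


def make_site(rows):
    """Stand-in for one cloud backend: an in-memory SQLite table of genotypes."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany("INSERT INTO genotypes VALUES (?, ?, ?)", rows)
    return con


def federated_allele_frequency(sites, variant):
    """Combine per-site (alt allele sum, sample count) summaries; raw rows never move."""
    alt_total, n_samples = 0, 0
    for con in sites:
        alt, n = con.execute(LOCAL_QUERY, (variant,)).fetchone()
        alt_total += alt or 0  # SUM returns NULL (None) when no rows match
        n_samples += n
    # ASSUMPTION: diploid samples, so 2 alleles per sample.
    return alt_total / (2 * n_samples) if n_samples else None


site_a = make_site([("s1", "chr1:12345:A:G", 1),
                    ("s2", "chr1:12345:A:G", 2),
                    ("s3", "chr1:12345:A:G", 0),
                    ("s3", "chr2:500:C:T", 1)])
site_b = make_site([("s4", "chr1:12345:A:G", 1),
                    ("s5", "chr1:12345:A:G", 0)])

freq = federated_allele_frequency([site_a, site_b], "chr1:12345:A:G")
print(freq)
```

Only two numbers per site leave each backend instead of every genotype row, which is the property that keeps transfer costs and exposure of individual-level data low.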


2021
Author(s): Michael C. Schatz, Anthony A. Philippakis, Enis Afgan, Eric Banks, Vincent J. Carey, ...

The traditional model of genomic data analysis, in which data are downloaded from centralized warehouses and analyzed with local computing resources, is increasingly unsustainable. Not only are transfers slow and cost-prohibitive, but this approach also leads to redundant, siloed compute infrastructure that makes it difficult to ensure the security and compliance of protected data. The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) inverts this model, providing a unified cloud computing environment for data storage, management, and analysis. AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides scalable, shared computing resources that researchers can acquire as needed. This presents many new opportunities for collaboration and data sharing that will ultimately lead to scientific discoveries at scales not previously possible.


Nucleus, 2021
Author(s): Stephen Lindsly, Can Chen, Sijia Liu, Scott Ronquist, Samuel Dilworth, ...
