Interactive Analytics for Very Large Scale Genomic Data

2015
Author(s):
Cuiping Pan
Nicole Deflaux
Gregory McInnes
Michael Snyder
Jonathan Bingham
...  

Large-scale genomic sequencing is now widely used to address questions in diverse realms such as biological function, human disease, evolution, ecosystems, and agriculture. Given the quantity and diversity of the data these studies generate, a robust and scalable solution for data handling and analysis is needed. Here we present interactive analytics, built on public cloud infrastructure and the distributed query engine Dremel and developed according to the standards of the Global Alliance for Genomics and Health, to perform information compression, comprehensive quality control, and biological information retrieval on large volumes of genomic data. We demonstrate that such computing paradigms can provide orders-of-magnitude faster turnaround for common analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds.
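As a rough illustration of the interactive, browser-speed analyses described above, the sketch below runs a simple per-sample quality-control aggregation against a variant table through Google BigQuery (the public-cloud service built on Dremel) using the google-cloud-bigquery Python client. The project, dataset, and column names are hypothetical stand-ins, not the schema used in the paper.

```python
# Minimal sketch: an interactive QC query over a hypothetical variants table.
from google.cloud import bigquery

client = bigquery.Client(project="my-genomics-project")  # hypothetical project id

# Count variant calls per sample -- a simple per-sample QC metric.
sql = """
    SELECT call.call_set_name AS sample, COUNT(1) AS num_variants
    FROM `my-genomics-project.genomics.variants` AS v,
         UNNEST(v.call) AS call
    GROUP BY sample
    ORDER BY num_variants DESC
    LIMIT 10
"""

for row in client.query(sql).result():  # runs interactively, results in seconds
    print(f"{row.sample}\t{row.num_variants}")
```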


Author(s):  
Maria Rodriguez ◽  
Rajkumar Buyya

Containers are widely used by organizations to deploy diverse workloads such as web services, big data, and IoT applications. Container orchestration platforms are designed to manage the deployment of containerized applications in large-scale clusters. Most of these platforms optimize the scheduling of containers on a fixed-size cluster; they can neither autoscale the size of the cluster nor take into account features specific to public cloud environments. This chapter presents a resource management approach with three objectives: 1) optimize the initial placement of containers by efficiently scheduling them on existing resources, 2) autoscale the number of resources at runtime based on the cluster's workload, and 3) consolidate applications into fewer VMs at runtime. The framework was implemented as a Kubernetes plugin and its efficiency was evaluated on an Australian cloud infrastructure. The experiments demonstrate that a 58% cost reduction can be achieved by dynamically managing the cluster size and the placement of applications.
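The chapter's plugin is not reproduced here, but the three objectives can be illustrated with a minimal Python sketch: best-fit placement of containers onto existing nodes, scaling out with a new VM when nothing fits, and flagging under-utilized nodes as consolidation candidates. Node sizes, thresholds, and names are assumed for illustration only.

```python
# Illustrative sketch of placement, autoscaling, and consolidation decisions.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu_free: float                      # free CPU cores
    mem_free: float                      # free memory (GiB)
    apps: list = field(default_factory=list)

def place(cpu, mem, nodes):
    """Best-fit placement: pick the feasible node with the least CPU left over."""
    feasible = [n for n in nodes if n.cpu_free >= cpu and n.mem_free >= mem]
    if not feasible:
        return None                      # no capacity -> caller scales the cluster out
    best = min(feasible, key=lambda n: n.cpu_free - cpu)
    best.cpu_free -= cpu
    best.mem_free -= mem
    best.apps.append((cpu, mem))
    return best

def schedule(requests, nodes, vm_cpu=4.0, vm_mem=16.0, idle_threshold=0.8):
    """Place each (cpu, mem) request; autoscale when needed; report idle nodes."""
    for cpu, mem in requests:
        if place(cpu, mem, nodes) is None:
            nodes.append(Node(f"node-{len(nodes)}", vm_cpu, vm_mem))  # scale out
            place(cpu, mem, nodes)
    # Consolidation candidates: nodes whose CPU is still mostly unused.
    return [n.name for n in nodes if n.cpu_free >= idle_threshold * vm_cpu]

nodes = [Node("node-0", 4.0, 16.0)]
idle = schedule([(1.0, 2.0), (2.5, 8.0), (2.0, 4.0)], nodes)
print([(n.name, n.cpu_free) for n in nodes], "consolidation candidates:", idle)
```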



Electronics
2021
Vol 10 (16)
pp. 2020
Author(s):
Lingtong Liu
Yulong Shen
Shuiguang Zeng
Zhiwei Zhang

Network measurements are the foundation for network applications. The metrics they generate help applications improve the performance of the monitored network and harden its security. Because severe network attacks have exploited information leaked from public clouds, deploying network measurement services directly on third-party public cloud infrastructure raises privacy and security concerns. Recent studies, most notably OblivSketch, demonstrated the feasibility of alleviating those concerns by using trusted hardware and Oblivious RAM (ORAM), but their limited performance and generality make them unsuitable for broad deployment. In this paper, we propose FO-Sketch, a more efficient and general network measurement service that meets the most stringent security requirements, especially for large-scale networks with heavy traffic volume and burst traffic. In FO-Sketch, each local switch updates its flow statistics in a mergeable sketch; these sketches are then merged obliviously, inside an Intel SGX enclave, into a global “one big sketch” in the cloud. With the help of Oblivious Shuffle, Divide and Conquer, and SIMD speedup, we optimize all of the critical routines in FO-Sketch, making it 17.3x faster than a trivial oblivious solution. While keeping the same level of accuracy and packet processing throughput as the non-oblivious Elastic Sketch, FO-Sketch needs only ∼4.5 MB of enclave memory in total to record metrics and for PORAM to store the global sketch in the cloud. Extensive experiments demonstrate that, for the recommended setting, it takes only ∼0.6 s in total to rebuild those data structures during each measurement interval.
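The design above depends on the sketches being mergeable, i.e., statistics recorded independently at each switch can be combined into one global summary. Below is a minimal, non-oblivious illustration of that property with a Count-Min sketch; the SGX enclave, ORAM, and oblivious shuffle machinery of FO-Sketch are deliberately omitted. Two sketches built with the same hash functions merge by element-wise addition of their counter tables.

```python
# Minimal Count-Min sketch showing the mergeability FO-Sketch relies on.
import numpy as np

class CountMinSketch:
    def __init__(self, depth=4, width=1024, seed=7):
        self.depth, self.width = depth, width
        rng = np.random.default_rng(seed)
        self.salts = rng.integers(1, 2**31 - 1, size=depth)  # shared across switches
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _cols(self, key):
        return [hash((int(s), key)) % self.width for s in self.salts]

    def update(self, key, count=1):
        for row, col in enumerate(self._cols(key)):
            self.table[row, col] += count

    def query(self, key):
        return min(self.table[row, col] for row, col in enumerate(self._cols(key)))

    def merge(self, other):
        assert (self.salts == other.salts).all()  # same hash functions required
        self.table += other.table                 # element-wise addition of counters
        return self

# Two "switches" record traffic locally, then the cloud merges their sketches.
s1, s2 = CountMinSketch(), CountMinSketch()
s1.update("10.0.0.1->10.0.0.2", 5)
s2.update("10.0.0.1->10.0.0.2", 3)
print(s1.merge(s2).query("10.0.0.1->10.0.0.2"))  # >= 8 (CM sketches overestimate)
```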



2018
Vol 31 (5-6)
pp. 227-233
Author(s):
Weitao Wang
Baoshan Wang
Xiufen Zheng


2008
Vol 40 (7)
pp. 854-861
Author(s):
Jun Zhu
Bin Zhang
Erin N Smith
Becky Drees
Rachel B Brem
...  


Author(s):  
Olexander Melnikov
Konstantin Petrov
Igor Kobzev
Viktor Kosenko
...  

The article considers the development and implementation of cloud services in the work of government agencies. A classification for choosing among cloud service providers is offered, which can serve as a basis for decision making. The fundamentals of cloud computing technology are analyzed. The COVID-19 pandemic has highlighted the benefits of cloud services for remote work: government agencies at all levels need to move to cloud infrastructure. The article analyzes the prospects of cloud computing in Ukraine as the basis for developing e-governance. This is necessary for the rapid provision of quality services on a flexible, large-scale, and economical technological base. Moving electronic information exchange to the cloud makes it possible to reach a wide range of users at relatively low material cost. Automating processes and transferring them to the cloud environment speeds up service delivery and minimizes the time citizens spend obtaining information. The article also lists the risks involved in the transition to cloud services and the shortcomings that may arise in their use.



BMC Genomics
2021
Vol 22 (1)
Author(s):
Xiujin Li
Hailiang Song
Zhe Zhang
Yunmao Huang
Qin Zhang
...  

Background: With the emphasis on analysing genotype-by-environment interactions within the framework of genomic selection and genome-wide association analysis, there is an increasing demand for reliable tools that can simulate large-scale genomic data in order to assess related approaches. Results: We proposed a theory for simulating large-scale genomic data with genotype-by-environment interactions and added this new function to our previously developed tool GPOPSIM; a simulated threshold trait with large-scale genomic data was also added. Validation of the simulated data indicated that GPOPSIM2.0 is an efficient tool for mimicking the phenotypic data of quantitative traits, threshold traits, and genetically correlated traits with large-scale genomic data while taking genotype-by-environment interactions into account. Conclusions: This tool is useful for assessing methods for genotype-by-environment interactions and threshold traits.
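GPOPSIM2.0 itself is not reproduced here; the sketch below only illustrates the standard way such a genotype-by-environment simulation can be set up, treating the phenotype in two environments as genetically correlated traits. All parameter values are illustrative assumptions, not the tool's defaults.

```python
# Toy G x E simulation: phenotypes in two environments as correlated traits.
import numpy as np

rng = np.random.default_rng(42)
n_ind, n_snp = 1000, 5000
h2, r_g = 0.5, 0.6                        # heritability, genetic correlation

# SNP genotypes coded 0/1/2 under Hardy-Weinberg with random allele frequencies.
p = rng.uniform(0.05, 0.95, n_snp)
geno = rng.binomial(2, p, size=(n_ind, n_snp)).astype(float)

# SNP effects for the two environments drawn with correlation r_g.
cov = np.array([[1.0, r_g], [r_g, 1.0]]) / n_snp
effects = rng.multivariate_normal(np.zeros(2), cov, size=n_snp)   # (n_snp, 2)

g = geno @ effects                        # true genetic values, one column per environment
g = (g - g.mean(0)) / g.std(0)            # standardize genetic values to variance 1
e = rng.normal(0, np.sqrt((1 - h2) / h2), size=(n_ind, 2))
y = g + e                                 # phenotypes in environment 1 and 2

print("realized genetic correlation:", np.corrcoef(g.T)[0, 1].round(2))
```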



2019
Author(s):
Paul Thompson
Neda Jahanshad
Christopher R. K. Ching
Lauren Salminen
Sophia I Thomopoulos
...  

This review summarizes the last decade of work by the ENIGMA (Enhancing NeuroImaging Genetics through Meta Analysis) Consortium, a global alliance of over 1,400 scientists across 43 countries studying the human brain in health and disease. Building on large-scale genetic studies that discovered the first robustly replicated genetic loci associated with brain metrics, ENIGMA has diversified into over 50 working groups (WGs), pooling worldwide data and expertise to answer fundamental questions in neuroscience, psychiatry, neurology, and genetics. Most ENIGMA WGs focus on specific psychiatric and neurological conditions; others study normal variation due to sex and gender differences, or to development and aging; still others develop methodological pipelines and tools to facilitate harmonized analyses of “big data” (i.e., genetic and epigenetic data, multimodal MRI, and electroencephalography data). These international efforts have yielded the largest neuroimaging studies to date in schizophrenia, bipolar disorder, major depressive disorder, post-traumatic stress disorder, substance use disorders, obsessive-compulsive disorder, attention-deficit/hyperactivity disorder, autism spectrum disorders, epilepsy, and 22q11.2 deletion syndrome. More recent ENIGMA WGs have formed to study anxiety disorders, suicidal thoughts and behavior, sleep and insomnia, eating disorders, irritability, brain injury, antisocial personality and conduct disorder, and dissociative identity disorder. Here, we summarize the first decade of ENIGMA’s activities and ongoing projects, and describe the successes and challenges encountered along the way. We highlight the advantages of collaborative, large-scale, coordinated data analyses for testing the reproducibility and robustness of findings, offering the opportunity to identify brain systems involved in clinical syndromes across diverse samples, together with associated genetic, environmental, demographic, cognitive, and psychosocial factors.



2016
Author(s):
Hannah R. Dueck
Rizi Ai
Adrian Camarena
Bo Ding
Reymundo Dominguez
...  

Recently, measurement of RNA at single-cell resolution has yielded surprising insights. Methods for single-cell RNA sequencing (scRNA-seq) have received considerable attention, but the broad reliability of single-cell methods and the factors governing their performance are still poorly understood. Here we conducted a large-scale control experiment to assess the transfer function of three scRNA-seq methods and the factors modulating that function. All three methods detected greater than 70% of the expected number of genes and had a 50% probability of detecting genes with an abundance greater than 2 to 4 molecules. Despite the small number of molecules, sequencing depth significantly affected gene detection. While biases in detection and quantification were qualitatively similar across methods, the degree of bias differed, consistent with differences in molecular protocol. Measurement reliability increased with expression level for all methods, and we conservatively estimate the measurement transfer functions to be linear above ~5-10 molecules. Based on these extensive control studies, we propose that RNA-seq of single cells has come of age, yielding quantitative biological information.
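As a toy illustration of the 50%-detection figure quoted above, the sketch below fits a logistic detection curve to (molecule count, detection rate) pairs and solves for the abundance at which detection probability reaches 0.5. The data points are synthetic placeholders, not measurements from the study, and the logistic form is an assumption for illustration.

```python
# Estimating a "50% detection" abundance from a detection-vs-abundance curve.
import numpy as np
from scipy.optimize import curve_fit

def detection_prob(n_molecules, k, n50):
    """Logistic detection curve: probability of detecting a gene present at
    n_molecules input molecules, with midpoint n50 and steepness k."""
    return 1.0 / (1.0 + np.exp(-k * (n_molecules - n50)))

# Synthetic spike-in style data: (input molecules, fraction of cells detected).
molecules = np.array([0.5, 1, 2, 4, 8, 16, 32])
detected  = np.array([0.05, 0.2, 0.45, 0.7, 0.9, 0.97, 0.99])

(k_hat, n50_hat), _ = curve_fit(detection_prob, molecules, detected, p0=[1.0, 3.0])
print(f"estimated 50% detection at ~{n50_hat:.1f} molecules")
```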


