Swarm: A federated cloud framework for large-scale variant analysis

2021 ◽  
Vol 17 (5) ◽  
pp. e1008977
Author(s):  
Amir Bahmani ◽  
Kyle Ferriter ◽  
Vandhana Krishnan ◽  
Arash Alavi ◽  
Amir Alavi ◽  
...  

Genomic data analysis across multiple cloud platforms is an ongoing challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on various cloud platforms. We demonstrate its utility via common queries on genomic variants across BigQuery in Google Cloud Platform (GCP), Athena in Amazon Web Services (AWS), Apache Presto and MySQL. Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays and risks of security breach and privacy violation.
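
The abstract does not describe Swarm's own interfaces, so the following is only a minimal sketch of the general idea of federated computation with minimal data motion. Each site reduces its own genotype rows to per-variant alternate-allele counts, and only those summaries travel to a coordinator that merges them. The function names and the (chromosome, position, ref, alt) key are hypothetical; in Swarm the per-site step would presumably be a query run in place on BigQuery, Athena, Presto or MySQL rather than Python code.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def local_allele_counts(genotypes: Iterable[tuple[Variant, int]]) -> Counter:
    """Runs where the data live: sums alt-allele dosage (0, 1 or 2) per variant.

    Only this per-variant summary leaves the site; sample-level rows stay put.
    """
    counts: Counter = Counter()
    for variant, dosage in genotypes:
        counts[variant] += dosage
    return counts


def federated_allele_counts(site_summaries: Iterable[Mapping[Variant, int]]) -> Counter:
    """Runs at the coordinator: adds up the summaries returned by each site."""
    total: Counter = Counter()
    for summary in site_summaries:
        total.update(summary)  # Counter.update adds counts rather than replacing them
    return total
```

Because only aggregated counts cross cloud boundaries, the volume moved scales with the number of variants queried rather than with the number of samples.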

2016 ◽  
Author(s):  
Kaiheng Hu ◽  
Pu Li ◽  
Yong You ◽  
Fenghuan Su

Abstract. A hydrologically based model is developed for delineating hazard zones in the valleys of debris-flow basins. The basic assumption of this model is that the ratio of the peak discharges at any two cross-sections in a debris-flow basin is a power function of the ratio of their flow accumulation areas. Combining the advantages of empirical and flow-routing models of debris-flow hazard zoning, this hydrological model has minimal data requirements and can produce hazard intensity values for different event magnitudes. The algorithms used in this model are designed within a grid-based geographic processing framework and implemented entirely on the ArcGIS platform and in a Python scripting environment. Qipan basin in Wenchuan County, Sichuan Province, southwest China, where a large-scale debris-flow event occurred on July 11, 2013, was chosen as the test case for the model. The hazard zone identified by the model showed good agreement with the actual inundation area of the event. The proposed method can help identify small hazard areas in upstream tributaries, and the developed model is promising for application in debris-flow hazard zoning.
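
The model's core assumption can be written as Q_i / Q_j = (A_i / A_j)^n, where Q is peak discharge and A is flow accumulation area at two cross-sections. A minimal sketch of applying it cell by cell to a flow-accumulation raster follows. The abstract does not give the exponent n or how the reference discharge is obtained, so both are treated here as hypothetical inputs; a plain list of rows stands in for an ArcGIS raster, and the later step that turns discharge into hazard intensity is not shown.

```python
def peak_discharge(area: float, ref_area: float, ref_discharge: float, exponent: float) -> float:
    """Peak discharge at a cross-section whose flow accumulation area is `area`,
    scaled from a reference section via Q / Q_ref = (A / A_ref) ** exponent.

    `exponent` is a hypothetical calibrated parameter, not a value from the paper.
    """
    if area <= 0 or ref_area <= 0:
        raise ValueError("flow accumulation areas must be positive")
    return ref_discharge * (area / ref_area) ** exponent


def discharge_grid(
    accumulation: list[list[float]], ref_area: float, ref_discharge: float, exponent: float
) -> list[list[float]]:
    """Apply the power law to every cell of a flow-accumulation grid (list of rows).

    Cells with no upstream area (ridge cells) are assigned zero discharge.
    """
    return [
        [peak_discharge(a, ref_area, ref_discharge, exponent) if a > 0 else 0.0 for a in row]
        for row in accumulation
    ]
```

With an exponent of 0.5, for example, a cross-section draining four times the reference area would carry twice the reference peak discharge.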


2020 ◽  
Author(s):  
Albert A Gayle

Year-to-year emergence of West Nile virus has been sporadic and notoriously hard to predict. In Europe, 2018 saw a dramatic increase in the number of cases and locations affected. In this work, we demonstrate a novel method for predicting outbreaks and understanding what drives them. This method creates a simple model for each region that directly explains how each variable affects risk. Behind the scenes, each local explanation model is produced by a state-of-the-art AI engine, which unpacks and restructures output from an XGBoost machine learning ensemble. XGBoost, well known for its predictive accuracy, has long been considered a "black box" system, but this approach makes its predictions explainable. With only minimal data curation and no "tuning", our model predicted where the 2018 outbreak would occur with an AUC of 97%. This model was trained using data from 2010-2016 that reflected many domains of knowledge: climate, sociodemographic, economic, and biodiversity data were all included. Our model furthermore explained the specific drivers of the 2018 outbreak for each affected region. These effect predictions were found to be consistent with the research literature in terms of priority, direction, magnitude, and size of effect. Aggregation and statistical analysis of local effects revealed strong cross-scale interactions. From this, we concluded that the 2018 outbreak was driven by large-scale climatic anomalies enhancing the local effect of mosquito vectors. We also identified substantial areas across Europe at risk for a sudden outbreak similar to that experienced in 2018. Taken as a whole, these findings highlight the role of climate in the emergence and transmission of West Nile virus. Furthermore, they demonstrate the crucial role that the emerging "eXplainable AI" (XAI) paradigm will have in predicting and controlling disease.
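
The abstract does not name the explanation engine. One widely used way to derive per-instance ("local") explanations from a model such as an XGBoost ensemble is Shapley-value attribution, which splits a single prediction into additive contributions from each input variable. The sketch below computes exact Shapley values by brute force for any model, assuming that an "absent" variable is replaced by a baseline value. It is a generic illustration, not the paper's engine; tree-specific algorithms compute the same quantities far more efficiently for large ensembles.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley attributions for a single prediction model(x).

    Assumption: a feature absent from a coalition takes its baseline value.
    Enumerates every coalition, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Observed values for features in the coalition, baseline elsewhere.
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi += weight * (value(s | {i}) - value(s))
        attributions.append(phi)
    return attributions
```

The attributions always sum to model(x) minus model(baseline), which is what lets each region's prediction be read as a sum of per-variable effects. With an interaction term, such as a hypothetical risk score 0.5·a + a·b evaluated at a = b = 1, the interaction's contribution is split evenly between the two variables.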


2021 ◽  
Vol 18 ◽  
pp. 569-580
Author(s):  
Kateryna Kraus ◽  
Nataliia Kraus ◽  
Oleksandr Manzhura

The purpose of the research is to present the features of the digitization of business processes in enterprises as a foundation for the gradual formation of Industry 4.0 and for the search for economic growth in a new virtual reality, which has every chance of becoming a decisive step in implementing Ukraine's digital strategy and developing its innovation ecosystem. Key problems that arise during the digitalization of business processes in enterprises are presented, among them: the historical orientation of production toward mass, “running” sizes and large batches; heavy large-scale production loads; and the complexity of cooperation and logic between production sites. The study determines that high-quality, effective tools for innovative digital transformation under conditions of virtual reality should include a single online order-management system for all enterprises (application registration – technical expertise – planning – performance control – shipment), as well as Smart Factory, Predictive Maintenance, IIoT, CRM and SCM. Features of digital transformation in forming the enterprises of the Industry 4.0 ecosystem are identified. The capabilities and benefits for enterprises of the Azure cloud platform, which includes more than 200 products and cloud services, are analyzed. The authors note that Azure supports open-source technologies, so businesses can use the tools and technologies they prefer and find most useful. After a thorough analysis of how enterprises are accelerating the deep digitalization of their business processes, the authors propose putting into practice the Aruba contact-tracing solution in the fight against COVID-19. Aruba's location technology enables flexible solutions built on the Aruba Partner Ecosystem using a USB interface.
It is also proposed to use SYNTEGRA, a data integration service that offers interactive analytics, data models and dashboards, in order to accelerate the modernization of data storage and management, optimize reporting in the company and obtain real-time analytics. The possibilities of using the Azure cloud platform in digitizing the business processes of enterprises in the Industry 4.0 ecosystem under conditions of virtual reality are determined.


Author(s):  
Sen Zhao ◽  
Oleg Agafonov ◽  
Abdulrahman Azab ◽  
Tomasz Stokowy ◽  
Eivind Hovig

Abstract Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identifying causal variants in a spectrum of genetic disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data; however, a systematic performance comparison of these pipelines is needed to guide the application of WGS in scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN™ and DeepVariant) using Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN™ and DeepVariant show better accuracy in SNP and indel calling, with no significant difference in their F1-scores. The DRAGEN™ platform offers accuracy, flexibility and highly efficient running speed, and therefore a clear advantage in the large-scale analysis of WGS data. The combination of DRAGEN™ and DeepVariant also provides a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analyses of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
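
The F1-scores compared above come from classifying each pipeline's calls against a truth set: true positives (calls that match), false positives (calls not in the truth set) and false negatives (truth-set variants that were missed). A minimal sketch of the resulting metrics follows; the class and function names are illustrative, and the matching step itself (which must handle differing variant representations) is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Outcome of comparing one call set with a truth set (matching assumed done)."""
    tp: int  # called variants that match the truth set
    fp: int  # called variants absent from the truth set
    fn: int  # truth-set variants the pipeline missed


def precision(c: CallCounts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: CallCounts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1_score(c: CallCounts) -> float:
    # Harmonic mean of precision and recall; equivalently 2TP / (2TP + FP + FN).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0
```

Because F1 is a harmonic mean, a pipeline cannot score well by maximizing sensitivity alone; excess false positives pull the score down as much as missed variants do.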


2021 ◽  
Author(s):  
Erik Gustafson ◽  
Burt Holzman ◽  
James Kowalkowski ◽  
Henry Lamm ◽  
Andy C. Y. Li ◽  
...  

2015 ◽  
pp. 2022-2032
Author(s):  
Bina Ramamurthy

In this chapter, the author examines the various approaches taken by the popular cloud providers Amazon Web Services (AWS), Google App Engine (GAE), and Windows Azure (Azure) to secure the cloud. AWS offers the Infrastructure as a Service model, GAE is representative of Software as a Service, and Azure represents the Platform as a Service model. Irrespective of the model, a cloud provider offers a variety of services, from a simple large-scale storage service to a complete infrastructure for supporting the operations of a modern business. The author discusses some of the security aspects that a cloud customer must be aware of when selecting a cloud service provider. This discussion includes the major threats posed by multi-tenancy in the cloud. Another important aspect to consider in the security context is machine virtualization. Securing these services involves a whole range of measures, from access-point protection at the client end to securing virtual co-tenants on the same physical machine hosted by a cloud. In this chapter, the author highlights the major offerings of the three cloud service providers mentioned above. She discusses the details of some important security challenges and solutions and illustrates them using screenshots of representative security configurations.


2020 ◽  
Vol 10 (3) ◽  
pp. 1-16
Author(s):  
Sanjay P. Ahuja ◽  
Emily Czarnecki ◽  
Sean Willison

Cloud computing has rapidly become a viable competitor to on-premises infrastructure from both management and cost perspectives. This research provides insight into cluster computing performance and variability in cloud-provisioned infrastructure from two popular public cloud providers. A comparative examination of the two cloud platforms using synthetic benchmarks is provided. In this article, we compared the performance of Amazon Web Services Elastic Compute Cloud (EC2) to the Google Cloud Platform (GCP) Compute Engine using three benchmarks: STREAM, IOR, and NPB-EP. Experiments were conducted on clusters ranging from one to eight nodes. We also performed experiments over the course of two weeks, running the benchmarks at similar times. The benchmarks provided performance metrics for bandwidth (STREAM), read and write performance (IOR), and operations per second (NPB-EP). We found that EC2 outperformed GCP for bandwidth. Both provided good scalability and reliability for bandwidth, with GCP showing a slight deviation during the two-week trial. GCP outperformed EC2 in both the read and write tests (IOR) and in the operations-per-second test. However, GCP was extremely variable during the read and write tests over the two-week trial. Overall, each platform excelled in different benchmarks, and we found EC2 to be more reliable in general.
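
STREAM reports bandwidth by counting the bytes each kernel nominally moves: Copy and Scale touch two arrays of 8-byte doubles per element, while Add and Triad touch three. A minimal sketch of that bookkeeping follows, together with one common way to summarize run-to-run variability, the coefficient of variation. The abstract does not say which variability statistic was used, so the coefficient of variation is an assumption here, and any timings passed in are hypothetical.

```python
from statistics import mean, stdev

BYTES_PER_DOUBLE = 8
# Arrays touched per element by each STREAM kernel (reads plus the write).
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def stream_bandwidth_mb_s(kernel: str, n_elements: int, seconds: float) -> float:
    """Bandwidth in MB/s with 1 MB = 10**6 bytes, following STREAM's convention."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n_elements
    return bytes_moved / seconds / 1e6


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean; 0 means no run-to-run spread.

    Assumed variability summary for repeated runs, e.g. one run per day.
    """
    return stdev(samples) / mean(samples)
```

Normalizing the spread by the mean lets a platform's variability in bandwidth and in I/O be compared on the same scale even though the raw metrics differ by orders of magnitude.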


2020 ◽  
Vol 27 (9) ◽  
pp. 1425-1430
Author(s):  
Inès Krissaane ◽  
Carlos De Niz ◽  
Alba Gutiérrez-Sacristán ◽  
Gabor Korodi ◽  
Nneka Ede ◽  
...  

Abstract
Objective: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies.
Methods: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We used Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), to analyze and explore genomic variant datasets.
Results: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics.
Conclusions: We present a timely contribution to one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
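
The speed–cost trade-off can be made concrete with a simple cost model and a Pareto filter over measured configurations. The sketch below assumes billing of nodes × hourly price × runtime and ignores storage, egress and minimum-billing rules; the abstract does not describe the authors' actual cost model, and the class name, configuration names and prices used with it are hypothetical. A configuration is kept only if no other configuration is at least as fast and at least as cheap, and strictly better on one of the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRun:
    """One measured cluster configuration (hypothetical structure)."""
    name: str
    nodes: int
    price_per_node_hour: float  # hypothetical on-demand price, USD
    runtime_hours: float

    @property
    def cost(self) -> float:
        # Simplified billing: every node is charged for the whole runtime.
        return self.nodes * self.price_per_node_hour * self.runtime_hours


def pareto_front(runs: list[ClusterRun]) -> list[ClusterRun]:
    """Configurations not dominated on (runtime, cost), fastest first."""
    front = []
    for r in runs:
        dominated = any(
            o.runtime_hours <= r.runtime_hours
            and o.cost <= r.cost
            and (o.runtime_hours < r.runtime_hours or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.runtime_hours)
```

Every configuration left on the front is a defensible choice; picking among them depends on whether a deadline or a budget is the binding constraint.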


2015 ◽  
Vol 32 (6) ◽  
pp. 937-939 ◽  
Author(s):  
Kun Yang ◽  
Giovanni Stracquadanio ◽  
Jingchuan Luo ◽  
Jef D. Boeke ◽  
Joel S. Bader

Abstract
Summary: Combinatorial assembly of DNA elements is an efficient method for building large-scale synthetic pathways from standardized, reusable components. These methods are particularly useful because they enable assembly of multiple DNA fragments in one reaction, at the cost of requiring that each fragment satisfy design constraints. We developed BioPartsBuilder as a biologist-friendly web tool to design biological parts that are compatible with DNA combinatorial assembly methods, such as Golden Gate and related methods. It retrieves biological sequences, enforces compliance with assembly design standards and provides a fabrication plan for each fragment.
Availability and implementation: BioPartsBuilder is accessible at http://public.biopartsbuilder.org and an Amazon Web Services image is available from the AWS Marketplace (AMI ID: ami-508acf38). Source code is released under the MIT license and is available for download at https://github.com/baderzone/biopartsbuilder.
Supplementary information: Supplementary data are available at Bioinformatics online.
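
The abstract does not list the specific design rules BioPartsBuilder enforces. One constraint common to Golden Gate designs using the Type IIS enzyme BsaI is that a part must not contain an internal BsaI recognition site (GGTCTC, or GAGACC on the opposite strand), because the enzyme would then cut inside the insert. A minimal sketch of that single check follows; the function names are hypothetical, and other enzymes can be checked by passing a different site, such as CGTCTC for BsmBI.

```python
# BsaI recognition sequence. Treating this check as one of BioPartsBuilder's
# rules is an assumption; the tool's actual rule set is not given in the abstract.
BSAI_SITE = "GGTCTC"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def internal_sites(seq: str, site: str = BSAI_SITE) -> list[int]:
    """0-based start positions where `site` occurs on either strand of `seq`."""
    seq = seq.upper()
    targets = {site.upper(), reverse_complement(site)}
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] in targets]


def is_golden_gate_ready(seq: str, site: str = BSAI_SITE) -> bool:
    # A part with an internal site would be cut during assembly, so it must first
    # be "domesticated", for example by a silent mutation that removes the site.
    return not internal_sites(seq, site)
```

Checking both strands matters because a Type IIS recognition site is not palindromic, so a site on the reverse strand appears as its reverse complement in the forward sequence.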


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1225 ◽  
Author(s):  
Mario Vega-Barbas ◽  
Jose Diaz-Olivares ◽  
Ke Lu ◽  
Mikael Forsman ◽  
Fernando Seoane ◽  
...  

Preventive healthcare has attracted much attention recently. Improving people’s lifestyles and promoting a healthy diet and wellbeing are important, but the importance of work-related diseases should not be underestimated. Musculoskeletal disorders (MSDs) are among the most common work-related health problems. Ergonomists already assess MSD risk factors and suggest changes in workplaces. However, existing methods are mainly based on visual observation, which has relatively low reliability and covers only part of the workday. The resulting suggestions concern the overall workplace and the organization of work, but rarely include individuals’ work techniques. In this work, we propose a precise and pervasive ergonomic platform for continuous risk assessment. The system collects data from wearable sensors; the data are synchronized and processed by a mobile computing layer, from which exposure statistics and risk assessments may be drawn, and are finally stored at the server layer for further analysis at both individual and group levels. The platform also enables continuous feedback to the worker to support behavioral changes. The cloud platform deployed on Amazon Web Services instances proved flexible enough to affordably fulfill the requirements of small to medium enterprises, while remaining expandable for larger corporations. A System Usability Scale score of 76.6 indicates an acceptable grade of usability.
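
The usability figure above is on the System Usability Scale (SUS), whose standard scoring maps ten Likert responses (1 to 5) onto a 0 to 100 scale. Odd-numbered, positively worded items contribute the response minus 1; even-numbered, negatively worded items contribute 5 minus the response; and the sum is multiplied by 2.5. The abstract does not report individual responses, so any inputs used with this sketch are made up. Averaging per-participant scores is the usual way a single study-level figure is produced, although the abstract does not state how 76.6 was aggregated.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one completed System Usability Scale questionnaire (0 to 100)."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def study_sus(all_responses: list[list[int]]) -> float:
    # Assumed study-level aggregation: mean of the per-participant scores.
    return mean(sus_score(r) for r in all_responses)
```

A neutral answer (3) on every item yields 50, so a score in the mid-70s reflects clearly favorable responses rather than an indifferent panel.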

