EpiMetal: an open-source graphical web browser tool for easy statistical analyses in epidemiology and metabolomics

Abstract Motivation An intuitive graphical interface that allows statistical analyses and visualizations of extensive data without any knowledge of dedicated statistical software or programming. Implementation EpiMetal is a single-page web application written in JavaScript, to be used via a modern desktop web browser. General features Standard epidemiological analyses and self-organizing maps for data-driven metabolic profiling are included. Multiple extensive datasets with an arbitrary number of continuous and category variables can be integrated with the software. Any snapshot of the analyses can be saved and shared with others via a www-link. We demonstrate the usage of EpiMetal using pilot data with over 500 quantitative molecular measures for each sample as well as in two large-scale epidemiological cohorts (N >10 000). Availability The software usage exemplar and the pilot data are open access online at [http://EpiMetal.computationalmedicine.fi]. MIT licensed source code is available at the Github repository at [https://github.com/amergin/epimetal].

Download Full-text

epiTAD: a web application for visualizing chromosome conformation capture data in the context of genetic epidemiology

Bioinformatics ◽

10.1093/bioinformatics/btz387 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4462-4464

Author(s):

Jordan H Creed ◽

Garrick Aden-Buie ◽

Alvaro N Monteiro ◽

Travis A Gerke

Keyword(s):

Genetic Epidemiology ◽

In Silico ◽

Web Application ◽

Large Scale ◽

Source Code ◽

Chromosome Conformation Capture ◽

Chromosome Conformation ◽

Web Based ◽

Genome Wide ◽

Public Data

Abstract Summary Complementary advances in genomic technology and public data resources have created opportunities for researchers to conduct multifaceted examination of the genome on a large scale. To meet the need for integrative genome wide exploration, we present epiTAD. This web-based tool enables researchers to compare genomic 3D organization and annotations across multiple databases in an interactive manner to facilitate in silico discovery. Availability and implementation epiTAD can be accessed at https://apps.gerkelab.com/epiTAD/ where we have additionally made publicly available the source code and a Docker containerized version of the application.

Download Full-text

InterARTIC: an interactive web application for whole-genome nanopore sequencing analysis of SARS-CoV-2 and other viruses

10.1101/2021.04.21.440861 ◽

2021 ◽

Author(s):

James M. Ferguson ◽

Hasindu Gamaarachchi ◽

Thanh Nguyen ◽

Alyne Gollon ◽

Stephanie Tong ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Ebola Virus ◽

Job Scheduling ◽

Whole Genome ◽

Sequencing Analysis ◽

Graphical Interface ◽

Laptop Computer ◽

Oxford Nanopore ◽

Viral Surveillance

1.ABSTRACTMotivationInterARTIC is an interactive web application for the analysis of viral whole-genome sequencing (WGS) data generated on Oxford Nanopore Technologies (ONT) devices. A graphical interface enables users with no bioinformatics expertise to analyse WGS experiments and reconstruct consensus genome sequences from individual isolates of viruses, such as SARS-CoV-2. InterARTIC is intended to facilitate widespread adoption and standardisation of ONT sequencing for viral surveillance and molecular epidemiology.Worked exampleWe demonstrate the use of InterARTIC for the analysis of ONT viral WGS data from SARS-CoV-2 and Ebola virus, using a laptop computer or the internal computer on an ONT GridlON sequencing device. We showcase the intuitive graphical interface, workflow customization capabilities and job-scheduling system that facilitate execution of small- and large-scale WGS projects on any virus.Implementation & availabilityInterARTIC is a free, open-source web application implemented in Python. The application can be downloaded as a set of pre-compiled binaries that are compatible with all common Ubuntu distributions or built from source. For further details please visit: https://github.com/Psy-Fer/interARTIC/.

Download Full-text

Phandango: an interactive viewer for bacterial population genomics

10.1101/119545 ◽

2017 ◽

Cited By ~ 13

Author(s):

James Hadfield ◽

Nicholas J. Croucher ◽

Richard J Goater ◽

Khalil Abudahab ◽

David M Aanensen ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Bacterial Population ◽

Population Genomics ◽

Genomic Analysis ◽

Evolutionary Analysis ◽

Base Pairs ◽

Web Browser ◽

Link Type ◽

Scale Population

ABSTRACTSummaryFully exploiting the wealth of data in current bacterial population genomics datasets requires synthesising and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner.AvailabilityPhandango is a web application freely available for use at https://jameshadfield.github.io/phandango and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at https://github.com/jameshadfield/[email protected], [email protected]

Download Full-text

Design and Pilot data of the high myopia registration study: Shanghai Child and Adolescent Large‐scale Eye Study (SCALE‐HM)

Acta Ophthalmologica ◽

10.1111/aos.14617 ◽

2020 ◽

Author(s):

Xiangui He ◽

Junjie Deng ◽

Xian Xu ◽

Jingjing Wang ◽

Tianyu Cheng ◽

...

Keyword(s):

High Myopia ◽

Large Scale ◽

Child And Adolescent ◽

Pilot Data

Download Full-text

Geospatial Analysis of Extreme Weather Events in Nigeria (1985–2015) Using Self-Organizing Maps

Advances in Meteorology ◽

10.1155/2017/8576150 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 9

Author(s):

Adeoluwa Akande ◽

Ana Cristina Costa ◽

Jorge Mateu ◽

Roberto Henriques

Keyword(s):

Extreme Precipitation ◽

Large Scale ◽

Northern Region ◽

Extreme Weather Events ◽

Self Organizing Map ◽

Spatial And Temporal Patterns ◽

Self Organizing Maps ◽

The North ◽

Precipitation Regimes ◽

Self Organizing

The explosion of data in the information age has provided an opportunity to explore the possibility of characterizing the climate patterns using data mining techniques. Nigeria has a unique tropical climate with two precipitation regimes: low precipitation in the north leading to aridity and desertification and high precipitation in parts of the southwest and southeast leading to large scale flooding. In this research, four indices have been used to characterize the intensity, frequency, and amount of rainfall over Nigeria. A type of Artificial Neural Network called the self-organizing map has been used to reduce the multiplicity of dimensions and produce four unique zones characterizing extreme precipitation conditions in Nigeria. This approach allowed for the assessment of spatial and temporal patterns in extreme precipitation in the last three decades. Precipitation properties in each cluster are discussed. The cluster closest to the Atlantic has high values of precipitation intensity, frequency, and duration, whereas the cluster closest to the Sahara Desert has low values. A significant increasing trend has been observed in the frequency of rainy days at the center of the northern region of Nigeria.

Download Full-text

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method expeditiously can identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.

Download Full-text

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-021-00267-x ◽

2021 ◽

Author(s):

Ekaterina Kochmar ◽

Dung Do Vu ◽

Robert Belfer ◽

Varun Gupta ◽

Iulian Vlad Serban ◽

...

Keyword(s):

Machine Learning ◽

Student Performance ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Large Scale ◽

Intelligent Tutoring ◽

Performance Outcomes ◽

Data Driven ◽

Personalized Feedback ◽

Tutoring Systems

AbstractIntelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, in this paper we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need of expert intervention and design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Download Full-text

Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19 ◽

10.1145/3357492.3358632 ◽

2019 ◽

Author(s):

Bin Wang ◽

Cy Chan ◽

Divya Somasi ◽

Jane Macfarlane ◽

Eric Rask

Keyword(s):

Large Scale ◽

Energy Use ◽

Transportation Networks ◽

Data Driven

Download Full-text

Improving the management of type 2 diabetes through large-scale general practice: the role of a data-driven and technology-enabled education programme

BMJ Open Quality ◽

10.1136/bmjoq-2020-001087 ◽

2021 ◽

Vol 10 (1) ◽

pp. e001087

Author(s):

Tarek F Radwan ◽

Yvette Agyako ◽

Alireza Ettefaghian ◽

Tahira Kamran ◽

Omar Din ◽

...

Keyword(s):

Type 2 Diabetes ◽

Primary Care ◽

Large Scale ◽

Education Programme ◽

Educational Programme ◽

Data Driven ◽

Treatment Targets ◽

Care Processes ◽

Data Driven Approach

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions were co-designed by a team of primary care clinicians and educationalists and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at-scale and using methods which ensured sustainability. This programme was used to optimise the management of T2D through improving the eight care processes and three treatment targets which form part of the annual national diabetes audit for patients with T2D. With the implemented improvement interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all the eight care processes improved by 46.0% (p<0.001) while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven large-scale multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.

Download Full-text