Transparent exploration of machine learning for biomarker discovery from proteomics and omics data

2021 ◽  
Author(s):  
Furkan M. Torun ◽  
Sebastian Virreira Winter ◽  
Sophia Doll ◽  
Felix M. Riese ◽  
Artem Vorobyev ◽  
...  

Abstract: Biomarkers are of central importance for assessing health status and for guiding medical interventions and evaluating their efficacy, but they are lacking for most diseases. Mass spectrometry (MS)-based proteomics is a powerful technology for biomarker discovery but requires sophisticated bioinformatics to identify robust patterns. Machine learning (ML) has become indispensable for this purpose; however, it is sometimes applied in an opaque manner and generally requires expert knowledge as well as complex and expensive software. To enable easy access to ML for biomarker discovery without any programming or bioinformatics skills, we developed 'OmicLearn' (https://OmicLearn.com), an open-source web-based ML tool using the latest advances in the Python ML ecosystem. We host a web server for the exploration of the researchers' results that can readily be cloned for internal use. Output tables from proteomics experiments are easily uploaded to the central or a local web server. OmicLearn enables rapid exploration of the suitability of various ML algorithms for the experimental datasets. It fosters open science via transparent assessment of state-of-the-art algorithms in a standardized format for proteomics and other omics sciences.

Highlights:
- OmicLearn is an open-source platform that allows researchers to apply machine learning (ML) for biomarker discovery
- The ready-to-use structure of OmicLearn enables access to state-of-the-art ML algorithms without requiring any prior bioinformatics knowledge
- OmicLearn's web-based interface provides an easy-to-follow platform for classification and for gaining insights into the dataset
- Several algorithms and methods for preprocessing, feature selection, classification and cross-validation of omics datasets are integrated
- All results, settings and methods text can be exported in publication-ready formats
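The stages OmicLearn integrates (preprocessing, feature selection, classification, cross-validation) can be illustrated with a minimal plain-Python sketch. This is a toy illustration only, not OmicLearn's code or API: the synthetic data, the nearest-centroid classifier and the fold scheme are all assumptions chosen for brevity.

```python
import random
import statistics as stats

random.seed(0)

# Toy "omics" table: 40 samples x 6 features; features 0 and 1 carry signal.
def make_sample(label):
    row = [random.gauss(0, 1) for _ in range(6)]
    if label == 1:
        row[0] += 3.0
        row[1] += 3.0
    return row, label

data = [make_sample(i % 2) for i in range(40)]
X = [row for row, _ in data]
y = [label for _, label in data]

# Preprocessing: standardize each feature (z-score) across samples.
def zscore(col):
    mu, sd = stats.mean(col), stats.pstdev(col)
    return [(v - mu) / sd for v in col]

cols = [zscore([row[j] for row in X]) for j in range(6)]
Xs = [[cols[j][i] for j in range(6)] for i in range(len(X))]

# Feature selection: keep the k features with the largest class-mean gap.
# (Done once on all data for brevity; real tools select inside each fold.)
def top_k_features(Xs, y, k):
    gaps = []
    for j in range(len(Xs[0])):
        a = [Xs[i][j] for i in range(len(y)) if y[i] == 0]
        b = [Xs[i][j] for i in range(len(y)) if y[i] == 1]
        gaps.append((abs(stats.mean(a) - stats.mean(b)), j))
    return [j for _, j in sorted(gaps, reverse=True)[:k]]

selected = top_k_features(Xs, y, 2)

# Classification: nearest class centroid on the selected features.
def centroid(rows):
    return [stats.mean(c) for c in zip(*rows)]

def classify(x, c0, c1):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

# Cross-validation: 4 interleaved folds, accuracy per fold.
accs = []
for f in range(4):
    test = set(range(f, len(y), 4))
    train = [([Xs[i][j] for j in selected], y[i])
             for i in range(len(y)) if i not in test]
    c0 = centroid([r for r, lab in train if lab == 0])
    c1 = centroid([r for r, lab in train if lab == 1])
    hits = sum(classify([Xs[i][j] for j in selected], c0, c1) == y[i]
               for i in test)
    accs.append(hits / len(test))

print(f"selected features: {selected}, mean CV accuracy: {stats.mean(accs):.2f}")
```

The point of the sketch is the division of labor, not the particular classifier: each stage is a swappable component, which is what lets a tool like OmicLearn expose many algorithm combinations behind one interface.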

2018 ◽  
Vol XIX (1) ◽  
pp. 555-560
Author(s):  
Băutu E

In 2003, the Romanian National Institute of Meteorology and Hydrology inaugurated the National Integrated Meteorological System (SIMIN), consisting of a network of stations and instruments for the measurement and detection of hydrological and meteorological data, a specialized communication network, a forecasting network, and a dissemination network. With a setup cost of $55 million and a role of national priority, SIMIN (implemented by Lockheed Martin) remains relatively black-boxed even today, using proprietary technology and software, and few institutions have direct access to the data it provides. In this paper, we present the design of a web-based software application built on open source software that allows easy access to, and processing of, the data available in SIMIN.


2019 ◽  
Author(s):  
Cameron T. Ellis ◽  
Christopher Baldassano ◽  
Anna C. Schapiro ◽  
Ming Bo Cai ◽  
Jonathan D. Cohen

Abstract:
Background: With advances in methods for collecting and analyzing fMRI data, there is a concurrent need to understand how to reliably evaluate and optimally use these methods. Simulations of fMRI data can aid in both the evaluation of complex designs and the analysis of data.
New Method: We present fmrisim, a new Python package for standardized, realistic simulation of fMRI data. This package is part of BrainIAK, a recently released open-source Python toolbox for advanced neuroimaging analyses. We describe how to use fmrisim to extract noise properties from real fMRI data and then create a synthetic dataset with matched noise properties and a user-specified signal.
Results: We validate the noise generated by fmrisim to show that it can approximate the noise properties of real data. We further show how fmrisim can help researchers find the optimal design in terms of power.
Comparison with Other Methods: fmrisim ports the functionality of other packages to the Python platform while extending what is available in order to make it seamless to simulate realistic fMRI data.
Conclusions: The fmrisim package holds promise for improving the design of fMRI experiments, which may facilitate both the pre-registration of such experiments and the analysis of fMRI data.

Highlights:
- fmrisim can simulate fMRI data matched to the noise properties of real fMRI data.
- This can help researchers investigate the power of their fMRI designs.
- It also facilitates open science by making it easy to pre-register analysis pipelines.
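The core idea fmrisim implements, estimating noise parameters from real data and then generating matched synthetic noise plus a chosen signal, can be sketched for a single time series. This is a conceptual illustration with a toy AR(1) noise model, not the fmrisim API; the real package models many more noise components (drift, physiological noise, spatial structure).

```python
import random
import statistics as stats

random.seed(1)

def ar1(n, phi, sigma):
    """AR(1) series: x[t] = phi * x[t-1] + N(0, sigma) innovations."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0, sigma)
        out.append(x)
    return out

# Stand-in for a measured voxel time series with temporal autocorrelation.
real = ar1(2000, phi=0.5, sigma=1.0)

def lag1_autocorr(x):
    mu = stats.mean(x)
    num = sum((a - mu) * (b - mu) for a, b in zip(x, x[1:]))
    den = sum((a - mu) ** 2 for a in x)
    return num / den

# Step 1: estimate noise properties from the "real" data.
phi_hat = lag1_autocorr(real)
sigma_hat = stats.pstdev([b - phi_hat * a for a, b in zip(real, real[1:])])

# Step 2: simulate matched noise and add a user-specified boxcar signal.
sim_noise = ar1(2000, phi_hat, sigma_hat)
signal = [0.5 if (t // 20) % 2 else 0.0 for t in range(2000)]
sim = [n + s for n, s in zip(sim_noise, signal)]

print(f"estimated phi={phi_hat:.2f}, innovation sd={sigma_hat:.2f}")
```

Because the signal is user-specified and the noise is matched to data, one can rerun the simulation at different signal amplitudes or design timings and measure detection rates, which is the power-analysis use case the abstract describes.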


2021 ◽  
Author(s):  
Kenneth Atz ◽  
Clemens Isert ◽  
Markus N. A. Böcker ◽  
José Jiménez-Luna ◽  
Gisbert Schneider

Certain molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like compounds currently makes large-scale applications of quantum chemistry challenging. To mitigate this problem, we developed DelFTa, an open-source toolbox for predicting small-molecule electronic properties at the density functional theory (DFT) level, using the Δ-machine learning principle. DelFTa employs state-of-the-art E(3)-equivariant graph neural networks trained on the QMugs dataset of QM properties. It provides access to a wide array of quantum observables by predicting approximations to ωB97X-D/def2-SVP values from a GFN2-xTB semiempirical baseline. Δ-learning with DelFTa was shown to outperform direct DFT learning for most of the considered QM endpoints. The software is provided as open-source code with fully documented command-line and Python APIs.
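The Δ-machine-learning principle, training a model on the difference between an expensive reference method and a cheap baseline and then adding the learned correction to the baseline at prediction time, can be shown with a one-dimensional toy problem. The functions below are invented stand-ins, not DelFTa's neural networks or actual quantum-chemistry methods.

```python
import random

random.seed(2)

# Invented stand-ins: a cheap baseline method (think GFN2-xTB) and an
# expensive reference method (think ωB97X-D/def2-SVP). Both map a scalar
# "molecular descriptor" x to a property value.
def baseline(x):
    return 2.0 * x + random.gauss(0, 0.05)   # fast but biased/noisy

def reference(x):
    return 2.0 * x + 0.3 * x * x             # accurate but "expensive"

# Δ-learning: fit a model to the residual reference - baseline, which is
# smaller and smoother than the reference value itself. Here the "model"
# is a least-squares fit of delta ≈ c * x**2.
xs = [i / 10 for i in range(-20, 21)]
deltas = [reference(x) - baseline(x) for x in xs]
c = sum(d * x * x for d, x in zip(deltas, xs)) / sum(x ** 4 for x in xs)

def delta_predict(x):
    """Predicted 'DFT-level' value: cheap baseline plus learned correction."""
    return baseline(x) + c * x * x
```

The design rationale is that the correction term is an easier learning target than the full property, so a model trained on it can reach reference-level accuracy at baseline-level cost, which is the motivation behind DelFTa's xTB-to-DFT setup.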


2021 ◽  
Author(s):  
Steven Hicks ◽  
Inga Strümke ◽  
Vajira Thambawita ◽  
Malek Hammou ◽  
Pål Halvorsen ◽  
...  

Clinicians and model developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, so several metrics are typically reported to summarize its performance. Unfortunately, these measures are not easily understood by many clinicians. Moreover, comparing models across studies in an objective manner is challenging, and no tool exists to compare models using the same performance metrics. This paper looks at previous ML studies in gastroenterology, explains what different metrics mean in the context of the presented studies, and gives a thorough account of how the different metrics should be interpreted. We also release an open-source web-based tool that aids in calculating the most relevant metrics presented in this paper, so that other researchers and clinicians may easily incorporate them into their research.
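The metrics such studies report mostly derive from the 2x2 confusion matrix. A small self-contained sketch with hypothetical counts (this is not the web tool's implementation, just the standard formulas):

```python
import math

def clinical_metrics(tp, fp, fn, tn):
    """Common classification metrics from a 2x2 confusion matrix."""
    sens = tp / (tp + fn)                 # sensitivity / recall
    spec = tn / (tn + fp)                 # specificity
    prec = tp / (tp + fp)                 # precision / PPV
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sens, "specificity": spec, "precision": prec,
            "accuracy": acc, "f1": f1, "mcc": mcc}

# Hypothetical screening results on an imbalanced cohort (10 diseased,
# 90 healthy): accuracy looks strong while MCC reveals mediocre agreement.
m = clinical_metrics(tp=8, fp=10, fn=2, tn=80)
print({k: round(v, 3) for k, v in m.items()})
```

The example shows why reporting several metrics matters on imbalanced clinical data: here accuracy is 0.88 while the Matthews correlation coefficient is only about 0.54, because more than half of the positive calls are false positives.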


2020 ◽  
Author(s):  
Adriana Tomic ◽  
Ivan Tomic ◽  
Levi Waldron ◽  
Ludwig Geistlinger ◽  
Max Kuhn ◽  
...  

Abstract: Data analysis and knowledge discovery have become increasingly important in biology and medicine as biological datasets grow in complexity, but the sophisticated programming skills and in-depth understanding of algorithms required pose barriers that prevent most biologists and clinicians from performing such research. We have developed SIMON, a modular open-source software tool that facilitates the application of more than 180 state-of-the-art machine learning algorithms to high-dimensional biomedical data. With an easy-to-use graphical user interface, standardized pipelines, and an automated approach to machine learning and other statistical analysis methods, SIMON helps identify optimal algorithms and provides a resource that empowers both non-technical and technical researchers to identify crucial patterns in biomedical data.
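The model-comparison loop that tools like SIMON automate, fitting many candidate algorithms, scoring each by cross-validation, and ranking by the resulting estimate, can be sketched with two toy "algorithms". Everything here (data, models, fold scheme) is a simplified stand-in, not SIMON's actual pipeline of 180+ methods.

```python
import random
import statistics as stats

random.seed(3)

# Toy binary dataset: one informative feature, classes shifted by 2 units.
X = [[random.gauss(2.0 if i % 2 else 0.0, 1.0)] for i in range(60)]
y = [i % 2 for i in range(60)]

# Two stand-in "algorithms" sharing a common fit(train) -> predict shape.
def fit_majority(train):
    label = round(stats.mean(lab for _, lab in train))
    return lambda x: label

def fit_centroid(train):
    c0 = stats.mean(r[0] for r, lab in train if lab == 0)
    c1 = stats.mean(r[0] for r, lab in train if lab == 1)
    return lambda x: 0 if abs(x[0] - c0) < abs(x[0] - c1) else 1

def cv_score(fit, folds=5):
    accs = []
    for f in range(folds):
        test = set(range(f, len(y), folds))
        model = fit([(X[i], y[i]) for i in range(len(y)) if i not in test])
        accs.append(stats.mean(model(X[i]) == y[i] for i in test))
    return stats.mean(accs)

# Evaluate every candidate and rank by cross-validated accuracy.
scores = {name: cv_score(fit) for name, fit in
          [("majority", fit_majority), ("centroid", fit_centroid)]}
best = max(scores, key=scores.get)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Because every candidate exposes the same fit/predict interface and is scored by the same protocol, adding a new algorithm is just one more entry in the list; that uniformity is what makes automated large-scale comparison feasible.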


Author(s):  
Corey Horien ◽  
Stephanie Noble ◽  
Abigail Greene ◽  
Kangjoo Lee ◽  
Daniel Barron ◽  
...  

Large datasets that enable researchers to perform investigations with unprecedented rigor are growing increasingly common in neuroimaging. Due to the simultaneous increasing popularity of open science, these state-of-the-art datasets are more accessible than ever to researchers around the world. While analysis of these samples has pushed the field forward, they pose a new set of challenges that might cause difficulties for novice users. Here, we offer practical tips for working with large datasets from the end-user’s perspective. We cover all aspects of the data life cycle: from what to consider when downloading and storing the data, to tips on how to become acquainted with a dataset one did not collect, to what to share when communicating results. This manuscript serves as a practical guide one can use when working with large neuroimaging datasets, thus dissolving barriers to scientific discovery.


2018 ◽  
Vol 11 (2) ◽  
pp. 103-106
Author(s):  
Froilan D. Mobo FRIEdr

Shipboard training at the selected maritime institution uses a distributed, partially manual documentation process for monitoring cadet performance, which often causes delays and inconsistencies in reporting. The department assesses and validates cadet performance reports through a social media website, through which the Department of Shipboard Training receives report summaries. Technology today evolves rapidly, driving the shift from stand-alone systems to web-based technology capable of supporting almost all computerized transactions using open-source mobile applets and content management systems. Most organizations have embraced technology and developed exceptional online programs that provide easy access and broad communication. These maritime schools follow the IMO model courses promulgated by the 1978 Convention on Standards of Training, Certification and Watchkeeping for Seafarers (STCW), as amended in 1995, and are the only ones permitted by the government to conduct and administer baccalaureate courses comprising three years of academics plus one year of supervised shipboard apprenticeship for deck and engine cadets.


Author(s):  
S. Vignesh Kandasamy ◽  
A. Madhu ◽  
P. K. Gupta ◽  
A. Niveditha ◽  
K. Bordoloi

<p><strong>Abstract.</strong> GIS and machine learning (ML) are powerful ICT tools in the retail industry that help sellers understand their markets. For consumers, however, there is always ambiguity about the quality and quantity of the product to be purchased relative to the price paid for it. Most retail businesses today adopt “discount pricing strategies” or “offers” to win new customers and increase sales. With several establishments selling the same product under a variety of offers, identifying the shops where the consumer gets the best value for money requires considerable manual effort. In this study, a prototype has been developed that allows consumers to locate such prospective shops based on advertisements in newspapers. The solution has a two-pronged approach. First, the offers advertised in the newspaper are pre-processed and their text is extracted using the Tesseract OCR engine. Second, the locations of shops are collected and stored in a geodatabase. Finally, each advertisement is matched to the corresponding geo-located shop based on its name and location. Then, based on the consumer's location and purchase choice, shops offering discounts are shown on a web-based map. This prototype gives the consumer a platform for geo-discovery of establishments of interest through the clutter of unrelated endorsements, using open source GIS, Python programming and ML techniques.</p>
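The matching step described above, linking possibly imperfect OCR output of a shop name to a geo-located record, can be sketched with standard-library fuzzy string matching. The shop names, coordinates and offers below are hypothetical examples, not data from the study, and `difflib` stands in for whatever matcher the prototype actually uses.

```python
import difflib

# Geodatabase stand-in: shop name -> (lat, lon).
shops = {
    "Fresh Mart Supermarket": (12.9716, 77.5946),
    "City Electronics": (12.9352, 77.6245),
    "Daily Needs Store": (12.9141, 77.6101),
}

# Text extracted from scanned advertisements (as OCR output might look,
# including small recognition errors).
ads = [
    ("Fresh Marl Supermarket", "20% off on groceries"),
    ("City Electronlcs", "Festival sale: TVs from $199"),
]

def match_ad_to_shop(ad_name, shop_names, cutoff=0.6):
    """Fuzzy-match a (possibly OCR-garbled) name to a known shop."""
    hits = difflib.get_close_matches(ad_name, shop_names, n=1, cutoff=cutoff)
    return hits[0] if hits else None

located_offers = []
for ad_name, offer in ads:
    shop = match_ad_to_shop(ad_name, list(shops))
    if shop is not None:
        located_offers.append((shop, shops[shop], offer))

for shop, coords, offer in located_offers:
    print(f"{shop} @ {coords}: {offer}")
```

Tolerant matching is the key design choice here: OCR on newspaper print rarely yields exact strings, so a similarity cutoff trades a few false matches for far fewer missed offers.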


Author(s):  
Francesco Pirotti

Abstract: This review aims at introducing laser scanning technology and providing an overview of the contribution of open source projects to supporting the utilization and analysis of laser scanning data. Lidar technology is pushing the mapping and surveying of topographic data to new frontiers. The open source community has supported this by providing libraries, standards, interfaces, and modules, all the way up to full software packages. Such open solutions give scientists and end-users valuable tools to access and work with lidar data, fostering new cutting-edge investigation and improvements of existing methods.

The first part of this work provides an introduction to laser scanning principles, with references for further reading. It is followed by sections reporting, respectively, on open standards and formats for lidar data, on tools, and finally on web-based solutions for accessing lidar data. This review is not intended as a thorough account of the state of the art of lidar technology itself, but as an overview of the open source toolkits available to the community for accessing, visualizing, editing and processing point clouds. A range of open source features for lidar data access and analysis is presented, giving an overview of what can be done with alternatives to commercial end-to-end solutions. Data standards and formats are also discussed, showing the challenges of storing and accessing massive point clouds.

The aim is to give scientists who have not yet worked with lidar data an overview of how this technology works and of the open source tools that can be a valid solution for their needs in analysing such data. Researchers already involved with lidar data will hopefully find ideas for integrating and improving their workflows through open source solutions.
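As a taste of the kind of processing open source lidar toolchains perform, here is a minimal ground-filtering sketch: grid the point cloud, take the lowest return per cell as the local ground estimate, and derive per-point heights above ground. Production filters (e.g. progressive morphological filtering) are far more robust; this toy example uses synthetic points and illustrates only the concept.

```python
import random

random.seed(4)

# Toy point cloud: a gently sloping ground surface plus vegetation returns.
points = []
for _ in range(5000):
    x, y = random.uniform(0, 100), random.uniform(0, 100)
    ground_z = 0.02 * x                  # gentle slope in x
    if random.random() < 0.3:            # ~30% of returns hit vegetation
        z = ground_z + random.uniform(1.0, 15.0)
    else:
        z = ground_z + random.gauss(0, 0.05)
    points.append((x, y, z))

# Minimal ground filter: grid the cloud and keep the lowest return per cell.
cell = 10.0
ground = {}
for x, y, z in points:
    key = (int(x // cell), int(y // cell))
    if key not in ground or z < ground[key][2]:
        ground[key] = (x, y, z)

# Height above ground for each point = z minus the local ground estimate.
heights = [z - ground[(int(x // cell), int(y // cell))][2]
           for x, y, z in points]
print(f"cells: {len(ground)}, max height above ground: {max(heights):.1f} m")
```

Even this crude filter separates canopy returns from terrain well enough to compute vegetation heights; the open source tools surveyed in the review implement the same idea with adaptive grids and iterative refinement on clouds of billions of points.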

