Transparent exploration of machine learning for biomarker discovery from proteomics and omics data

2021 ◽  
Author(s):  
Furkan M. Torun ◽  
Sebastian Virreira Winter ◽  
Sophia Doll ◽  
Felix M. Riese ◽  
Artem Vorobyev ◽  
...  

Abstract: Biomarkers are of central importance for assessing health status and for guiding medical interventions and evaluating their efficacy, but they are lacking for most diseases. Mass spectrometry (MS)-based proteomics is a powerful technology for biomarker discovery but requires sophisticated bioinformatics to identify robust patterns. Machine learning (ML) has become indispensable for this purpose; however, it is sometimes applied in an opaque manner and generally requires expert knowledge as well as complex and expensive software. To enable easy access to ML for biomarker discovery without any programming or bioinformatics skills, we developed 'OmicLearn' (https://OmicLearn.com), an open-source web-based ML tool using the latest advances in the Python ML ecosystem. We host a web server for the exploration of the researchers' results that can readily be cloned for internal use. Output tables from proteomics experiments are easily uploaded to the central or a local web server. OmicLearn enables rapid exploration of the suitability of various ML algorithms for the experimental datasets. It fosters open science via transparent assessment of state-of-the-art algorithms in a standardized format for proteomics and other omics sciences.

Highlights:
- OmicLearn is an open-source platform that allows researchers to apply machine learning (ML) for biomarker discovery
- The ready-to-use structure of OmicLearn enables access to state-of-the-art ML algorithms without requiring any prior bioinformatics knowledge
- OmicLearn's web-based interface provides an easy-to-follow platform for classification and for gaining insights into the dataset
- Several algorithms and methods for preprocessing, feature selection, classification and cross-validation of omics datasets are integrated
- All results, settings and methods text can be exported in publication-ready formats
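The stages OmicLearn integrates (preprocessing, feature selection, classification, cross-validation) can be illustrated with a minimal plain-Python sketch. This is a toy illustration only, not OmicLearn's code or API: the synthetic data, the nearest-centroid classifier and the fold scheme are all assumptions chosen for brevity.

```python
import random
import statistics as stats

random.seed(0)

# Toy "omics" table: 40 samples x 6 features; features 0 and 1 carry signal.
def make_sample(label):
    row = [random.gauss(0, 1) for _ in range(6)]
    if label == 1:
        row[0] += 3.0
        row[1] += 3.0
    return row, label

data = [make_sample(i % 2) for i in range(40)]
X = [row for row, _ in data]
y = [label for _, label in data]

# Preprocessing: standardize each feature (z-score) across samples.
def zscore(col):
    mu, sd = stats.mean(col), stats.pstdev(col)
    return [(v - mu) / sd for v in col]

cols = [zscore([row[j] for row in X]) for j in range(6)]
Xs = [[cols[j][i] for j in range(6)] for i in range(len(X))]

# Feature selection: keep the k features with the largest class-mean gap.
# (Done once on all data for brevity; real tools select inside each fold.)
def top_k_features(Xs, y, k):
    gaps = []
    for j in range(len(Xs[0])):
        a = [Xs[i][j] for i in range(len(y)) if y[i] == 0]
        b = [Xs[i][j] for i in range(len(y)) if y[i] == 1]
        gaps.append((abs(stats.mean(a) - stats.mean(b)), j))
    return [j for _, j in sorted(gaps, reverse=True)[:k]]

selected = top_k_features(Xs, y, 2)

# Classification: nearest class centroid on the selected features.
def centroid(rows):
    return [stats.mean(c) for c in zip(*rows)]

def classify(x, c0, c1):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

# Cross-validation: 4 interleaved folds, accuracy per fold.
accs = []
for f in range(4):
    test = set(range(f, len(y), 4))
    train = [([Xs[i][j] for j in selected], y[i])
             for i in range(len(y)) if i not in test]
    c0 = centroid([r for r, lab in train if lab == 0])
    c1 = centroid([r for r, lab in train if lab == 1])
    hits = sum(classify([Xs[i][j] for j in selected], c0, c1) == y[i]
               for i in test)
    accs.append(hits / len(test))

print(f"selected features: {selected}, mean CV accuracy: {stats.mean(accs):.2f}")
```

The point of the sketch is the division of labor, not the particular classifier: each stage is a swappable component, which is what lets a tool like OmicLearn expose many algorithm combinations behind one interface.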

2018 ◽  
Vol XIX (1) ◽  
pp. 555-560
Author(s):  
Băutu E

In 2003, the Romanian National Institute of Meteorology and Hydrology inaugurated the National Integrated Meteorological System (SIMIN), consisting of a network of stations and instruments for the measurement and detection of hydrological and meteorological data, a specialized communication network, a forecasting network, and a dissemination network. With a setup cost of $55 million and a role of national priority, SIMIN (implemented by Lockheed Martin) remains relatively black-boxed even today, using proprietary technology and software, and few institutions have direct access to the data it provides. In this paper, we present the design of a web-based software application built on open source software that allows easy access to, and processing of, the data available in SIMIN.


2019 ◽  
Author(s):  
Cameron T. Ellis ◽  
Christopher Baldassano ◽  
Anna C. Schapiro ◽  
Ming Bo Cai ◽  
Jonathan D. Cohen

Abstract:
Background: With advances in methods for collecting and analyzing fMRI data, there is a concurrent need to understand how to reliably evaluate and optimally use these methods. Simulations of fMRI data can aid in both the evaluation of complex designs and the analysis of data.
New Method: We present fmrisim, a new Python package for standardized, realistic simulation of fMRI data. This package is part of BrainIAK, a recently released open-source Python toolbox for advanced neuroimaging analyses. We describe how to use fmrisim to extract noise properties from real fMRI data and then create a synthetic dataset with matched noise properties and a user-specified signal.
Results: We validate the noise generated by fmrisim to show that it can approximate the noise properties of real data. We further show how fmrisim can help researchers find the optimal design in terms of power.
Comparison with Other Methods: fmrisim ports the functionality of other packages to the Python platform while extending what is available in order to make it seamless to simulate realistic fMRI data.
Conclusions: The fmrisim package holds promise for improving the design of fMRI experiments, which may facilitate both the pre-registration of such experiments and the analysis of fMRI data.

Highlights:
- fmrisim can simulate fMRI data matched to the noise properties of real fMRI data.
- This can help researchers investigate the power of their fMRI designs.
- It also facilitates open science by making it easy to pre-register analysis pipelines.
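The core idea fmrisim implements, estimating noise parameters from real data and then generating matched synthetic noise plus a chosen signal, can be sketched for a single time series. This is a conceptual illustration with a toy AR(1) noise model, not the fmrisim API; the real package models many more noise components (drift, physiological noise, spatial structure).

```python
import random
import statistics as stats

random.seed(1)

def ar1(n, phi, sigma):
    """AR(1) series: x[t] = phi * x[t-1] + N(0, sigma) innovations."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0, sigma)
        out.append(x)
    return out

# Stand-in for a measured voxel time series with temporal autocorrelation.
real = ar1(2000, phi=0.5, sigma=1.0)

def lag1_autocorr(x):
    mu = stats.mean(x)
    num = sum((a - mu) * (b - mu) for a, b in zip(x, x[1:]))
    den = sum((a - mu) ** 2 for a in x)
    return num / den

# Step 1: estimate noise properties from the "real" data.
phi_hat = lag1_autocorr(real)
sigma_hat = stats.pstdev([b - phi_hat * a for a, b in zip(real, real[1:])])

# Step 2: simulate matched noise and add a user-specified boxcar signal.
sim_noise = ar1(2000, phi_hat, sigma_hat)
signal = [0.5 if (t // 20) % 2 else 0.0 for t in range(2000)]
sim = [n + s for n, s in zip(sim_noise, signal)]

print(f"estimated phi={phi_hat:.2f}, innovation sd={sigma_hat:.2f}")
```

Because the signal is user-specified and the noise is matched to data, one can rerun the simulation at different signal amplitudes or design timings and measure detection rates, which is the power-analysis use case the abstract describes.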


2021 ◽  
Author(s):  
Kenneth Atz ◽  
Clemens Isert ◽  
Markus N. A. Böcker ◽  
José Jiménez-Luna ◽  
Gisbert Schneider

Certain molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like compounds currently makes large-scale applications of quantum chemistry challenging. To mitigate this problem, we developed DelFTa, an open-source toolbox for predicting small-molecule electronic properties at the density functional theory (DFT) level, using the Δ-machine learning principle. DelFTa employs state-of-the-art E(3)-equivariant graph neural networks trained on the QMugs dataset of QM properties. It provides access to a wide array of quantum observables by predicting approximations to ωB97X-D/def2-SVP values from a GFN2-xTB semiempirical baseline. Δ-learning with DelFTa was shown to outperform direct DFT learning for most of the considered QM endpoints. The software is provided as open-source code with fully documented command-line and Python APIs.
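The Δ-machine-learning principle, training a model on the difference between an expensive reference method and a cheap baseline and then adding the learned correction to the baseline at prediction time, can be shown with a one-dimensional toy problem. The functions below are invented stand-ins, not DelFTa's neural networks or actual quantum-chemistry methods.

```python
import random

random.seed(2)

# Invented stand-ins: a cheap baseline method (think GFN2-xTB) and an
# expensive reference method (think ωB97X-D/def2-SVP). Both map a scalar
# "molecular descriptor" x to a property value.
def baseline(x):
    return 2.0 * x + random.gauss(0, 0.05)   # fast but biased/noisy

def reference(x):
    return 2.0 * x + 0.3 * x * x             # accurate but "expensive"

# Δ-learning: fit a model to the residual reference - baseline, which is
# smaller and smoother than the reference value itself. Here the "model"
# is a least-squares fit of delta ≈ c * x**2.
xs = [i / 10 for i in range(-20, 21)]
deltas = [reference(x) - baseline(x) for x in xs]
c = sum(d * x * x for d, x in zip(deltas, xs)) / sum(x ** 4 for x in xs)

def delta_predict(x):
    """Predicted 'DFT-level' value: cheap baseline plus learned correction."""
    return baseline(x) + c * x * x
```

The design rationale is that the correction term is an easier learning target than the full property, so a model trained on it can reach reference-level accuracy at baseline-level cost, which is the motivation behind DelFTa's xTB-to-DFT setup.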


2021 ◽  
Author(s):  
Steven Hicks ◽  
Inga Strümke ◽  
Vajira Thambawita ◽  
Malek Hammou ◽  
Pål Halvorsen ◽  
...  

Clinicians and model developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, so several metrics are typically reported to summarize its performance. Unfortunately, these measures are not easily understood by many clinicians. Moreover, comparing models across studies in an objective manner is challenging, and no tool exists to compare models using the same performance metrics. This paper looks at previous ML studies in gastroenterology, explains what different metrics mean in the context of the presented studies, and gives a thorough account of how the different metrics should be interpreted. We also release an open-source web-based tool that aids in calculating the most relevant metrics presented in this paper, so that other researchers and clinicians may easily incorporate them into their research.
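The metrics such studies report mostly derive from the 2x2 confusion matrix. A small self-contained sketch with hypothetical counts (this is not the web tool's implementation, just the standard formulas):

```python
import math

def clinical_metrics(tp, fp, fn, tn):
    """Common classification metrics from a 2x2 confusion matrix."""
    sens = tp / (tp + fn)                 # sensitivity / recall
    spec = tn / (tn + fp)                 # specificity
    prec = tp / (tp + fp)                 # precision / PPV
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sens, "specificity": spec, "precision": prec,
            "accuracy": acc, "f1": f1, "mcc": mcc}

# Hypothetical screening results on an imbalanced cohort (10 diseased,
# 90 healthy): accuracy looks strong while MCC reveals mediocre agreement.
m = clinical_metrics(tp=8, fp=10, fn=2, tn=80)
print({k: round(v, 3) for k, v in m.items()})
```

The example shows why reporting several metrics matters on imbalanced clinical data: here accuracy is 0.88 while the Matthews correlation coefficient is only about 0.54, because more than half of the positive calls are false positives.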


2020 ◽  
Author(s):  
Adriana Tomic ◽  
Ivan Tomic ◽  
Levi Waldron ◽  
Ludwig Geistlinger ◽  
Max Kuhn ◽  
...  

Abstract: Data analysis and knowledge discovery have become increasingly important in biology and medicine as biological datasets grow in complexity, but the sophisticated programming skills and in-depth understanding of algorithms required pose barriers that prevent most biologists and clinicians from performing such research. We have developed SIMON, a modular open-source software tool that facilitates the application of more than 180 state-of-the-art machine learning algorithms to high-dimensional biomedical data. With an easy-to-use graphical user interface, standardized pipelines, and an automated approach to machine learning and other statistical analysis methods, SIMON helps identify optimal algorithms and provides a resource that empowers both non-technical and technical researchers to identify crucial patterns in biomedical data.
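The model-comparison loop that tools like SIMON automate, fitting many candidate algorithms, scoring each by cross-validation, and ranking by the resulting estimate, can be sketched with two toy "algorithms". Everything here (data, models, fold scheme) is a simplified stand-in, not SIMON's actual pipeline of 180+ methods.

```python
import random
import statistics as stats

random.seed(3)

# Toy binary dataset: one informative feature, classes shifted by 2 units.
X = [[random.gauss(2.0 if i % 2 else 0.0, 1.0)] for i in range(60)]
y = [i % 2 for i in range(60)]

# Two stand-in "algorithms" sharing a common fit(train) -> predict shape.
def fit_majority(train):
    label = round(stats.mean(lab for _, lab in train))
    return lambda x: label

def fit_centroid(train):
    c0 = stats.mean(r[0] for r, lab in train if lab == 0)
    c1 = stats.mean(r[0] for r, lab in train if lab == 1)
    return lambda x: 0 if abs(x[0] - c0) < abs(x[0] - c1) else 1

def cv_score(fit, folds=5):
    accs = []
    for f in range(folds):
        test = set(range(f, len(y), folds))
        model = fit([(X[i], y[i]) for i in range(len(y)) if i not in test])
        accs.append(stats.mean(model(X[i]) == y[i] for i in test))
    return stats.mean(accs)

# Evaluate every candidate and rank by cross-validated accuracy.
scores = {name: cv_score(fit) for name, fit in
          [("majority", fit_majority), ("centroid", fit_centroid)]}
best = max(scores, key=scores.get)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Because every candidate exposes the same fit/predict interface and is scored by the same protocol, adding a new algorithm is just one more entry in the list; that uniformity is what makes automated large-scale comparison feasible.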


Author(s):  
Corey Horien ◽  
Stephanie Noble ◽  
Abigail Greene ◽  
Kangjoo Lee ◽  
Daniel Barron ◽  
...  

Large datasets that enable researchers to perform investigations with unprecedented rigor are growing increasingly common in neuroimaging. Due to the simultaneous increasing popularity of open science, these state-of-the-art datasets are more accessible than ever to researchers around the world. While analysis of these samples has pushed the field forward, they pose a new set of challenges that might cause difficulties for novice users. Here, we offer practical tips for working with large datasets from the end-user’s perspective. We cover all aspects of the data life cycle: from what to consider when downloading and storing the data, to tips on how to become acquainted with a dataset one did not collect, to what to share when communicating results. This manuscript serves as a practical guide one can use when working with large neuroimaging datasets, thus dissolving barriers to scientific discovery.


2018 ◽  
Vol 11 (2) ◽  
pp. 103-106
Author(s):  
Froilan D. Mobo FRIEdr

Shipboard training at the selected maritime institution uses a distributed, partially manual documentation process for monitoring cadet performance, which often causes delays and inconsistencies in reporting. The department assesses and validates cadet performance reports through a social media website, through which the Department of Shipboard Training receives report summaries. Technology today evolves rapidly, driving the shift from stand-alone systems to web-based technology capable of supporting almost all computerized transactions using open-source mobile applets and content management systems. Most organizations have embraced technology and developed exceptional online programs that provide easy access and broad communication. These maritime schools follow the IMO model courses promulgated by the 1978 Convention on Standards of Training, Certification and Watchkeeping for Seafarers (STCW), as amended in 1995, and are the only ones permitted by the government to conduct and administer baccalaureate courses comprising three years of academics plus one year of supervised shipboard apprenticeship for deck and engine cadets.


Author(s):  
S. Vignesh Kandasamy ◽  
A. Madhu ◽  
P. K. Gupta ◽  
A. Niveditha ◽  
K. Bordoloi

<p><strong>Abstract.</strong> GIS and machine learning (ML) are powerful ICT tools in the retail industry that help sellers understand their markets. For consumers, however, there is always ambiguity about the quality and quantity of the product to be purchased relative to the price paid for it. Most retail businesses today adopt “discount pricing strategies” or “offers” to win new customers and increase sales. With several establishments selling the same product under a variety of offers, identifying the shops where the consumer gets the best value for money requires considerable manual effort. In this study, a prototype has been developed that allows consumers to locate such prospective shops based on advertisements in newspapers. The solution has a two-pronged approach. First, the offers advertised in the newspaper are pre-processed and their text is extracted using the Tesseract OCR engine. Second, the locations of shops are collected and stored in a geodatabase. Finally, each advertisement is matched to the corresponding geo-located shop based on its name and location. Then, based on the consumer's location and purchase choice, shops offering discounts are shown on a web-based map. This prototype gives the consumer a platform for geo-discovery of establishments of interest through the clutter of unrelated endorsements, using open source GIS, Python programming and ML techniques.</p>
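The matching step described above, linking possibly imperfect OCR output of a shop name to a geo-located record, can be sketched with standard-library fuzzy string matching. The shop names, coordinates and offers below are hypothetical examples, not data from the study, and `difflib` stands in for whatever matcher the prototype actually uses.

```python
import difflib

# Geodatabase stand-in: shop name -> (lat, lon).
shops = {
    "Fresh Mart Supermarket": (12.9716, 77.5946),
    "City Electronics": (12.9352, 77.6245),
    "Daily Needs Store": (12.9141, 77.6101),
}

# Text extracted from scanned advertisements (as OCR output might look,
# including small recognition errors).
ads = [
    ("Fresh Marl Supermarket", "20% off on groceries"),
    ("City Electronlcs", "Festival sale: TVs from $199"),
]

def match_ad_to_shop(ad_name, shop_names, cutoff=0.6):
    """Fuzzy-match a (possibly OCR-garbled) name to a known shop."""
    hits = difflib.get_close_matches(ad_name, shop_names, n=1, cutoff=cutoff)
    return hits[0] if hits else None

located_offers = []
for ad_name, offer in ads:
    shop = match_ad_to_shop(ad_name, list(shops))
    if shop is not None:
        located_offers.append((shop, shops[shop], offer))

for shop, coords, offer in located_offers:
    print(f"{shop} @ {coords}: {offer}")
```

Tolerant matching is the key design choice here: OCR on newspaper print rarely yields exact strings, so a similarity cutoff trades a few false matches for far fewer missed offers.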


Author(s):  
Francesco Pirotti

Abstract: This review aims at introducing laser scanning technology and providing an overview of the contribution of open source projects to supporting the utilization and analysis of laser scanning data. Lidar technology is pushing the mapping and surveying of topographic data to new frontiers. The open source community has supported this by providing libraries, standards, interfaces, and modules, all the way up to full software packages. Such open solutions give scientists and end-users valuable tools to access and work with lidar data, fostering new cutting-edge investigation and improvements of existing methods.

The first part of this work provides an introduction to laser scanning principles, with references for further reading. It is followed by sections reporting, respectively, on open standards and formats for lidar data, on tools, and finally on web-based solutions for accessing lidar data. This review is not intended as a thorough account of the state of the art of lidar technology itself, but as an overview of the open source toolkits available to the community for accessing, visualizing, editing and processing point clouds. A range of open source features for lidar data access and analysis is presented, giving an overview of what can be done with alternatives to commercial end-to-end solutions. Data standards and formats are also discussed, showing the challenges of storing and accessing massive point clouds.

The aim is to give scientists who have not yet worked with lidar data an overview of how this technology works and of the open source tools that can be a valid solution for their needs in analysing such data. Researchers already involved with lidar data will hopefully find ideas for integrating and improving their workflows through open source solutions.
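As a taste of the kind of processing open source lidar toolchains perform, here is a minimal ground-filtering sketch: grid the point cloud, take the lowest return per cell as the local ground estimate, and derive per-point heights above ground. Production filters (e.g. progressive morphological filtering) are far more robust; this toy example uses synthetic points and illustrates only the concept.

```python
import random

random.seed(4)

# Toy point cloud: a gently sloping ground surface plus vegetation returns.
points = []
for _ in range(5000):
    x, y = random.uniform(0, 100), random.uniform(0, 100)
    ground_z = 0.02 * x                  # gentle slope in x
    if random.random() < 0.3:            # ~30% of returns hit vegetation
        z = ground_z + random.uniform(1.0, 15.0)
    else:
        z = ground_z + random.gauss(0, 0.05)
    points.append((x, y, z))

# Minimal ground filter: grid the cloud and keep the lowest return per cell.
cell = 10.0
ground = {}
for x, y, z in points:
    key = (int(x // cell), int(y // cell))
    if key not in ground or z < ground[key][2]:
        ground[key] = (x, y, z)

# Height above ground for each point = z minus the local ground estimate.
heights = [z - ground[(int(x // cell), int(y // cell))][2]
           for x, y, z in points]
print(f"cells: {len(ground)}, max height above ground: {max(heights):.1f} m")
```

Even this crude filter separates canopy returns from terrain well enough to compute vegetation heights; the open source tools surveyed in the review implement the same idea with adaptive grids and iterative refinement on clouds of billions of points.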

