Machine learning & deep learning in data-driven decision making of drug discovery and challenges in high-quality data acquisition in the pharmaceutical industry

Author(s):  
Sethu Arun Kumar ◽  
Thirumoorthy Durai Ananda Kumar ◽  
Narasimha M Beeraka ◽  
Gurubasavaraj Veeranna Pujar ◽  
Manisha Singh ◽  
...  

Predicting the bioactivities of novel small molecules for target deconvolution and hit-to-lead optimization in drug discovery research requires molecular representation. Previous reports have demonstrated that machine learning (ML) and deep learning (DL) have substantial implications for virtual screening, peptide synthesis, drug ADMET screening and biomarker discovery. With high-quality data acquisition, these strategies can increase positive outcomes in the drug discovery process while avoiding false positives, cost-effectively and in minimal time. This review discusses recent updates in AI tools as cheminformatics applications in medicinal chemistry for data-driven decision making in drug discovery, and the challenges of high-quality data acquisition in the pharmaceutical industry while improving small-molecule bioactivities and properties.

2020 ◽  
Author(s):  
Guy M. Hagen ◽  
Justin Bendesky ◽  
Rosa Machado ◽  
Tram-Anh Nguyen ◽  
Tanmay Kumar ◽  
...  

Abstract. Background: Fluorescence microscopy is an important technique in many areas of biological research. Two factors which limit the usefulness and performance of fluorescence microscopy are photobleaching of fluorescent probes during imaging, and, when imaging live cells, phototoxicity caused by light exposure. Recently developed methods in machine learning are able to greatly improve the signal-to-noise ratio of acquired images. This allows researchers to record images with much shorter exposure times, which in turn minimizes photobleaching and phototoxicity by reducing the dose of light reaching the sample. Findings: To employ deep learning methods, a large amount of data is needed to train the underlying convolutional neural network. One way to do this involves use of pairs of fluorescence microscopy images acquired with long and short exposure times. We provide high-quality data sets which can be used to train and evaluate deep learning methods under development. Conclusion: The availability of high-quality data is vital for training convolutional neural networks which are used in current machine learning approaches.


Processes ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 649
Author(s):  
Yifeng Liu ◽  
Wei Zhang ◽  
Wenhao Du

Deep learning based on large amounts of high-quality data plays an important role in many industries. However, deep learning is hard to embed directly in a real-time system, because the system accumulates data only through real-time acquisition while its analysis tasks must also be carried out in real time; the analysis therefore cannot wait for data to accumulate over a long period. To address the problems of high-quality data accumulation, the high timeliness required of data analysis, and the difficulty of embedding deep-learning algorithms directly in real-time systems, this paper proposes a new progressive deep-learning framework and conducts experiments on image recognition. The experimental results show that the proposed framework is effective and performs well, reaching conclusions similar to those of a deep-learning framework based on large-scale data.


2020 ◽  
Vol 22 ◽  
pp. 145-160
Author(s):  
Darío Tilves Santiago ◽  
Carmén García Mateo ◽  
Soledad Torres Guijarro ◽  
Laura Docío Fernández ◽  
José Luis Alba Castro

Automatic sign language recognition (ASLR) is a complex task, not only because of the difficulty of dealing with highly dynamic video information, but also because almost every sign language (SL) can be considered an under-resourced language when it comes to language technology. Spanish sign language (LSE) is one of those under-resourced languages. Developing technology for LSE implies a number of technical challenges that must be tackled in a structured and sequential manner. In this paper, some problems of machine-learning-based ASLR are addressed. A review of publicly available datasets is given and a new one is presented. Current annotation methods and annotation programs are also discussed. The main conclusion of our review of existing datasets is that there is a need for more datasets with high-quality data and annotations.


2020 ◽  
Vol 4 (4) ◽  
pp. 354-359
Author(s):  
Ari Ercole ◽  
Vibeke Brinck ◽  
Pradeep George ◽  
Ramona Hicks ◽  
Jilske Huijben ◽  
...  

Abstract. Background: High-quality data are critical to the entire scientific enterprise, yet the complexity and effort involved in data curation are vastly under-appreciated. This is especially true for large observational, clinical studies because of the amount of multimodal data that is captured and the opportunity for addressing numerous research questions through analysis, either alone or in combination with other data sets. However, a lack of details concerning data curation methods can result in unresolved questions about the robustness of the data, its utility for addressing specific research questions or hypotheses and how to interpret the results. We aimed to develop a framework for the design, documentation and reporting of data curation methods in order to advance the scientific rigour, reproducibility and analysis of the data. Methods: Forty-six experts participated in a modified Delphi process to reach consensus on indicators of data curation that could be used in the design and reporting of studies. Results: We identified 46 indicators that are applicable to the design, training/testing, run time and post-collection phases of studies. Conclusion: The Data Acquisition, Quality and Curation for Observational Research Designs (DAQCORD) Guidelines are the first comprehensive set of data quality indicators for large observational studies. They were developed around the needs of neuroscience projects, but we believe they are relevant and generalisable, in whole or in part, to other fields of health research, and also to smaller observational studies and preclinical research. The DAQCORD Guidelines provide a framework for achieving high-quality data; a cornerstone of health research.


Author(s):  
Eberhard O. Voit

The new methods of -omics biology, combined with more traditional experiments, have the capacity to generate more high-quality data than ever before. So, why isn’t that sufficient? What is missing? The missing aspects arise from subtle but important differences between data, information, knowledge, and understanding. ‘Computational systems biology’ explains how laboratory experiments generate data, whereas understanding additionally requires significant human intelligence and knowledge. Computational systems biology (CSB) attempts to bridge the gap between data and understanding. It uses a pipeline from data to understanding that consists of two toolsets: machine learning and mathematical models. The most useful of these models in CSB fall into two categories: static networks and dynamic biological systems.


2018 ◽  
Vol 8 (6) ◽  
pp. 527-536 ◽  
Author(s):  
Aravind Ganesh ◽  
John H. Wong ◽  
Bijoy K. Menon

Patients presenting with acutely symptomatic carotid stenosis (a “hot carotid”) are known to be at a high up-front risk of recurrent strokes. Uncertainties remain regarding the appropriate management of such patients in the acute period, particularly with respect to anti-thrombotic treatment as they await revascularisation with carotid endarterectomy (CEA) or angioplasty/stenting (CAS). Decision-making is further complicated when intraluminal thrombi are encountered on vessel imaging. Given these uncertainties, and the paucity of high-quality data in the literature, we sought expert opinion from around the globe on how to manage patients with a “hot carotid” as they await CEA/CAS, with a focus on anti-thrombotic treatment options. Similar questions were posed to the rest of our readership in an online survey, the results of which are also presented.


2018 ◽  
Vol 47 (2) ◽  
pp. 124-130 ◽  
Author(s):  
Andras Nagy ◽  
Ingo Jahn

Today, renewable energies, particularly wind energy, are important to meet 24-hour energy demands while keeping carbon emissions low. As the cost of renewable energies is high, improving their efficiency is a key factor in reducing energy prices. High-quality experimental data is essential to develop a deeper understanding of existing systems and to improve their efficiency. This paper introduces an unmanned airborne data acquisition system that can measure properties around wind turbines to provide new insight into aerodynamic performance and loss mechanisms and to provide validation data for wind-turbine design methods. The described system is a flexible and portable platform for collecting high-quality data from existing full-scale wind-turbine installations. This allows experiments to be conducted without scaling and with real-world boundary conditions. The system consists of two major parts: the unmanned flying platform (UAV) and the data acquisition system (DAQ). For the UAV, a commercially available unit is selected, which has the ability to fly a route autonomously with sufficient precision, the ability to hover, and sufficient load capacity to carry the DAQ system. The DAQ system, in contrast, is developed in-house to achieve a high-quality data collection capability and to increase flexibility.


2020 ◽  
Vol 75 (1) ◽  
pp. 7-12 ◽  
Author(s):  
F. Prior ◽  
J. Almeida ◽  
P. Kathiravelu ◽  
T. Kurc ◽  
K. Smith ◽  
...  

Author(s):  
Mary Kay Gugerty ◽  
Dean Karlan

Without high-quality data, even the best-designed monitoring and evaluation systems will collapse. Chapter 7 introduces some of the basics of collecting high-quality data and discusses how to address challenges that frequently arise. High-quality data must be clearly defined and have an indicator that validly and reliably measures the intended concept. The chapter then explains how to avoid common biases and measurement errors like anchoring, social desirability bias, the experimenter demand effect, unclear wording, long recall periods, and translation context. It then guides organizations on how to find indicators, test data collection instruments, manage surveys, and train staff appropriately for data collection and entry.

