Lessons Learned from High Throughput Screening of >250,000 Cells: New Cell Evaluation Methods and Data Mining Techniques

2019 ◽  
Vol 123 (23) ◽  
pp. 14610-14618 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Mojtaba Haghighatlari ◽  
Sai Prasad Ganesh ◽  
Chong Cheng ◽  
Johannes Hachmann

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optical or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures and determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI value. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.
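The abstract does not spell out the RI protocol's functional form, but intrinsic refractive index models of this kind are commonly built on the Lorentz-Lorenz relation, which connects n to the molecular polarizability and the number density of polarizable units. A minimal sketch under that assumption (the function name and all numbers are illustrative, not taken from the paper):

```python
import math

def lorentz_lorenz_n(polarizability_cm3, number_density_cm3):
    """Refractive index n from the Lorentz-Lorenz relation:
        (n^2 - 1) / (n^2 + 2) = (4*pi/3) * N * alpha
    where alpha is the polarizability (cm^3) and N the number
    density of polarizable units (cm^-3)."""
    L = (4.0 * math.pi / 3.0) * number_density_cm3 * polarizability_cm3
    if not 0.0 <= L < 1.0:
        raise ValueError("unphysical Lorentz-Lorenz factor")
    # Solve the relation for n.
    return math.sqrt((1.0 + 2.0 * L) / (1.0 - L))
```

The relation makes the design intuition explicit: at fixed density, more polarizable moieties in the backbone raise the Lorentz-Lorenz factor and hence the predicted RI.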


Author(s):  
Shadi Aljawarneh ◽  
Aurea Anguera ◽  
John William Atwood ◽  
Juan A. Lara ◽  
David Lizcano

Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated by different organs can be recorded to extract interesting information about patients' health. The analysis of physiological signals is a hard task that requires the use of specific approaches such as the Knowledge Discovery in Databases (KDD) process. Applying such a process in the domain of medicine entails a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors while applying data mining techniques to real patient data, including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes, obtaining good results in terms of classification accuracy (greater than 99% in both fields). These results are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts who face similar medical data mining projects.


2004 ◽  
Vol 47 (25) ◽  
pp. 6373-6383 ◽  
Author(s):  
David J. Diller ◽  
Doug W. Hobbs

2014 ◽  
Vol 19 (5) ◽  
pp. 707-714 ◽  
Author(s):  
Christine Clougherty Genick ◽  
Danielle Barlier ◽  
Dominique Monna ◽  
Reto Brunner ◽  
Céline Bé ◽  
...  

For approximately a decade, biophysical methods have been used to validate positive hits selected from high-throughput screening (HTS) campaigns, with the goal of verifying binding interactions using label-free assays. By applying label-free readouts, screen artifacts created by compound interference and fluorescence are discovered, enabling further characterization of the hits for their target specificity and selectivity. The use of several biophysical methods to extract this type of high-content information is required to prevent the promotion of false positives to the next level of hit validation and to select the best candidates for further chemical optimization. The typical technologies applied in this arena include dynamic light scattering, turbidimetry, resonant waveguide grating, surface plasmon resonance, differential scanning fluorimetry, mass spectrometry, and others. Each technology can provide a different type of information to enable the characterization of the binding interaction. Thus, these technologies can be incorporated into a hit-validation strategy not only according to the profile of chemical matter desired by the medicinal chemists, but also in a manner that is in agreement with the target protein's amenability to the screening format. Here, we present the results of screening strategies using biophysics, with the objective of evaluating the approaches, discussing the advantages and challenges, and summarizing the benefits with reference to lead discovery. In summary, the biophysics screens presented here demonstrated various hit rates from a list of ~2000 preselected, IC50-validated hits from HTS (an IC50 is the inhibitor concentration at which 50% inhibition of activity is observed). There are several lessons learned from these biophysical screens, which are discussed in this article.
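As a side note to the IC50 definition above: the dose-response relationship behind such a value is often modeled with a simple Hill (logistic) curve, under which the IC50 is the concentration giving exactly 50% inhibition. A minimal illustrative sketch, not the assay model used in the screens described here:

```python
def percent_inhibition(conc, ic50, hill=1.0):
    """Percent inhibition under a simple Hill dose-response model:
        inhibition = 100 * conc^h / (ic50^h + conc^h)
    At conc == ic50 this returns exactly 50%, matching the
    definition of an IC50."""
    return 100.0 * conc**hill / (ic50**hill + conc**hill)
```

In practice an IC50 is obtained the other way around, by fitting this kind of curve to measured inhibition values across a concentration series and reading off the midpoint.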



Author(s):  
Hoda Ahmed Abdelhafez

Mining big data is currently getting a lot of attention because businesses need more complex information in order to increase their revenue and gain competitive advantage. Therefore, mining huge amounts of data, as well as mining real-time data, needs to be done with new data mining techniques/approaches. This chapter will discuss big data volume, variety, and velocity; data mining techniques; and open source tools for handling very large datasets. Moreover, the chapter will focus on two industrial areas, telecommunications and healthcare, and the lessons learned from them.



2002 ◽  
Vol 7 (4) ◽  
pp. 341-351 ◽  
Author(s):  
Michael F.M. Engels ◽  
Luc Wouters ◽  
Rudi Verbeeck ◽  
Greet Vanhoof

A data mining procedure for the rapid scoring of high-throughput screening (HTS) compounds is presented. The method is particularly useful for monitoring the quality of HTS data and tracking outliers in automated pharmaceutical or agrochemical screening, thus providing more complete and thorough structure-activity relationship (SAR) information. The method is based on the assumed relationship between the structure of the screened compounds and the biological activity on a given screen, expressed on a binary scale. By means of a data mining method, a SAR description of the data is developed that assigns each compound of the screen a probability of being a hit. Then, an inconsistency score is computed that expresses the degree of deviation between the SAR description and the actual biological activity. The inconsistency score enables the identification of potential outliers that can be primed for validation experiments. The approach is particularly useful for detecting false-negative outliers and for identifying SAR-compliant hit/nonhit borderline compounds, both of which are classes of compounds that can contribute substantially to the development and understanding of robust SARs. In a first implementation of the method, one- and two-dimensional descriptors are used to encode molecular structure information, and logistic regression is used to calculate hit/nonhit probability scores. The approach was validated on three data sets, the first from a publicly available screening data set and the second and third from in-house HTS screening campaigns. Because of its simplicity, robustness, and accuracy, the procedure is suitable for automation.
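The pipeline described above, logistic regression on molecular descriptors to obtain hit probabilities, followed by an inconsistency score against the observed binary activity, can be sketched as follows. Everything here (the synthetic binary fingerprints, the planted label flip, and cross-entropy as the inconsistency measure) is an illustrative assumption, not the authors' exact implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for screening data: 200 compounds described by
# 8 binary structural descriptors, with hit/nonhit labels generated
# from a hidden structure-activity relationship.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8)).astype(float)
true_w = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_w - 0.5)))
y = (rng.random(200) < p_true).astype(int)
y[0] = 1 - y[0]  # plant one label flip to mimic a screening artifact

# Fit a SAR description: probability of being a hit per compound.
model = LogisticRegression().fit(X, y)
p_hit = model.predict_proba(X)[:, 1]

# Inconsistency score: deviation between the SAR-predicted probability
# and the recorded binary outcome (cross-entropy per compound).
eps = 1e-12
inconsistency = -(y * np.log(p_hit + eps) + (1 - y) * np.log(1 - p_hit + eps))

# Compounds with the largest scores are candidates for re-assay,
# e.g. potential false negatives.
suspects = np.argsort(inconsistency)[::-1][:5]
```

Ranking by this score surfaces compounds whose recorded activity disagrees most with what their structure predicts, which is exactly the outlier-priming step the abstract describes.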


2005 ◽  
Vol 10 (7) ◽  
pp. 653-657 ◽  
Author(s):  
Nadine H. Elowe ◽  
Jan E. Blanchard ◽  
Jonathan D. Cechetto ◽  
Eric D. Brown

High-throughput screening (HTS) generates an abundance of data that are a valuable resource to be mined. Dockers and data miners can use “real-world” HTS data to test and further develop their tools. A screen of 50,000 diverse small molecules was carried out against Escherichia coli dihydrofolate reductase (DHFR) and compared with a previous screen of 50,000 compounds against the same target. Identical assays and conditions were maintained for both studies. Prior to the completion of the second screen, the original screening data were publicly released for use as a “training set,” and computational chemists and data analysts were challenged to predict the activity of compounds in this second “test set.” Upon completion, the primary screen of the test set generated no potent inhibitors of DHFR activity.


Author(s):  
Kamal Azzaoui ◽  
John P. Priestle ◽  
Thibault Varin ◽  
Ansgar Schuffenhauer ◽  
Jeremy L. Jenkins ◽  
...  
