Predicting and interpreting oxide glass properties by machine learning using large datasets

Author(s):  
Daniel R. Cassar ◽  
Saulo Martiello Mastelini ◽  
Tiago Botari ◽  
Edesio Alcobaça ◽  
André C. P. L. F. de Carvalho ◽
...  


2021 ◽  
pp. 117432
Author(s):  
Collin J. Wilkinson ◽  
Cory Trivelpiece ◽  
Rob Hust ◽  
Rebecca S. Welch ◽  
Steve A. Feller ◽  
...  

Author(s):  
Nicola Voyle ◽  
Maximilian Kerz ◽  
Steven Kiddle ◽  
Richard Dobson

This chapter highlights the methodologies that are increasingly being applied to large datasets, or ‘big data’, with an emphasis on bioinformatics. The first stage of any analysis is to collect data from a well-designed study. The chapter begins by looking at the raw data that arise from epidemiological studies and at the first stages of creating clean data that can be used to draw informative conclusions through analysis. The remainder of the chapter covers data formats, data exploration, data cleaning, missing data (i.e. the lack of data for a variable in an observation), reproducibility, classification versus regression, feature identification and selection, method selection (e.g. supervised versus unsupervised machine learning), training a classifier, and drawing conclusions from modelling.
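As a rough illustration of the workflow the chapter covers, the sketch below chains imputation of missing data, feature selection, and classifier training, assuming scikit-learn and a cleaned tabular study dataset; the file name, column names, and the choice of k are hypothetical.

```python
# Minimal sketch of the supervised workflow outlined above: imputation for
# missing data, feature selection, and classifier training (scikit-learn).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("cohort.csv")          # hypothetical cleaned study data
X = df.drop(columns=["outcome"])        # numeric features (assumption)
y = df["outcome"]                       # binary label -> classification

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("select", SelectKBest(f_classif, k=20)),      # k chosen for illustration
    ("model", RandomForestClassifier(random_state=0)),
])
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```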


2017 ◽  
Vol 27 (09n10) ◽  
pp. 1579-1589 ◽  
Author(s):  
Reinier Morejón ◽  
Marx Viana ◽  
Carlos Lucena

Data mining is a hot topic that attracts researchers from different areas, such as databases, machine learning, and agent-oriented software engineering. As data volumes grow, there is an increasing need to extract knowledge from large datasets that are difficult to handle and process with traditional methods. Software agents can play a significant role in performing data mining processes more efficiently. For instance, they can perform selection, extraction, preprocessing, and integration of data, as well as parallel, distributed, or multisource mining. This paper proposes a framework based on multiagent systems for applying data mining techniques to health datasets. As usage scenarios, we apply the framework to hypothyroidism and diabetes datasets, running two different mining processes in parallel on each database.
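The parallel usage scenario lends itself to a short sketch. The following is a simplification rather than the paper's framework: it runs one mining task per dataset in separate processes. The file names, label column, and decision-tree learner are illustrative assumptions, and features are assumed to be numeric or already encoded.

```python
# Sketch: run two mining processes in parallel, one per health dataset,
# in the spirit of agents working concurrently on separate databases.
from concurrent.futures import ProcessPoolExecutor

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def mine(path: str, label_column: str) -> tuple[str, float]:
    """One 'agent': load a dataset and run a simple mining task on it."""
    df = pd.read_csv(path).dropna()            # assumes numeric features
    X, y = df.drop(columns=[label_column]), df[label_column]
    score = cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean()
    return path, score

if __name__ == "__main__":
    jobs = [("hypothyroid.csv", "diagnosis"),  # hypothetical file/column names
            ("diabetes.csv", "diagnosis")]
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(mine, path, col) for path, col in jobs]
        for future in futures:
            path, score = future.result()
            print(f"{path}: mean CV accuracy = {score:.3f}")
```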


2019 ◽  
Vol 46 (6) ◽  
pp. 810-822
Author(s):  
Eleonore Fournier-Tombs ◽  
Giovanna Di Marzo Serugendo

This article proposes an automated methodology for the analysis of online political discourse. Drawing on the discourse quality index (DQI) by Steenbergen et al., it applies a machine learning–based quantitative approach to measuring the discourse quality of political discussions online. The DelibAnalysis framework aims to provide an accessible, replicable methodology for the measurement of discourse quality that is both platform- and language-agnostic. The framework uses a simplified version of the DQI to train a classifier, which can then be used to predict the discourse quality of any non-coded comment in a given online political discussion. The objective of this research is to provide a systematic framework for the automated discourse quality analysis of large datasets and, in applying this framework, to yield insight into the structure and features of political discussions online.
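A minimal sketch of this train-then-predict pattern, assuming scikit-learn: a TF-IDF representation with a random-forest classifier stands in for the framework's actual feature set, and the comments and simplified DQI labels are invented for illustration.

```python
# Sketch: train a classifier on comments hand-coded with a simplified DQI,
# then predict the discourse quality of any non-coded comment.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

coded_comments = [
    "You never justify your claims.",            # hypothetical low-DQI comment
    "Here is published evidence for my position and a response to yours.",
]
dqi_labels = [0, 2]                              # simplified DQI level per comment

model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(coded_comments, dqi_labels)

# Predict discourse quality for a non-coded comment.
print(model.predict(["I disagree, and here is my reasoning and a source."]))
```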


2021 ◽  
Author(s):  
Trang Bui

Travel and tourism have become a worldwide trend, and the market for this industry continues to grow. Although most people enjoy traveling, planning a trip can take days, weeks or even months to ensure the best place, the best itinerary, and the best price are found. With Artificial Intelligence (AI) and Machine Learning (ML), large datasets can be analyzed so that an AI-infused travel system can generate highly personalized suggestions. The main purpose of the project is to create a travel planning application that provides an efficient and highly personalized experience for users. The app will help users plan their trips and choose activities, restaurants, modes of transportation and destinations that fit their preferences, budgets, and schedules in minutes, without having to spend hours researching on the Internet or downloading multiple travel apps.
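The abstract does not specify how suggestions are generated; one simple possibility is content-based scoring of candidate activities against a user's stated preferences and budget, as in the sketch below, where every name and number is hypothetical.

```python
# Sketch of one way personalized suggestions could be ranked: score each
# candidate activity by its overlap with the user's interests, subject
# to a budget constraint. All data here is invented for illustration.
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    tags: set[str]
    cost: float

def score(activity: Activity, preferred: set[str], budget: float) -> float:
    """Fraction of the activity's tags matching the user, 0 if over budget."""
    if activity.cost > budget:
        return 0.0
    return len(activity.tags & preferred) / max(len(activity.tags), 1)

catalog = [
    Activity("Food market tour", {"food", "walking"}, 30.0),
    Activity("Museum pass", {"art", "history"}, 25.0),
    Activity("Helicopter ride", {"views"}, 300.0),
]
prefs, budget = {"food", "history"}, 50.0
for a in sorted(catalog, key=lambda a: score(a, prefs, budget), reverse=True):
    print(f"{a.name}: {score(a, prefs, budget):.2f}")
```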


2020 ◽  
Author(s):  
Pedro Ballester

Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models of macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims to leverage these experimental structures to discover the starting points needed for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets of targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to selecting an existing scoring function for the target, along with a third approach consisting of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmarking scoring functions for SBVS, and how to generate or use such scoring functions with freely available software.
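The core idea behind such ML-based scoring functions can be sketched as regression from features of protein-ligand complexes to binding affinity. In the sketch below, random placeholder data stands in for real descriptors, such as the intermolecular atom-pair counts used by some published scoring functions.

```python
# Sketch: learn a scoring function mapping complex descriptors to affinity.
# The feature matrix and affinities are random placeholders, not real data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 36))      # 36 descriptors per complex (placeholder)
y = rng.random(500) * 10       # measured binding affinities (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
sf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out complexes:", round(sf.score(X_te, y_te), 3))
```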


Author(s):  
Kirti Magudia ◽  
Christopher P. Bridge ◽  
Katherine P. Andriole ◽  
Michael H. Rosenthal

With vast interest in machine learning applications, more investigators are proposing to assemble large datasets for machine learning applications. We aim to delineate multiple possible roadblocks to exam retrieval that may present themselves and lead to significant time delays. This HIPAA-compliant, institutional review board–approved, retrospective clinical study required identification and retrieval of all outpatient and emergency patients undergoing abdominal and pelvic computed tomography (CT) at three affiliated hospitals in the year 2012. If a patient had multiple abdominal CT exams, the first exam was selected for retrieval (n=23,186). Our experience in attempting to retrieve 23,186 abdominal CT exams yielded 22,852 valid CT abdomen/pelvis exams and identified four major categories of challenges when retrieving large datasets: cohort selection and processing, retrieving DICOM exam files from PACS, data storage, and non-recoverable failures. The retrieval took 3 months of project time and at least 300 person-hours split between the primary investigator (a radiologist), a data scientist, and a software engineer. Exam selection and retrieval may take significantly longer than planned. We share our experience so that other investigators can anticipate and plan for these challenges. We also hope to help institutions better understand the demands that may be placed on their infrastructure by large-scale medical imaging machine learning projects.
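A retrieval campaign of this size needs systematic retry and failure bookkeeping. The sketch below shows one way to structure it; retrieve_study is a hypothetical stand-in for the site's actual PACS query/retrieve call (e.g., a DICOM C-MOVE request), not anything described in the study.

```python
# Sketch: retry transient PACS failures with backoff and record
# non-recoverable ones for follow-up. retrieve_study is a placeholder.
import csv
import time

def retrieve_study(accession_number: str) -> None:
    """Placeholder for the real PACS retrieval call; raises on failure."""
    raise NotImplementedError

def retrieve_all(accessions: list[str], retries: int = 3) -> None:
    with open("failed_retrievals.csv", "w", newline="") as f:
        failures = csv.writer(f)
        failures.writerow(["accession", "error"])
        for acc in accessions:
            for attempt in range(retries):
                try:
                    retrieve_study(acc)
                    break                            # success: next exam
                except Exception as exc:
                    if attempt == retries - 1:
                        failures.writerow([acc, repr(exc)])  # non-recoverable
                    else:
                        time.sleep(2 ** attempt)     # back off, then retry
```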

