Prediction and Chemical Interpretation of Singlet-Oxygen-Scavenging Activity of Small Molecule Compounds by Using Machine Learning

Antioxidants ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1751
Author(s):  
Taiki Fujimoto ◽  
Hiroaki Gotoh

A chemically explainable machine learning model was constructed from a small dataset to quantitatively predict singlet-oxygen-scavenging ability. In this model, ensemble learning based on decision trees yielded high accuracy. As explanatory variables, molecular descriptors obtained by computational chemistry and Morgan fingerprints were used to achieve high accuracy with simple prediction. The singlet-oxygen-scavenging mechanism was explained by the feature importances obtained from the machine learning outputs, and the results are consistent with conventional chemical knowledge. Using machine learning to screen for high-antioxidant-capacity compounds, and thereby reducing the number of measurements needed, can considerably improve prediction accuracy and efficiency.
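The abstract pairs a decision-tree ensemble with feature importance to explain the scavenging mechanism. As a minimal stdlib-only sketch (not the authors' model or data), the code below bags regression stumps (depth-one trees) over bootstrap samples and adds up each split's reduction in squared error as an impurity-based importance. The three columns stand in for two hypothetical Morgan-fingerprint bits and one hypothetical computed descriptor, and the data are synthetic.

```python
import random
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (the regression impurity)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_stump(X, y):
    """Return (gain, feature, threshold, left_mean, right_mean) for the best single split."""
    parent = sse(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t, mean(left), mean(right))
    return best


def fit_bagged_stumps(X, y, n_estimators=50, seed=0):
    """Bootstrap-aggregated stumps plus normalized impurity-based feature importances."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    models, importance = [], [0.0] * d
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        if stump is None:
            continue  # no valid split in this bootstrap sample
        gain, f, t, left_mean, right_mean = stump
        models.append((f, t, left_mean, right_mean))
        importance[f] += gain
    total = sum(importance) or 1.0
    return models, [v / total for v in importance]


def predict(models, row):
    """Average the stump predictions (the bagging step)."""
    return mean(lm if row[f] <= t else rm for f, t, lm, rm in models)


# Synthetic data: column 0 drives the target, column 1 is noise, column 2 contributes weakly.
rng = random.Random(42)
X, y = [], []
for _ in range(60):
    bit_a = rng.randint(0, 1)        # hypothetical fingerprint bit (relevant)
    bit_b = rng.randint(0, 1)        # hypothetical fingerprint bit (irrelevant)
    descriptor = rng.uniform(-1, 1)  # hypothetical computed descriptor
    X.append([bit_a, bit_b, descriptor])
    y.append(3.0 * bit_a + 0.5 * descriptor + rng.gauss(0, 0.1))

models, importances = fit_bagged_stumps(X, y)
print([round(v, 3) for v in importances])
```

Impurity-based importances can favour features with many distinct values, so permutation importance is a common cross-check before reading them chemically.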

2020 ◽  
Vol 2 (1) ◽  
pp. 17-31
Author(s):  
Szde Yu

The present study compared three methods for predicting a writer's gender from the writing features manifested in electronic discourse: qualitative content analysis, statistical analysis, and machine learning. These methods were further combined to create a mixed-methods model. The machine learning model combined with qualitative content analysis produced the best prediction accuracy, and including qualitative content analysis improved accuracy even when the machine learning training set was relatively small. Thus, this study presents a concise model that predicts gender from electronic discourse fairly reliably and with high accuracy, and this accuracy held consistently when the model was tested on two separate samples.


2007 ◽  
Vol 01 (04) ◽  
pp. 441-457 ◽  
Author(s):  
STEVEN BETHARD ◽  
JAMES H. MARTIN ◽  
SARA KLINGENSTEIN

This research proposes and evaluates a linguistically motivated approach to extracting temporal structure from text. It considers pairs of events in a verb-clause construction, where the first event is a verb and the second event is the head of a clausal argument to that verb. All pairs of events in the TimeBank that participated in verb-clause constructions were selected and annotated with the labels BEFORE, OVERLAP and AFTER. The resulting corpus of 895 event-event temporal relations was then used to train a machine learning model. Using a combination of event-level features, such as tense and aspect, with syntax-level features, such as the paths through the syntactic tree, support vector machine (SVM) models were trained that identified new temporal relations with 89.2% accuracy. High-accuracy models like these are a first step toward the automatic extraction of temporal structure from text.
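To make the feature design concrete, here is a small sketch that encodes a verb-clause event pair as sparse string features (tenses, aspects, a tense pair, and a syntactic path) and trains a multiclass perceptron over the three labels. The perceptron is a simpler linear stand-in for the SVMs described above, and the path notation, feature names, and toy labels are hypothetical rather than drawn from TimeBank.

```python
from collections import defaultdict

LABELS = ("BEFORE", "OVERLAP", "AFTER")


def features(pair):
    """Encode one verb-clause event pair as event-level and syntax-level features."""
    return [
        "tense1=" + pair["tense1"],
        "aspect1=" + pair["aspect1"],
        "tense2=" + pair["tense2"],
        "aspect2=" + pair["aspect2"],
        "tense_pair=" + pair["tense1"] + "_" + pair["tense2"],
        "path=" + pair["path"],  # path from the verb to the clausal head in the parse tree
    ]


def score(weights, feats, label):
    return sum(weights[(f, label)] for f in feats)


def classify(weights, pair):
    feats = features(pair)
    return max(LABELS, key=lambda label: score(weights, feats, label))


def train(pairs, gold_labels, epochs=10):
    """Multiclass perceptron: on a mistake, reward the gold label and penalize the guess."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for pair, gold in zip(pairs, gold_labels):
            guess = classify(weights, pair)
            if guess != gold:
                for f in features(pair):
                    weights[(f, gold)] += 1.0
                    weights[(f, guess)] -= 1.0
    return weights


# Toy examples with made-up labels and simplified parse paths.
pairs = [
    {"tense1": "PAST", "aspect1": "NONE", "tense2": "PAST", "aspect2": "PERFECTIVE",
     "path": "VBD>VP>SBAR>S>VP>VBN"},
    {"tense1": "PAST", "aspect1": "NONE", "tense2": "FUTURE", "aspect2": "NONE",
     "path": "VBD>VP>SBAR>S>VP>VB"},
    {"tense1": "PRESENT", "aspect1": "NONE", "tense2": "PRESENT", "aspect2": "PROGRESSIVE",
     "path": "VBZ>VP>SBAR>S>VP>VBG"},
]
labels = ["AFTER", "BEFORE", "OVERLAP"]

weights = train(pairs, labels)
accuracy = sum(classify(weights, p) == g for p, g in zip(pairs, labels)) / len(pairs)
print(accuracy)
```

Swapping the perceptron for a linear SVM keeps the same sparse feature strings; only the weight-update rule changes.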


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Richard Du ◽  
Efstratios D. Tsougenis ◽  
Joshua W. K. Ho ◽  
Joyce K. Y. Chan ◽  
Keith W. H. Chiu ◽  
...  

Triaging and prioritising patients for RT-PCR testing has been essential in the management of COVID-19 in resource-scarce countries. In this study, we applied machine learning (ML) to the detection of SARS-CoV-2 infection using basic laboratory markers. We performed statistical analysis and trained an ML model on a retrospective cohort of 5148 patients from 24 hospitals in Hong Kong to distinguish COVID-19 from pneumonia of other aetiologies. We validated the model on three temporal validation sets from different waves of infection in Hong Kong. For predicting SARS-CoV-2 infection, the ML model achieved high AUCs and specificity but low sensitivity in all three validation sets (AUC: 89.9–95.8%; sensitivity: 55.5–77.8%; specificity: 91.5–98.3%). When the model was used in conjunction with radiologists' interpretations of chest radiographs, the combined sensitivity exceeded 90% while specificity remained moderate. Our study showed that a machine learning model based on readily available laboratory markers could achieve high accuracy in predicting SARS-CoV-2 infection.
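The reported metrics can be reproduced from predictions in a few lines. This sketch computes sensitivity and specificity from a confusion matrix, AUC as the probability that a randomly chosen positive outscores a randomly chosen negative, and the effect of an OR rule that flags a patient when either the model or the radiograph reader does. The OR rule is an assumption about how the two could be combined, not the study's documented procedure, and the data are toy values.

```python
def confusion(y_true, y_pred):
    """Counts (tp, tn, fp, fn) for binary labels where 1 = SARS-CoV-2 positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def sensitivity_specificity(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def or_rule(model_pred, reader_pred):
    """Positive if either the model or the radiograph reader flags the patient."""
    return [int(m == 1 or r == 1) for m, r in zip(model_pred, reader_pred)]


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.35, 0.2, 0.1, 0.05, 0.15, 0.25]
model_pred = [int(s >= 0.5) for s in scores]
reader_pred = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical radiograph reads

print(auc(y_true, scores))
print(sensitivity_specificity(y_true, model_pred))
print(sensitivity_specificity(y_true, or_rule(model_pred, reader_pred)))
```

On these toy values the OR rule lifts sensitivity from 0.5 to 1.0 while specificity drops from 1.0 to 5/6, the same trade-off pattern the abstract describes.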


2021 ◽  
Author(s):  
Tarik Abdelfattah ◽  
Ehsaan Nasir ◽  
Junjie Yang ◽  
Jamar Bynum ◽  
Alexander Klebanov ◽  
...  

Unconventional reservoir development is a multidisciplinary challenge because of the complicated physical system, including but not limited to complex flow mechanisms, multiple-porosity systems, heterogeneous subsurface rock and minerals, well interference, and fluid-rock interaction. With enough well data, physics-based models can be supplemented with data-driven methods to describe a reservoir system and accurately predict well performance. This study uses a data-driven approach to tackle the field development problem in the Eagle Ford Shale. A large amount of data spanning the major oil and gas disciplines was collected and interrogated from around 300 wells in the area of interest. The data-driven workflow consists of a descriptive model that regresses on existing wells with the selected well features and provides insight into feature importance, a predictive model that forecasts well performance, and a subject-matter-expert-driven prescriptive model that optimizes future well design to improve well economics. To evaluate initial well economics, oil production over 365 consecutive days per CAPEX dollar spent (bbl/$) was set up as the objective function. After careful model selection, Random Forest (RF) showed the best accuracy on the given dataset, and Differential Evolution (DE) was used for optimization. Using recursive feature elimination (RFE), the final master dataset was reduced to 50 parameters to feed into the machine learning model. After hyperparameter tuning, the Random Forest algorithm achieved reasonable regression accuracy: the coefficient of determination (R²) for the training and test datasets was 0.83, and the mean absolute error percentage (MAEP) was less than 20%. The model also reveals that well performance is highly dependent on a good combination of variables spanning geology, drilling, completions, production and reservoir.
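The objective function and the two accuracy figures in this abstract are simple to compute. The sketch below assumes daily oil rates in barrels listed in consecutive producing-day order and CAPEX in US dollars, takes R² to be the coefficient of determination, and assumes MAEP is the usual mean absolute percentage error. The well numbers are invented.

```python
def bbl_per_dollar(daily_oil_bbl, capex_usd):
    """Oil produced over the first 365 consecutive producing days per CAPEX dollar."""
    if len(daily_oil_bbl) < 365:
        raise ValueError("need at least 365 days of production")
    return sum(daily_oil_bbl[:365]) / capex_usd


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def maep(y_true, y_pred):
    """Mean absolute error percentage (assumed here to equal MAPE)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# A hypothetical well: flat 500 bbl/day for a year and $7.3 million CAPEX.
print(bbl_per_dollar([500.0] * 365, 7_300_000))  # 182,500 bbl / $7.3M

y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_obs, y_fit), maep(y_obs, y_fit))
```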
Completion year has one of the highest feature importances, indicating improvements in operational and design efficiency as well as fluctuations in service cost. Moreover, lateral rate of penetration (ROP) was always among the top two most important parameters, most likely because it significantly affects drilling cost. With subject matter experts' (SME) input, optimization using the regression model was performed iteratively with the chosen parameters and reasonable upper and lower bounds. Compared with the best existing wells in the vicinity, the optimized well design shows a potential improvement in bbl/$ of approximately 38%. This paper introduces an integrated data-driven solution for optimizing unconventional development strategy. Compared with conventional analytical and numerical methods, the machine learning model can handle large multidimensional datasets and provide actionable recommendations with much faster turnaround. Over the course of field development, model accuracy can be dynamically improved by including more data collected from new wells.
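The optimization step pairs a regression model with Differential Evolution under SME-chosen bounds. Below is a minimal DE/rand/1/bin sketch in plain Python that maximizes a surrogate by minimizing its negative and clips candidates to the bounds (clipping is only one of several bound-handling choices). The surrogate is a made-up smooth function of two hypothetical design variables, lateral length in ft and proppant intensity in lb/ft, standing in for the trained Random Forest; its optimum is known so the result can be checked.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over a box using the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one coordinate is mutated
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the lower/upper bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fitness = objective(trial)
            if trial_fitness <= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def surrogate_bbl_per_dollar(x):
    """Hypothetical stand-in for a trained regression model; peak at (9000, 2000)."""
    lateral_ft, proppant_lb_per_ft = x
    return (0.03
            - 0.01 * ((lateral_ft - 9000) / 5000) ** 2
            - 0.01 * ((proppant_lb_per_ft - 2000) / 1500) ** 2)


bounds = [(5000, 12000), (1000, 3000)]  # hypothetical SME-provided bounds
best_x, best_neg = differential_evolution(lambda x: -surrogate_bbl_per_dollar(x), bounds)
print([round(v) for v in best_x], -best_neg)
```

With a Random Forest surrogate the objective becomes piecewise constant, which is one reason a derivative-free method such as DE suits this step.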


Micromachines ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1084
Author(s):  
Shaobo Luo ◽  
Yi Zhang ◽  
Kim Truc Nguyen ◽  
Shilun Feng ◽  
Yuzhi Shi ◽  
...  

High-accuracy size measurement is essential in the physical and biomedical sciences. Various sizing techniques have been widely used to sort colloidal materials, analyze bioparticles and monitor the quality of food and the atmosphere. Most imaging-free methods, such as light scattering, measure the average size of particles and have difficulty sizing non-spherical particles. Image acquisition with a camera can observe individual nanoparticles in real time, but its accuracy is compromised by image defocusing and instrumental calibration. In this work, a machine learning-based pipeline is developed to enable high-accuracy imaging-based particle sizing. The pipeline consists of an image segmentation module for cell identification and a machine learning model for accurate pixel-to-size conversion. The results show significantly improved accuracy, indicating great potential for a wide range of applications in environmental sensing, biomedical diagnostics, and material characterization.
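As a rough illustration of the two stages, the sketch below measures a segmented particle's pixel area, converts it to an equivalent circular diameter in pixels, and maps pixels to micrometres with a least-squares line fitted on calibration particles of known size. The linear fit is a deliberately simple stand-in for the paper's machine learning model, and the mask and calibration values are invented.

```python
import math


def equivalent_diameter_px(mask):
    """Diameter of the circle with the same area as the segmented region (1 = particle pixel)."""
    area = sum(sum(row) for row in mask)
    return 2.0 * math.sqrt(area / math.pi)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical calibration set: measured pixel diameters vs. reference sizes in micrometres.
calib_px = [10.0, 20.0, 30.0, 40.0]
calib_um = [1.1, 2.1, 3.1, 4.1]
slope, intercept = fit_line(calib_px, calib_um)

# Hypothetical segmentation output for one particle.
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
d_px = equivalent_diameter_px(mask)
print(d_px, slope * d_px + intercept)
```

A learned model would replace `fit_line` when the pixel-to-size relation is non-linear, for example when defocus blurs small particles more than large ones.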


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Matjaž Kukar ◽  
Gregor Gunčar ◽  
Tomaž Vovko ◽  
Simon Podnar ◽  
Peter Černelč ◽  
...  

Physicians caring for patients with COVID-19 have described various changes in routine blood parameters. However, these changes alone do not enable them to diagnose COVID-19. We constructed and cross-validated a machine learning model for COVID-19 diagnosis on the routine blood tests of 5333 patients with various bacterial and viral infections and 160 COVID-19-positive patients. We selected the operational ROC point at a sensitivity of 81.9% and a specificity of 97.9%. The cross-validated AUC was 0.97. The five most useful routine blood parameters for COVID-19 diagnosis, according to the feature importance scoring of the XGBoost algorithm, were MCHC, eosinophil count, albumin, INR, and prothrombin activity percentage. t-SNE visualization showed that the blood parameters of patients with a severe COVID-19 course are more similar to those of a bacterial than a viral infection. The reported diagnostic accuracy is at least comparable and probably complementary to RT-PCR and chest CT studies. Patients with fever, cough, myalgia, and other symptoms can now have initial routine blood tests assessed by our diagnostic tool. All patients with a positive COVID-19 prediction would then undergo standard RT-PCR studies to confirm the diagnosis. We believe that our results represent a significant contribution to improving COVID-19 diagnosis.
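The abstract reports a chosen operating point on the ROC curve. One common way to pick such a point, sketched here under the assumption that a score at or above the threshold counts as positive, is to take the highest threshold that still reaches a target sensitivity and then read off the specificity. The study's exact selection rule is not given in the abstract, and the scores below are toy values.

```python
import math


def threshold_for_sensitivity(y_true, scores, target):
    """Highest positive-class score that, used as a threshold, reaches the target sensitivity."""
    pos = sorted((s for t, s in zip(y_true, scores) if t == 1), reverse=True)
    # round() guards against floating-point noise in target * len(pos)
    k = math.ceil(round(target * len(pos), 9))
    return pos[k - 1]


def sens_spec_at(y_true, scores, threshold):
    pred = [int(s >= threshold) for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, pred))
    n_pos = sum(y_true)
    return tp / n_pos, tn / (len(y_true) - n_pos)


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.65, 0.3, 0.1, 0.05]
thr = threshold_for_sensitivity(y_true, scores, target=0.75)
print(thr, sens_spec_at(y_true, scores, thr))
```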


Processes ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 1342
Author(s):  
Amy J. C. Trappey ◽  
Charles V. Trappey ◽  
Chih-Ping Liang ◽  
Hsin-Jung Lin

Researchers must read and understand a large volume of technical papers, including patent documents, to fully grasp the state of technological progress in a given domain. Chemical research is particularly challenging given the fast growth of newly registered utility patents (a form of intellectual property, or IP) that describe in detail the processes used to create a new chemical or a new process for manufacturing a known chemical. Researchers must understand the latest patents and literature in order to develop new chemicals and processes that do not infringe existing claims and processes. This research uses text mining, integrated machine learning, and knowledge visualization techniques to support the effective and accurate extraction and graphical presentation of chemical processes disclosed in patent documents. The computer framework trains a machine learning model called ALBERT for automatic paragraph text classification. ALBERT separates chemical from non-chemical descriptive paragraphs in a patent for effective chemical term extraction. ChemDataExtractor is then used to classify chemical terms, such as inputs, units, and reactions, from the chemical paragraphs. A computer-supported, graph-based knowledge representation interface is developed to plot the extracted chemical terms and their chemical process links as a network of nodes connected by arcs. This computer-supported chemical knowledge visualization approach helps researchers quickly understand the innovative and unique chemicals or processes of any chemical patent of interest.
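The final stage turns extracted terms into a network of nodes and arcs. Assuming an earlier (hypothetical) step has already converted the classified terms into (source, relation, target) triples, the sketch below builds an adjacency list, traces everything downstream of a starting material with breadth-first search, and emits Graphviz DOT text for plotting. The triples describe a textbook ethylene oxide and ethylene glycol route purely for illustration and are not taken from the paper.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Adjacency list: node -> list of (relation, neighbour)."""
    graph = defaultdict(list)
    for src, rel, dst in triples:
        graph[src].append((rel, dst))
        graph.setdefault(dst, [])  # make sure sink nodes appear too
    return dict(graph)


def downstream(graph, start):
    """Breadth-first list of every node reachable from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def to_dot(graph):
    """Render the graph as Graphviz DOT with relations as edge labels."""
    lines = ["digraph process {"]
    for src, edges in graph.items():
        for rel, dst in edges:
            lines.append(f'  "{src}" -> "{dst}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative triples (hypothetical relation names).
triples = [
    ("ethylene", "input_to", "oxidation"),
    ("oxygen", "input_to", "oxidation"),
    ("silver catalyst", "catalyzes", "oxidation"),
    ("oxidation", "produces", "ethylene oxide"),
    ("ethylene oxide", "input_to", "hydrolysis"),
    ("hydrolysis", "produces", "ethylene glycol"),
]
graph = build_graph(triples)
print(downstream(graph, "ethylene"))
print(to_dot(graph))
```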

