Chemically Augmented String Kernel for Extraction and Classification of Chemical Compounds from Text

Author(s):  
Venkata Joopudi ◽  
Akansha Singh ◽  
Keerthana Kumar ◽  
Anirudh Murali ◽  
Priya Gandhi ◽  
...  
2016 ◽  
Vol 9 (3) ◽  
Author(s):  
Ali Muhammed Soomro ◽  
Ahmed Farooq Ansari ◽  
Ghulam Mustafa Seehar

Pesticides, although potentially toxic and harmful to humans and ecosystems, are deliberately added to our environment to protect crops and improve yields. Until now, pesticide toxicity among the farmers of Sindh has not been studied, so an epidemiological study was initiated. A questionnaire was designed for interviewing farmers who were directly exposed during the July–August season. They were asked about the nature of the pesticides, the doses sprayed, exposure timings, the duration of spraying, health complaints during and after spraying, and the types of toxicity symptoms. A total of 214 male farmers aged 11 to 70 years, who had been exposed to 21 different pesticides from six groups of agrochemical compounds, were interviewed. The chlorinated hydrocarbon Endosulfan was the most frequently used individual product, and organophosphorus compounds were the most frequently used group. The age groups 11–20 and 41–50 years were reported as the most affected. Two-thirds of the surveyed population reported complaints covering 20 different ailments, chiefly vertigo, followed by headache, unconsciousness and body allergies. The highest percentage of affected farmers (94.4%) was recorded in district Ghotki and the lowest (53.3%) in district Shikarpur. Overall, 65.2% of the farmers were affected. This epidemiological study shows that pesticide toxicity among these farmers is more intense and on a larger scale than the WHO Recommended Classification of Pesticides by Hazard would suggest.


2016 ◽  
Vol 14 (06) ◽  
pp. 1650033 ◽  
Author(s):  
Li Gu ◽  
Lichun Xue ◽  
Qi Song ◽  
Fengji Wang ◽  
Huaqin He ◽  
...  

During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, flue-cured tobacco leaves contain over 3000 chemical compounds, so it is impossible to evaluate their quality using all of them. In this paper, we used the Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, achieving an Accuracy (ACC) of 90.95% and a Matthews Correlation Coefficient (MCC) of 0.80. The SVM algorithm combined with 19 chemical compounds selected by R-PSO gave the best assessment of the aromatic quality of tobacco leaves, with a Pearson correlation coefficient (PCC) of 0.594 and a mean squared error (MSE) of 0.263. Finally, we built two online tools to classify the fragrant style and evaluate the aromatic quality of flue-cured tobacco leaf samples. These tools can be accessed at http://bioinformatics.fafu.edu.cn/tobacco.
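The abstract reports classification performance as ACC and MCC and regression performance as PCC and MSE. The following is a minimal pure-Python sketch of how these standard metrics are usually computed, for reference only. It is not the authors' code. It covers only the binary case of MCC, and it assumes that PCC denotes the Pearson correlation coefficient. For a binary confusion matrix, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the usage example are hypothetical.

```python
import math
from typing import Sequence


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # Hypothetical confusion counts, not taken from the paper.
    print(accuracy(50, 40, 5, 5))  # 0.9
    print(round(mcc(50, 40, 5, 5), 3))  # 0.798
```

Accuracy can look high on imbalanced classes. MCC stays near zero unless all four confusion-matrix cells are predicted well, which is presumably why it is reported alongside ACC.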


Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3349 ◽  
Author(s):  
Maciej Roman Nowak ◽  
Rafał Zdunek ◽  
Edward Pliński ◽  
Piotr Świątek ◽  
Małgorzata Strzelecka ◽  
...  

In this study, we present the concept and implementation of a fully functional system for recognizing bi-heterocyclic compounds. We investigated the use of machine learning methods to recognize compounds correctly from their THz spectra, and we describe how optimal parameters were selected for a kernel support vector machine (KSVM) with an additional 'unknown' class. The chemical compounds used in the study contain a target molecule used in pharmacy to combat inflammation in living organisms. Once authorised on the pharmaceutical market, ready-made medicinal products with similar properties are commonly referred to as non-steroidal anti-inflammatory drugs (NSAIDs). It was crucial to determine clearly whether a tested sample is a chemical compound already known to researchers or a completely new structure that should be examined further with other spectrometric methods. Our approach achieves 100% classification accuracy on the 30 samples of the test set, and classifying them takes several milliseconds. It fits perfectly into the concept of rapid recognition of bi-heterocyclic compounds without the need to analyse the percentage composition of the compound's components, provided that the sample belongs to a known group. The method minimizes testing costs and significantly reduces analysis time.
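The abstract does not say how the 'unknown' class is implemented. A common way to obtain such an open-set decision is to reject any sample whose best similarity to all known classes falls below a threshold. The sketch that follows illustrates this idea using RBF (Gaussian) kernel similarity to stored reference spectra. It is a nearest-reference rule with rejection, not a trained KSVM. The function names and the `gamma` and `threshold` values are illustrative assumptions, not parameters from the paper.

```python
import math
from typing import Dict, List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("spectra must be sampled on the same frequency grid")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def classify_or_reject(
    spectrum: Sequence[float],
    references: Dict[str, List[Sequence[float]]],
    gamma: float = 1.0,
    threshold: float = 0.5,
) -> str:
    """Return the label of the most similar reference class, or 'unknown'
    if even the best kernel similarity falls below `threshold`."""
    best_label, best_score = "unknown", -1.0
    for label, spectra in references.items():
        # Similarity to a class = similarity to its closest reference spectrum.
        score = max(rbf_kernel(spectrum, ref, gamma) for ref in spectra)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else "unknown"


if __name__ == "__main__":
    # Toy 3-point "spectra"; real THz spectra would have many more points.
    refs = {"compound_A": [[0.0, 0.0, 1.0]], "compound_B": [[1.0, 0.0, 0.0]]}
    print(classify_or_reject([0.0, 0.0, 0.9], refs))  # compound_A
    print(classify_or_reject([5.0, 5.0, 5.0], refs))  # unknown
```

The threshold trades off false rejections of known compounds against misclassifying a new structure as a known one. In practice it would be tuned on held-out data, much as the paper describes tuning its KSVM parameters.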


2006 ◽  
Vol 49 (2) ◽  
pp. 523-533 ◽  
Author(s):  
Yoshifumi Fukunishi ◽  
Yoshiaki Mikami ◽  
Kei Takedomi ◽  
Masaya Yamanouchi ◽  
Hideaki Shima ◽  
...  

2004 ◽  
Vol 5 (2) ◽  
pp. 156-162 ◽  
Author(s):  
Ulrike Wittig ◽  
Andreas Weidemann ◽  
Renate Kania ◽  
Christian Peiss ◽  
Isabel Rojas

Data quality in biological databases has become a topic of much discussion. To provide high-quality data and to cope with the vast amount of biochemical data, annotators and curators need software that carries out part of their work (semi-)automatically. Detecting errors and inconsistencies requires the knowledge of domain experts, so in most cases it is done manually, which makes it very expensive and time-consuming. This paper presents two tools that partially support the curation of data on biochemical pathways. One of them enables the automatic classification of chemical compounds based on their SMILES strings. Such classification allows biochemical reactions to be queried and visualized at different levels of abstraction, according to the level of detail at which the reaction participants are described. Chemical compounds can be classified flexibly according to different criteria. Data curation is supported by making it easier to detect compounds that are recorded as different but are actually the same. This is also used to identify similar reactions and, in turn, similar pathways.
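The abstract does not give the classification criteria or the duplicate-detection procedure. As a hedged illustration of working with SMILES strings, the following sketch uses a small regular-expression tokenizer to extract the heavy-atom composition of each SMILES string. It then groups compounds with identical composition as candidate duplicates. Identical composition is necessary but not sufficient for two entries to be the same compound (ethanol and dimethyl ether are both C2O), so candidates would still need expert review or proper canonicalization with a cheminformatics toolkit. The tokenizer handles only the SMILES organic subset and simple bracket atoms. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List

# A bracket atom such as [nH], [C@@H] or [Na+], or an organic-subset atom
# written without brackets (aliphatic upper case, aromatic lower case).
_ATOM = re.compile(r"\[([^\]]+)\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol at the start of a bracket atom, after an optional isotope.
_BRACKET_ELEMENT = re.compile(r"\d*(se|as|[bcnops]|[A-Z][a-z]?)")


def heavy_atom_composition(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element in a (simple) SMILES string.

    Bonds, branches, ring-closure digits and stereo marks are skipped,
    implicit hydrogens are not counted, and explicit [H] atoms are dropped.
    """
    counts: Counter = Counter()
    for match in _ATOM.finditer(smiles):
        bracket = match.group(1)
        if bracket is None:
            symbol = match.group(0)
        else:
            element = _BRACKET_ELEMENT.match(bracket)
            if element is None:
                raise ValueError(f"cannot parse bracket atom [{bracket}]")
            symbol = element.group(1)
        symbol = symbol.capitalize()  # aromatic 'c' -> 'C', 'se' -> 'Se'
        if symbol != "H":
            counts[symbol] += 1
    return counts


def candidate_duplicates(compounds: Dict[str, str]) -> List[List[str]]:
    """Group compound names whose SMILES share the same heavy-atom composition."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for name, smiles in compounds.items():
        key = tuple(sorted(heavy_atom_composition(smiles).items()))
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    entries = {
        "ethanol": "CCO",
        "ethyl alcohol": "OCC",
        "dimethyl ether": "COC",
        "benzene": "c1ccccc1",
    }
    print(candidate_duplicates(entries))
    # [['dimethyl ether', 'ethanol', 'ethyl alcohol']]
```

The example shows both sides of the heuristic. "CCO" and "OCC" are correctly flagged as the same compound written two ways, while dimethyl ether is a false positive that only structure-level comparison can rule out.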

