Large-Scale Data Mining of Rapid Residue Detection Assay Data From HTML and PDF Documents: Improving Data Access and Visualization for Veterinarians

2021 ◽  
Vol 8 ◽  
Author(s):  
Majid Jaberi-Douraki ◽  
Soudabeh Taghian Dinani ◽  
Nuwan Indika Millagaha Gedara ◽  
Xuan Xu ◽  
Emily Richards ◽  
...  

Extra-label drug use in food animal medicine is authorized by the US Animal Medicinal Drug Use Clarification Act (AMDUCA), and estimated withdrawal intervals are based on published scientific pharmacokinetic data. Occasionally, there is a paucity of scientific data on which to base a withdrawal interval, or a large number of animals are being treated, driving the need to test for drug residues. Commercial farm-side rapid assay tests are essential for monitoring drug residues in animal products to protect human health. Active ingredients, sensitivity, matrices, and species that have been evaluated for commercial rapid assay tests are typically reported on manufacturers' websites or in PDF documents that are available to consumers but may require a special access request. Additionally, this information is not always correlated with FDA-approved tolerances. Furthermore, parameter changes for these tests can be very challenging to identify regularly, especially those listed on websites or in documents that are not publicly available. Therefore, artificial intelligence plays a critical role in efficiently extracting the data and ensuring that the information is current. Extracting tables from PDF and HTML documents has been investigated by both academia and commercial tool builders. Research in text mining of such documents has become a widespread yet challenging arena in implementing natural language processing. However, techniques for extracting tables are still in their infancy and are being investigated and improved by researchers. In this study, we developed and evaluated a data-mining method for automatically extracting rapid assay data from electronic documents. Our automatic electronic data extraction method includes a software package module, a developed pattern recognition tool, and a data mining engine. Assay details were provided by several commercial entities that produce these rapid drug residue assay tests.
During this study, we developed a real-time conversion system and a method for reflowing the contents of these files, both for accessibility and for research data mining. Embedded information was extracted with AI-based text extraction and text mining and converted to structured formats. These data were then made available to veterinarians and producers through an online interface that supports interactive searching and presents the commercial test assay parameters alongside FDA-approved tolerances.
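The abstract does not describe the extraction tool itself, so the following is only a minimal sketch of the general idea: pull the rows of an HTML table into structured records and flag each test against a tolerance. It uses Python's standard `html.parser`; the table layout, the drug names, the `TOLERANCE_PPB` values, and the `detects_at_tolerance` field are all hypothetical and are not taken from the study or from any regulatory source.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Use the first table row as field names and return the other rows as dicts."""
    parser = TableExtractor()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical assay table; names and numbers are made up for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Drug</th><th>Matrix</th><th>Sensitivity (ppb)</th></tr>
  <tr><td>Drug A</td><td>Milk</td><td>4</td></tr>
  <tr><td>Drug B</td><td>Milk</td><td>30</td></tr>
</table>
"""

# Hypothetical tolerances in ppb, not real regulatory values.
TOLERANCE_PPB = {"Drug A": 5.0, "Drug B": 10.0}

records = extract_records(SAMPLE_HTML)
for rec in records:
    sensitivity = float(rec["Sensitivity (ppb)"])
    # A test can detect residues at the tolerance only if its
    # sensitivity (lowest detectable level) is at or below that tolerance.
    rec["detects_at_tolerance"] = sensitivity <= TOLERANCE_PPB[rec["Drug"]]
```

Real manufacturer pages and PDF documents would need far more robust handling (nested tables, merged cells, units embedded in text), which is presumably where the pattern recognition component mentioned in the abstract comes in.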

2016 ◽  
Vol 54 ◽  
pp. 456-468 ◽  
Author(s):  
Changjun Hu ◽  
Yang Li ◽  
Xin Cheng ◽  
Zhenyu Liu

2017 ◽  
Author(s):  
Joon-Yong Lee ◽  
Grant M. Fujimoto ◽  
Ryan Wilson ◽  
H. Steven Wiley ◽  
Samuel H. Payne

Abstract: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts because they allow scientists to explore the wealth of scientific data more completely. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that they do not share a signature of interest. It is therefore essential to identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate the ability to scale to billions of pairwise comparisons and the usefulness of this approach.
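The filtering idea described above can be illustrated with a small sketch. This is not the published BSF implementation: the thresholding rule, the function names (`to_signature`, `shared_bits`, `filter_pairs`), and the `min_shared` cutoff are assumptions chosen only to show how binarized profiles can be compared with a bitwise AND and a population count before any expensive similarity calculation is run.

```python
def to_signature(values, threshold):
    """Pack a numeric profile into an int whose bit i is set when values[i] > threshold."""
    bits = 0
    for i, v in enumerate(values):
        if v > threshold:
            bits |= 1 << i
    return bits


def shared_bits(sig_a, sig_b):
    """Count positions set in both signatures using a bitwise AND and a popcount."""
    return bin(sig_a & sig_b).count("1")


def filter_pairs(profiles, threshold, min_shared):
    """Return index pairs whose binary signatures overlap in at least min_shared positions."""
    sigs = [to_signature(p, threshold) for p in profiles]
    keep = []
    for i in range(len(sigs)):
        for j in range(i + 1, len(sigs)):
            if shared_bits(sigs[i], sigs[j]) >= min_shared:
                keep.append((i, j))
    return keep


# Hypothetical profiles (for example, expression values across four conditions).
profiles = [
    [2.1, 0.1, 3.0, 2.5],
    [1.9, 0.0, 2.8, 0.2],
    [0.1, 2.2, 0.0, 0.3],
]

# Only pairs that pass this coarse filter would go on to a detailed comparison.
candidates = filter_pairs(profiles, threshold=1.0, min_shared=2)
```

In this toy input only the first two profiles share two or more set positions, so only that pair survives the filter; the other two comparisons are discarded after a single AND and count.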


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Wencheng Sun ◽  
Zhiping Cai ◽  
Yangyang Li ◽  
Fang Liu ◽  
Shengqun Fang ◽  
...  

Currently, medical institutes generally use electronic medical records (EMR) to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR data are diverse, incomplete, redundant, and privacy-sensitive, which makes it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and the data mining results. Different types of data require different processing technologies. Structured data usually needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and analyzes the key techniques in detail. In addition, we study in depth the applications developed on the basis of text mining, together with the open challenges and research issues for future work.
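As a rough illustration of the classic preprocessing steps named above for structured data, the sketch below applies cleansing, transformation, and reduction to a handful of made-up records. The field names (`patient_id`, `date`, `temp`, `unit`, `note`) and the rules are hypothetical and are not drawn from the paper; data integration and privacy handling are left out.

```python
def preprocess(records):
    """Apply cleansing, transformation, and reduction to a list of record dicts."""
    seen = set()
    cleaned = []
    for rec in records:
        # Cleansing: skip incomplete records (missing patient id or temperature).
        if not rec.get("patient_id") or rec.get("temp") is None:
            continue
        # Cleansing: drop exact duplicates of a measurement (redundancy).
        key = (rec["patient_id"], rec.get("date"), rec["temp"], rec.get("unit"))
        if key in seen:
            continue
        seen.add(key)
        # Transformation: express every temperature in degrees Celsius.
        if rec.get("unit") == "F":
            temp_c = (rec["temp"] - 32) * 5 / 9
        else:
            temp_c = rec["temp"]
        # Reduction: keep only the fields needed downstream.
        cleaned.append({
            "patient_id": rec["patient_id"],
            "date": rec.get("date"),
            "temp_c": round(temp_c, 1),
        })
    return cleaned


# Hypothetical raw records; all values are invented for illustration.
raw = [
    {"patient_id": "p1", "date": "2018-01-01", "temp": 98.6, "unit": "F", "note": "a"},
    {"patient_id": "p1", "date": "2018-01-01", "temp": 98.6, "unit": "F", "note": "b"},
    {"patient_id": "", "date": "2018-01-02", "temp": 37.2, "unit": "C", "note": "c"},
    {"patient_id": "p2", "date": "2018-01-02", "temp": None, "unit": "C", "note": "d"},
    {"patient_id": "p3", "date": "2018-01-03", "temp": 38.0, "unit": "C", "note": "e"},
]

cleaned = preprocess(raw)
```

Of the five raw records, the duplicate measurement and the two incomplete records are dropped, leaving two cleaned records with temperatures in a single unit.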


2020 ◽  
Author(s):  
Donatello Elia ◽  
Fabrizio Antonio ◽  
Cosimo Palazzo ◽  
Paola Nassisi ◽  
Sofiane Bendoukha ◽  
...  

Scientific data analysis experiments and applications require software capable of handling domain-specific and data-intensive workflows. The increasing volume of scientific data is further exacerbating these data management and analytics challenges, pushing the community towards the definition of novel programming environments for dealing efficiently with complex experiments, while abstracting from the underlying computing infrastructure.
ECASLab provides a user-friendly data analytics environment to support scientists in their daily research activities, in particular in the climate change domain, by integrating analysis tools with scientific datasets (e.g., from the ESGF data archive) and computing resources (i.e., Cloud and HPC-based). It combines the features of the ENES Climate Analytics Service (ECAS) and the JupyterHub service, with a wide set of scientific libraries from the Python landscape for data manipulation, analysis and visualization. ECASLab is being set up in the frame of the European Open Science Cloud (EOSC) platform - in the EU H2020 EOSC-Hub project - by CMCC (https://ecaslab.cmcc.it/) and DKRZ (https://ecaslab.dkrz.de/), which host two major instances of the environment.
ECAS, which lies at the heart of ECASLab, enables scientists to perform data analysis experiments on large volumes of multi-dimensional data by providing a workflow-oriented, PID-supported, server-side and distributed computing approach. ECAS consists of multiple components, centered around the Ophidia High Performance Data Analytics framework, which has been integrated with data access and sharing services (e.g., EUDAT B2DROP/B2SHARE, Onedata), along with the EGI federated cloud infrastructure. The integration with JupyterHub provides a convenient interface for scientists to access the ECAS features for the development and execution of experiments, as well as for sharing results (and the experiment/workflow definition itself). ECAS parallel data analytics capabilities can be easily exploited in Jupyter Notebooks (by means of PyOphidia, the Ophidia Python bindings) together with well-known Python modules for processing and for plotting the results on charts and maps (e.g., Dask, Xarray, NumPy, Matplotlib, etc.). ECAS is also one of the compute services made available to climate scientists by the EU H2020 IS-ENES3 project.
Hence, this integrated environment represents a complete software stack for designing and running interactive experiments as well as complex and data-intensive workflows. One class of such large-scale workflows, efficiently implemented through the environment resources, refers to multi-model data analysis in the context of both CMIP5 and CMIP6 (i.e., precipitation trend analysis orchestrated in parallel over multiple CMIP-based datasets).
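To make the last example concrete, here is a minimal stand-in for a trend analysis run in parallel over several datasets. It deliberately does not use PyOphidia or any ECAS service, whose APIs are not described in this abstract; it only uses the Python standard library, and the dataset names and precipitation values are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def linear_trend(series):
    """Return the ordinary least-squares slope of a series against its index."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Hypothetical annual precipitation values for three made-up model outputs.
datasets = {
    "model_a": [100.0, 102.0, 104.0, 106.0],
    "model_b": [90.0, 90.0, 90.0, 90.0],
    "model_c": [80.0, 79.0, 78.0, 77.0],
}

# Compute one trend per dataset concurrently; pool.map preserves input order,
# so the slopes can be zipped back onto the dataset names.
with ThreadPoolExecutor() as pool:
    slopes = dict(zip(datasets, pool.map(linear_trend, datasets.values())))
```

In a real multi-model workflow each task would operate on gridded model output held server-side rather than on a short in-memory list, and the orchestration would be handled by the workflow engine rather than a local thread pool.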


Pflege ◽  
2019 ◽  
Vol 32 (1) ◽  
pp. 57-63
Author(s):  
Hannes Mayerl ◽  
Tanja Trummer ◽  
Erwin Stolz ◽  
Éva Rásky ◽  
Wolfgang Freidl

Abstract. Background: Given that nursing staff play a critical role in the decision regarding use of physical restraints, research has examined nursing professionals’ attitudes toward this practice. Aim: Since nursing professionals’ views on physical restraint use have not yet been examined in Austria, we aimed to explore nursing professionals’ attitudes concerning use of physical restraints in nursing homes of Styria (Austria). Method: Data were collected from a convenience sample of nursing professionals (N = 355) within 19 Styrian nursing homes, based on a cross-sectional study design. Attitudes toward the practice of restraint use were assessed by means of the German version of the Maastricht Attitude Questionnaire. Results: The overall results showed rather positive attitudes toward the use of physical restraints, yet the findings regarding the sub-dimensions of the questionnaire were mixed. Although nursing professionals tended to deny “good reasons” for using physical restraints, they evaluated the consequences of physical restraint use rather positively and considered restraint use an appropriate health care practice. Nursing professionals’ views regarding the consequences of using specific physical restraints further showed that belts were considered the most restricting and discomforting devices. Conclusions: Overall, Austrian nursing professionals seemed to hold more positive attitudes toward the use of physical restraints than counterparts in other Western European countries. Future nationwide large-scale surveys will be needed to confirm our findings.


2019 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Mojtaba Haghighatlari ◽  
Sai Prasad Ganesh ◽  
Chong Cheng ◽  
Johannes Hachmann

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optic or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order to determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI yield. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.

