Natural language indexing for pedoinformatics

The multiple schemas for the classification of soils rely on differing criteria, but the major soil science systems, including the United States Department of Agriculture (USDA) and the internationally harmonized World Reference Base for Soil Resources soil classification systems, are primarily based on inferred pedogenesis. These classifications are largely compiled from individual observations of soil characteristics within soil profiles, and the vast majority of this pedologic information is contained in nonquantitative text descriptions. We present initial text mining analyses of parsed text in the digitally available USDA soil taxonomy documentation and the Soil Survey Geographic database. Previous research has shown that latent information structure can be extracted from scientific literature using Natural Language Processing techniques, and we show that this latent information can be used to expedite query performance by using syntactic elements and part-of-speech tags as indices. Technical vocabulary often poses a text mining challenge because its diction is rare in the broader context. We introduce an extension to the common English vocabulary that allows for nearly complete indexing of USDA Soil Series Descriptions.
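As a rough illustration of the two ideas in this abstract (part-of-speech tags as index keys, and a vocabulary extension that raises indexing coverage), a minimal Python sketch follows. It is not the authors' implementation: the series names, the hand-assigned tags, and the helper names `build_pos_index`, `lookup`, and `coverage` are all hypothetical, and the tags are written by hand rather than produced by a real tagger.

```python
from collections import defaultdict

# Hypothetical, hand-tagged toy descriptions: (token, part-of-speech tag).
# A real pipeline would obtain the tags from an NLP tagger.
TAGGED_DESCRIPTIONS = {
    "series_a": [("dark", "ADJ"), ("brown", "ADJ"), ("silt", "NOUN"), ("loam", "NOUN")],
    "series_b": [("brown", "ADJ"), ("sandy", "ADJ"), ("loam", "NOUN")],
    "series_c": [("weak", "ADJ"), ("granular", "ADJ"), ("structure", "NOUN")],
}


def build_pos_index(tagged_docs):
    """Build an inverted index keyed by (tag, token) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, tokens in tagged_docs.items():
        for token, tag in tokens:
            index[(tag, token.lower())].add(doc_id)
    return index


def lookup(index, tag, token):
    """Return the sorted ids of documents containing `token` used with `tag`."""
    return sorted(index.get((tag, token.lower()), set()))


def coverage(tokens, vocabulary):
    """Fraction of tokens that appear in the given vocabulary."""
    if not tokens:
        return 0.0
    return sum(t.lower() in vocabulary for t in tokens) / len(tokens)


POS_INDEX = build_pos_index(TAGGED_DESCRIPTIONS)

# Assumed toy vocabularies: a general English list and a soil-term extension.
BASE_VOCAB = {"dark", "brown"}
SOIL_EXTENSION = {"silt", "loam"}
```

With this toy data, `lookup(POS_INDEX, "NOUN", "loam")` returns the two series whose descriptions use "loam" as a noun, and `coverage` over the tokens of `series_a` rises once `SOIL_EXTENSION` is merged into `BASE_VOCAB`.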


A Comparative Analysis of a Detailed and Semi-Detailed Soil Mapping for Sustainable Land Management Using Conventional and Currently Applied Methodologies in Greece

Land ◽

10.3390/land9050154 ◽

2020 ◽

Vol 9 (5) ◽

pp. 154 ◽

Cited By ~ 1

Author(s):

Orestis Kairis ◽

Vassiliki Dimitriou ◽

Chrysoula Aratzioglou ◽

Dionisios Gasparatos ◽

Nicholas Yassoglou ◽

...

Keyword(s):

Soil Classification ◽

Botanical Garden ◽

Soil Mapping ◽

Classification Systems ◽

Soil Survey ◽

Sustainable Land Use ◽

Soil Taxonomy ◽

Reference Base ◽

World Reference Base ◽

Soil Surveys

Two soil mapping methodologies applied at different scales in the same area were compared to investigate whether their combined use could yield an integrated and more accurate soil description for sustainable land use management. The two methodologies represent the main types of soil mapping systems that have been used, and are still applied, in soil surveys in Greece. Diomedes Botanical Garden (DBG) (Athens, Greece) was used as the study area because cartographic data from a past soil survey were available. The older soil survey data were obtained via the conventional methodology used extensively since the beginnings of soil mapping in Greece (1977). The second mapping methodology constitutes the current soil mapping system in Greece, recently used for compilation of the national soil map. The cartographic and soil data resulting from the application of the two methodologies were analyzed and compared using appropriate geospatial techniques. Even though the two mapping methodologies were performed at different mapping scales, using partially different mapping symbols and different soil classification systems, the description of the soils based on the cartographic symbols of the two methodologies showed an agreement of 63.7%, while the soil classification by the two taxonomic systems, namely Soil Taxonomy and World Reference Base for Soil Resources, had an average coincidence of 69.5%.
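An agreement percentage of this kind can be sketched as a cell-by-cell comparison of two categorical maps. The abstract does not specify which geospatial techniques were used, so the following is only an assumed simplification: both maps are taken to be resampled onto the same grid and translated into a common legend, and the function name `percent_agreement` is hypothetical.

```python
def percent_agreement(map_a, map_b):
    """Percentage of grid cells carrying the same label in both maps.

    Both inputs are flat sequences of class labels on the same grid.
    Cells where either map has no data (None) are ignored.
    """
    if len(map_a) != len(map_b):
        raise ValueError("maps must cover the same grid")
    pairs = [(a, b) for a, b in zip(map_a, map_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no overlapping cells with data")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)
```

An area-weighted polygon overlay would give a more faithful figure when the two maps have different scales; the grid version above is only the simplest instance of the idea.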


Does higher education properly prepare graduates for the growing artificial intelligence market? Gaps identification using text mining

Human Systems Management ◽

10.3233/hsm-211179 ◽

2021 ◽

pp. 1-13

Author(s):

Lamiae Benhayoun ◽

Daniel Lang

Keyword(s):

Artificial Intelligence ◽

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

Academic Training ◽

Market Requirements ◽

Job Advertisements ◽

The Individual

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the gaps in terms of skills between academic training on AI in French engineering and Business Schools, and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools’ websites and scraping of a job advertisements’ website. Then, analysis based on a text mining approach with a Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI. Characterization of three classes of skills for the AI market: Technical, Soft and Interdisciplinary. Skills’ gaps concern some professional certifications and the mastery of specific tools, research abilities, and awareness of ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing. Results that provide a better understanding of the AI capability components at the individual and the organizational levels. A study that can help shape educational programs to respond to the AI market requirements.


Identifying Causality and Contributory Factors of Pipeline incidents by Employing Natural Language Processing and Text Mining Techniques

Process Safety and Environmental Protection ◽

10.1016/j.psep.2021.05.036 ◽

2021 ◽

Author(s):

Guanyang Liu ◽

Mason Boyd ◽

Mengxi Yu ◽

S. Zohra Halim ◽

Noor Quddus

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

Contributory Factors


A Study of the Effects of the COVID-19 Pandemic on the Experience of Back Pain Reported on Twitter® in the United States: A Natural Language Processing Approach

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18094543 ◽

2021 ◽

Vol 18 (9) ◽

pp. 4543

Author(s):

Krzysztof Fiok ◽

Waldemar Karwowski ◽

Edgar Gutierrez ◽

Maham Saeidi ◽

Awad M. Aljuaid ◽

...

Keyword(s):

United States ◽

Natural Language Processing ◽

Back Pain ◽

Natural Language ◽

Language Processing ◽

The United States ◽

Daily Routine ◽

Body Movements ◽

Data Source ◽

Twitter Users

The COVID-19 pandemic has changed our lifestyles, habits, and daily routines. Some of the impacts of COVID-19 have been widely reported already. However, many effects of the COVID-19 pandemic are still to be discovered. The main objective of this study was to assess the changes in the frequency of physical back pain complaints reported during the COVID-19 pandemic. In contrast to other published studies, we target the general population using Twitter as a data source. Specifically, we aim to investigate differences in the number of back pain complaints between the pre-pandemic period and the pandemic period. A total of 53,234 and 78,559 tweets were analyzed for November 2019 and November 2020, respectively. Because Twitter users do not always complain explicitly when they tweet about the experience of back pain, we have designed an intelligent filter based on natural language processing (NLP) to automatically classify the examined tweets into the back pain complaining class and other tweets. Analysis of filtered tweets indicated an 84% increase in the back pain complaints reported in November 2020 compared to November 2019. These results might indicate significant changes in lifestyle during the COVID-19 pandemic, including restrictions in daily body movements and reduced exposure to routine physical exercise.


A Natural-Language-Processing-Based Procedure for Generating Distractors for Multiple-Choice Questions

Evaluation & the Health Professions ◽

10.1177/01632787211046981 ◽

2021 ◽

pp. 016327872110469

Author(s):

Peter Baldwin ◽

Janet Mee ◽

Victoria Yaneva ◽

Miguel Paniagua ◽

Jean D’Angelo ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Multiple Choice ◽

Incorrect Response ◽

The United States ◽

Choice Test ◽

Multiple Choice Questions ◽

Response Options ◽

Medical Licensing

One of the most challenging aspects of writing multiple-choice test questions is identifying plausible incorrect response options—i.e., distractors. To help with this task, a procedure is introduced that can mine existing item banks for potential distractors by considering the similarities between a new item’s stem and answer and the stems and response options for items in the bank. This approach uses natural language processing to measure similarity and requires a substantial pool of items for constructing the generating model. The procedure is demonstrated with data from the United States Medical Licensing Examination (USMLE®). For about half the items in the study, at least one of the top three system-produced candidates matched a human-produced distractor exactly; and for about one quarter of the items, two of the top three candidates matched human-produced distractors. A study was conducted in which a sample of system-produced candidates were shown to 10 experienced item writers. Overall, participants thought about 81% of the candidates were on topic and 56% would help human item writers with the task of writing distractors.
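The bank-mining idea can be sketched in a few lines of Python. The abstract does not say which similarity measure the procedure uses, so this sketch substitutes cosine similarity over bag-of-words counts as a stand-in. The item bank, the field names (`stem`, `options`), and the function `candidate_distractors` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def candidate_distractors(stem, answer, bank, top_k=3):
    """Rank response options from bank items by similarity to the new item.

    The new item's stem and answer are compared with each bank item's stem
    and options; every option inherits the best score of any item it
    appears in. Ties are broken alphabetically.
    """
    query = Counter(tokenize(stem + " " + answer))
    scored = {}
    for item in bank:
        text = item["stem"] + " " + " ".join(item["options"])
        sim = cosine(query, Counter(tokenize(text)))
        for option in item["options"]:
            if option.lower() != answer.lower():
                scored[option] = max(sim, scored.get(option, 0.0))
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [option for option, _ in ranked[:top_k]]


# Hypothetical toy bank; a real bank would need to be far larger.
TOY_BANK = [
    {"stem": "Which planet is closest to the sun",
     "options": ["Mercury", "Venus", "Mars"]},
    {"stem": "Which element has atomic number one",
     "options": ["Hydrogen", "Helium", "Lithium"]},
]

candidates = candidate_distractors(
    "Which planet is largest in the solar system", "Jupiter", TOY_BANK
)
```

Here the planet item shares more words with the new stem than the element item does, so its options are ranked first. With a bank this small the ranking is trivial, which is consistent with the abstract's note that the approach needs a substantial pool of items.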


Identifying and intercepting health misinformation on Reddit dermatology forums with artificially intelligent bots using natural language processing (Preprint)

10.2196/preprints.20975 ◽

2021 ◽

Author(s):

Monique B. Sager ◽

Aditya M. Kashyap ◽

Mila Tamminga ◽

Sadhana Ravoori ◽

Christopher Callison-Burch ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

The United States ◽

Test Accuracy ◽

Limited Data ◽

Test Environment ◽

Data Set ◽

Inappropriate Care ◽

Processing Techniques

BACKGROUND Reddit, the fifth most popular website in the United States, boasts a large and engaged user base on its dermatology forums where users crowdsource free medical opinions. Unfortunately, much of the advice provided is unvalidated and could lead to inappropriate care. Initial testing has shown that artificially intelligent bots can detect misinformation on Reddit forums and may be able to produce responses to posts containing misinformation. OBJECTIVE To analyze the ability of bots to find and respond to health misinformation on Reddit’s dermatology forums in a controlled test environment. METHODS Using natural language processing techniques, we trained bots to target misinformation using relevant keywords and to post pre-fabricated responses. By evaluating different model architectures across a held-out test set, we compared performances. RESULTS Our models yielded test accuracies ranging from 95% to 100%, with a BERT fine-tuned model resulting in the highest level of test accuracy. Bots were then able to post corrective pre-fabricated responses to misinformation. CONCLUSIONS Using a limited data set, bots had near-perfect ability to detect these examples of health misinformation within Reddit dermatology forums. Given that these bots can then post pre-fabricated responses, this technique may allow for interception of misinformation. Providing correct information, even instantly, however, does not mean users will be receptive or find such interventions persuasive. Further work should investigate this strategy’s effectiveness to inform future deployment of bots as a technique in combating health misinformation.


Application of Natural Language Processing and Text Mining to Identify Patterns in Construction-Defect Litigation Cases

Journal of Legal Affairs and Dispute Resolution in Engineering and Construction ◽

10.1061/(asce)la.1943-4170.0000308 ◽

2019 ◽

Vol 11 (4) ◽

pp. 04519024 ◽

Cited By ~ 4

Author(s):

Yashovardhan Jallan ◽

Elizabeth Brogan ◽

Baabak Ashuri ◽

Caroline M. Clevenger

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing


Natural Language Processing for Rapid Response to Emergent Diseases: Case Study of Calcium Channel Blockers and Hypertension in the COVID-19 Pandemic

Journal of Medical Internet Research ◽

10.2196/20773 ◽

2020 ◽

Vol 22 (8) ◽

pp. e20773 ◽

Cited By ~ 1

Author(s):

Antoine Neuraz ◽

Ivan Lerner ◽

William Digan ◽

Nicolas Paris ◽

Rosy Tsopra ◽

...

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Calcium Channel ◽

Language Processing ◽

Calcium Channel Blockers ◽

Structured Data ◽

Channel Blockers ◽

Knowledge Model ◽

Long Term Treatment

Background A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. Objective The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). Methods We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available strictly from structured electronic health records (EHRs) and data available through structured EHRs and text mining. Results In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to change a negative result for an adjusted hazard ratio to a positive one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by 2.95 times, the amount of available information on medications increased by 7.2 times, and the amount of additional phenotypic information increased by 11.9 times. Conclusions In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed sufficiently to extract useful information. When that information was used to supplement existing structured data, the sample size could be increased sufficiently to see treatment effects that were not previously statistically detectable.
