Statistical Natural Language Processing: Recently Published Documents

Classification and generation of grammatical errors.

10.32920/ryerson.14647515.v1 ◽

2021 ◽

Author(s):

Anthony Penniston

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽

Grammatical Structure ◽

Machine Learning Classification ◽

Grammatical Errors ◽

Written Form ◽

Unique Approach ◽

Processing Techniques

The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in the digital and written form; the misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address the challenge, and in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms on extracted feature data, in order to classify and predict the grammaticality of a sentence.
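The first two steps in the abstract above (generating a parallel error corpus and extracting sentence features) can be illustrated with a minimal sketch. The swap rules, function names, and the four toy features are assumptions made for illustration only; they are not the thesis's error taxonomy or its 150+ features.

```python
import random

# Hypothetical substitution rules: each maps a token to a plausible wrong form.
# Illustrative assumptions only, not the error types used in the thesis.
SWAP_RULES = {"is": "are", "are": "is", "a": "an", "an": "a",
              "has": "have", "have": "has"}


def inject_error(tokens, rng):
    """Return a copy of tokens with one rule-based error, or None if no rule applies."""
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in SWAP_RULES]
    if not candidates:
        return None
    i = rng.choice(candidates)
    corrupted = list(tokens)
    corrupted[i] = SWAP_RULES[tokens[i].lower()]
    return corrupted


def build_parallel_corpus(sentences, seed=0):
    """Pair each correct sentence (label 1) with a corrupted copy (label 0)."""
    rng = random.Random(seed)
    corpus = []
    for sentence in sentences:
        tokens = sentence.split()
        corrupted = inject_error(tokens, rng)
        if corrupted is not None:
            corpus.append((tokens, 1))
            corpus.append((corrupted, 0))
    return corpus


def extract_features(tokens):
    """A few toy surface features standing in for a much larger feature set."""
    n = len(tokens)
    return {
        "num_tokens": n,
        "mean_token_length": sum(len(t) for t in tokens) / n if n else 0.0,
        "num_rule_tokens": sum(t.lower() in SWAP_RULES for t in tokens),
        "ends_with_period": int(bool(tokens) and tokens[-1].endswith(".")),
    }


corpus = build_parallel_corpus(["The cat is on a mat .", "Dogs bark loudly ."])
rows = [(extract_features(tokens), label) for tokens, label in corpus]
```

Each entry in `rows` is a (feature dictionary, label) pair that could be passed to any classifier for the third step; the choice of classifier is left open here because the abstract does not name one.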

Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods

JMIR Medical Informatics ◽

10.2196/13039 ◽

2019 ◽

Vol 7 (1) ◽

pp. e13039 ◽

Cited By ~ 7

Author(s):

Tao Chen ◽

Mark Dredze ◽

Jonathan P Weiner ◽

Leilani Hernandez ◽

Joe Kimura ◽

...

Keyword(s):

Natural Language ◽

Electronic Health Record ◽

Language Processing ◽

Health Record ◽

Geriatric Syndromes ◽

Processing Methods ◽

Clinical Notes ◽

Electronic Health

Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods (Preprint)

10.2196/preprints.13039 ◽

2018 ◽

Author(s):

Tao Chen ◽

Mark Dredze ◽

Jonathan P Weiner ◽

Leilani Hernandez ◽

Joe Kimura ◽

...

Keyword(s):

Natural Language ◽

Language Processing ◽

Geriatric Syndrome ◽

Free Text ◽

Geriatric Syndromes ◽

Health Maintenance ◽

Clinical Notes ◽

Electronic Health

BACKGROUND: Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records.

OBJECTIVE: We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective.

METHODS: We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts.

RESULTS: A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context.

CONCLUSIONS: This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.
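The abstract reports both macroaverage and microaverage F1. As a reminder of how the two differ, here is a small worked sketch using the standard definitions. The class names and the true-positive, false-positive, and false-negative counts are made up for illustration and are not taken from the study.

```python
def f1(tp, fp, fn):
    """F1 from raw counts: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts maps class name -> (tp, fp, fn). Returns (per_class, macro, micro)."""
    per_class = {name: f1(*c) for name, c in counts.items()}
    # Macroaverage: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(per_class.values()) / len(per_class)
    # Microaverage: pool the counts across classes first, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return per_class, macro, micro


# Hypothetical counts: one frequent class that is easy, one rare class that is hard.
counts = {
    "frequent_construct": (90, 10, 10),
    "rare_construct": (2, 3, 5),
}
per_class, macro, micro = macro_micro_f1(counts)
print(per_class, round(macro, 3), round(micro, 3))
```

With these toy counts the macroaverage falls below the microaverage, because the weak rare class pulls the unweighted mean down while contributing little to the pooled counts.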

Textual Data Mining For Knowledge Discovery and Data Classification: A Comparative Study

European Scientific Journal ESJ ◽

10.19044/esj.2017.v13n21p429 ◽

2017 ◽

Vol 13 (21) ◽

pp. 429

Author(s):

Nadeem Ur-Rahman

Keyword(s):

Data Mining ◽

Text Mining ◽

Language Processing ◽

Business Processes ◽

Decision Makers ◽

Text Documents ◽

Textual Data ◽

Mining Methods ◽

Many Sources

Business Intelligence solutions are key to enabling industrial organisations (either manufacturing or construction) to remain competitive in the market. These solutions are achieved through analysis of data which is collected, retrieved and re-used for prediction and classification purposes. However, many sources of industrial data are not being fully utilised to improve the business processes of the associated industry. It is generally left to the decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture or from the operation of business or production processes. Substantial effort, in terms of time and money, is required to identify and exploit the appropriate information that is available from the data. Data Mining techniques have long been applied mainly to numerical forms of data available from various data sources, but their application to semi-structured or unstructured databases is still limited to a few specific domains. The application of these techniques in combination with Text Mining methods based on statistical, natural language processing and visualisation techniques could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation and classification, and mainly rely on methods and techniques available in the area of Information Retrieval (IR). These help to uncover the hidden information in text documents at an initial level. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods, which share techniques from IR and data mining. These techniques may be implemented to analyse textual databases in general, but they are demonstrated here using examples of Post Project Reviews (PPR) from the construction industry as a case study. The research is focused on finding key single or multiple term phrases for classifying the documents into two classes, i.e. good information and bad information documents, to help decision makers or project managers identify key issues discussed in PPRs, which can be used as a guide for future project management processes.
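To make the idea of classifying documents by key single or multiple term phrases concrete, here is a minimal sketch based on phrase document frequencies. It is not the paper's method: the scoring rule, the function names, and the four toy review snippets are all assumptions chosen for illustration.

```python
from collections import Counter


def phrases(text, max_len=2):
    """Return all single-term and multi-term phrases (up to max_len words)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]


def train(docs):
    """docs is a list of (text, label) pairs with label 'good' or 'bad'.
    Counts, per class, how many documents contain each phrase."""
    counts = {"good": Counter(), "bad": Counter()}
    for text, label in docs:
        counts[label].update(set(phrases(text)))
    return counts


def key_phrases(counts, label, top=3):
    """Phrases seen more often in `label` documents than in the other class."""
    other = "bad" if label == "good" else "good"
    scores = {p: c - counts[other][p] for p, c in counts[label].items()}
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]


def classify(counts, text):
    """Assign the class whose training documents share more phrases with text."""
    found = set(phrases(text))
    good = sum(counts["good"][p] for p in found)
    bad = sum(counts["bad"][p] for p in found)
    return "good" if good >= bad else "bad"


# Hypothetical review snippets, invented for this example.
training_docs = [
    ("project finished on time", "good"),
    ("client was satisfied on time", "good"),
    ("project finished late", "bad"),
    ("cost overrun and late delivery", "bad"),
]
counts = train(training_docs)
print(key_phrases(counts, "good"))
print(classify(counts, "delivery was late"))
```

A realistic system would add stop-word filtering and a proper term-weighting scheme; the sketch only shows how single and multi-word phrases can both act as class indicators.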

Statistical Natural Language Processing for Sentiment Analysis

Undergraduate Topics in Computer Science - Introduction to Data Science ◽

10.1007/978-3-319-50017-1_10 ◽

2017 ◽

pp. 181-197

Author(s):

Laura Igual ◽

Santi Seguí

Keyword(s):

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Statistical Natural Language Processing

Encyclopedia of Machine Learning and Data Mining ◽

10.1007/978-1-4899-7687-1_100446 ◽

2017 ◽

pp. 1177-1177

Keyword(s):

Natural Language ◽

Language Processing ◽

ISQUA16-1878 ARE STATISTICAL NATURAL LANGUAGE PROCESSING MODELS FOR PNEUMONIA SURVEILLANCE GENERALIZABLE ACROSS ACUTE CARE HOSPITALS?

International Journal for Quality in Health Care ◽

10.1093/intqhc/mzw104.50 ◽

2016 ◽

Vol 28 (suppl 1) ◽

pp. 33.1-33

Author(s):

C. M. Rochefort ◽

A. D. Verma ◽

D. L. Buckeridge ◽

A. Forster

Keyword(s):

Natural Language ◽

Acute Care ◽

Language Processing ◽

Acute Care Hospitals ◽

Predicting the gender of Welsh nouns

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt-2015-0001 ◽

2016 ◽

Vol 12 (2) ◽

Cited By ~ 1

Author(s):

Michael Hammond

Keyword(s):

Natural Language ◽

Language Processing ◽

Grammatical Gender ◽

Statistical Properties ◽

Welsh grammatical gender exhibits several unusual properties. This paper argues that these properties are necessarily connected. The argument is based on a series of corpus investigations using techniques from statistical natural language processing, specifically distinguishing properties that exhibit significant statistical patterns from those which can be used to make usable predictions. Specifically, it is shown that the grammatical properties of Welsh gender are such that its unusual statistical properties follow.

4. Machine Learning in Statistical Natural Language Processing

The Journal of The Institute of Image Information and Television Engineers ◽

10.3169/itej.69.131 ◽

2015 ◽

Vol 69 (2) ◽

pp. 131-135

Author(s):

Daichi Mochihashi

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽