statistical natural language processing
Recently Published Documents


TOTAL DOCUMENTS

36
(FIVE YEARS 3)

H-INDEX

8
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Anthony Penniston

The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in the digital and written form; the misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address the challenge, and in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms on extracted feature data, in order to classify and predict the grammaticality of a sentence.


2021 ◽  
Author(s):  
Anthony Penniston

The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in the digital and written form; the misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address the challenge, and in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms on extracted feature data, in order to classify and predict the grammaticality of a sentence.


2018 ◽  
Author(s):  
Tao Chen ◽  
Mark Dredze ◽  
Jonathan P Weiner ◽  
Leilani Hernandez ◽  
Joe Kimura ◽  
...  

BACKGROUND Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records. OBJECTIVE We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective. METHODS We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts. RESULTS A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context. CONCLUSIONS This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.


2017 ◽  
Vol 13 (21) ◽  
pp. 429
Author(s):  
Nadeem Ur-Rahman

Business Intelligence solutions are key to enable industrial organisations (either manufacturing or construction) to remain competitive in the market. These solutions are achieved through analysis of data which is collected, retrieved and re-used for prediction and classification purposes. However many sources of industrial data are not being fully utilised to improve the business processes of the associated industry. It is generally left to the decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture or from the operation of business or production processes. Substantial efforts and energy are required in terms of time and money to identify and exploit the appropriate information that is available from the data. Data Mining techniques have long been applied mainly to numerical forms of data available from various data sources but their applications to analyse semi-structured or unstructured databases are still limited to a few specific domains. The applications of these techniques in combination with Text Mining methods based on statistical, natural language processing and visualisation techniques could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation and classification and mainly rely on methods and techniques available in the area of Information Retrieval (IR). These help to uncover the hidden information in text documents at an initial level. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods which share techniques from IR and data mining. These techniques may be implemented to analyse textual databases in general but they are demonstrated here using examples of Post Project Reviews (PPR) from the construction industry as a case study. The research is focused on finding key single or multiple term phrases for classifying the documents into two classes i.e. good information and bad information documents to help decision makers or project managers to identify key issues discussed in PPRs which can be used as a guide for future project management process.


Author(s):  
Michael Hammond

AbstractWelsh grammatical gender exhibits several unusual properties. This paper argues that these properties are necessarily connected. The argument is based on a series of corpus investigations using techniques from statistical natural language processing, specifically distinguishing properties that exhibit significant statistical patterns from those which can be used to make useable predictions. Specifically, it’s shown that the grammatical properties of Welsh gender are such that its unusual statistical properties follow.


Sign in / Sign up

Export Citation Format

Share Document