natural language processing tool
Recently Published Documents


TOTAL DOCUMENTS: 44 (five years: 27)

H-INDEX: 9 (five years: 1)

2022 ◽  
pp. 825-841
Author(s):  
Segun Aina ◽  
Samuel Dayo Okegbile ◽  
Adeniran Ishola Oluwaranti ◽  
Oghenerukome Brenda Okoro ◽  
Tayo Obasanya

The work reported in this article developed a voice-activated home automation system, with a view to giving users complete control over electrical appliances through simple, easy-to-remember voice commands on an Android mobile device. The system was implemented using an ATmega328 microcontroller, relays and a Wi-Fi shield. The human voice is first converted to text by a natural language processing tool in the Android application; the text is then sent over the internet via PubNub to the microcontroller. The ATmega328 was programmed on an Arduino in the C programming language, and the Android application was developed using the Android Software Development Kit. Results obtained from testing show that the implemented system achieves mean scores of 8, 7.6 and 7.2 for ease of use, learnability and effectiveness respectively, demonstrating that the system can control appliances by changing their state (ON/OFF) from a remote location with a response time within reasonable limits.
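
The abstract describes the message path (Android app → PubNub → microcontroller) but includes no code. A minimal sketch of the publishing side using the PubNub Python SDK is shown below; the channel name, message shape and "demo" keys are assumptions for illustration, not the authors' implementation.

```python
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

# Hypothetical keys and channel; the paper does not publish its PubNub setup.
config = PNConfiguration()
config.publish_key = "demo"
config.subscribe_key = "demo"
config.uuid = "android-voice-client"
pubnub = PubNub(config)

# Publish the recognized voice command as a small JSON message; the
# microcontroller side would subscribe to the same channel and switch a relay.
pubnub.publish() \
    .channel("home-automation") \
    .message({"appliance": "lamp", "state": "ON"}) \
    .sync()
```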


2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields for characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the reliability of scoring between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability, reflected the measures yielded by hand-scoring, and outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on two clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across age groups and anxiety groups, showing its efficacy for assessing narrative recall in large-scale individual-difference analyses. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual-difference analyses in research using naturalistic stimuli.
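
The core scoring step, comparing a participant's recall against a reference description with USE, can be sketched as below. The TensorFlow Hub URL is the public USE release; the cosine-similarity scoring is a plausible reading of the approach, not the authors' exact pipeline.

```python
import numpy as np
import tensorflow_hub as hub

# Public Universal Sentence Encoder release on TensorFlow Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def recall_score(recall: str, reference: str) -> float:
    """Cosine similarity between USE embeddings of recall and reference."""
    a, b = embed([recall, reference]).numpy()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(recall_score("The magician hid the coin in his sleeve.",
                   "The performer concealed a coin up one sleeve."))
```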


BMJ Open ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. e049249
Author(s):  
Lasantha Jayasinghe ◽  
Sumithra Velupillai ◽  
Robert Stewart

Objective: To investigate the distribution and content of quoted text within electronic health records (EHRs), using a previously developed natural language processing tool to generate a database of quotations.
Design: χ² tests and logistic regression were used to assess the profile of patients receiving mental healthcare for whom quotations exist. K-means clustering using pre-trained word embeddings, developed on general discharge summaries and psychosis-specific mental health records, was used to group one-word quotations into semantically similar groups, which were labelled by human subjective judgement.
Setting: EHRs from a large mental healthcare provider serving a geographic catchment area of 1.3 million residents in South London.
Participants: For the analysis of distribution, 33,499 individuals receiving mental healthcare on 30 June 2019 in South London and Maudsley; for the analysis of content, 1,587 unique lemmatised words appearing a minimum of 20 times in the database of quotations created on 16 January 2020.
Results: The strongest individual indicator of quoted text is inpatient care in the preceding 12 months (OR 9.79, 95% CI 7.84 to 12.23). The next strongest indicator is ethnicity: those from a Black background are more likely to have quoted text than those from a White background (OR 2.20, 95% CI 2.08 to 2.33). Both effects are slightly attenuated in the adjusted model. Word embeddings trained on early psychosis intervention records subjectively produced categories pertaining to: mental illness, verbs, negative sentiment, people/relationships, mixed sentiment, aggression/violence and negative connotation.
Conclusions: The findings that inpatients and those from a Black ethnic background more commonly have quoted text raise important questions about where clinical attention is focused and whether this may point to any systematic bias. Our study also shows that word embeddings trained on early psychosis intervention records are useful in categorising even small subsets of the clinical records represented by one-word quotations.
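
The clustering step can be sketched as follows, assuming pre-trained word vectors for each one-word quotation. The random placeholder embeddings and the choice of k=7 (matching the seven labelled categories) are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embeddings standing in for the paper's pre-trained vectors
# (trained on discharge summaries and psychosis-specific records).
rng = np.random.default_rng(0)
words = [f"word{i}" for i in range(100)]
X = rng.normal(size=(100, 300))  # 300-dimensional vectors, a common size

# Group one-word quotations into semantically similar clusters; the seven
# human-labelled categories suggest k=7 as a starting point.
labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(X)
for c in range(7):
    print(c, [w for w, l in zip(words, labels) if l == c][:5])
```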


2021 ◽  
Vol 1 (2) ◽  
pp. 43-51
Author(s):  
Da Qi ◽  
Hua Wang

The present study explores the distribution patterns of valency-changing verbs from the perspective of quantitative linguistics, taking authentic spoken language data as the research material. The corpus used in this paper is a self-built spoken English corpus of about 21,000 words, which we semi-manually annotated with the help of spaCy, a natural language processing tool. From the annotation results and statistical data, we obtained a total of 217 valency-changing English verbs and 248 sentence components governed by them. The analysis led to the following conclusions: first, bivalent verbs are the most frequent of the three types of valency-changing verbs; second, after fitting the language data to different probability distributions, we found that the rank-frequency distributions of all valency-changing English verbs with different numbers of obligatory arguments obey the power law, while the frequencies of bivalent valency-changing verbs obey other distributions such as the mixed Poisson distribution.
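
A minimal sketch of how spaCy can support this kind of annotation is given below: parse a sentence and count the arguments each verb governs. The set of dependency labels treated as obligatory arguments is an illustrative assumption, not the authors' annotation scheme.

```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Dependency relations counted here as verb arguments (an illustrative choice).
ARG_DEPS = {"nsubj", "nsubjpass", "dobj", "iobj", "dative", "ccomp", "xcomp"}

doc = nlp("She gave him the book.")
for token in doc:
    if token.pos_ == "VERB":
        args = [c for c in token.children if c.dep_ in ARG_DEPS]
        print(token.text, len(args), [(c.text, c.dep_) for c in args])
# -> gave 3 [('She', 'nsubj'), ('him', 'dative'), ('book', 'dobj')]
```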


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
C M Maciejewski ◽  
M K Krajsman ◽  
K O Ozieranski ◽  
M B Basza ◽  
M G Gawalko ◽  
...  

Abstract Background An estimated 80% of the data gathered in electronic health records is unstructured textual information that cannot be used for research purposes until it is manually coded into a database. Manual coding is both a cost- and time-consuming process. Natural language processing (NLP) techniques may be used to extract structured data from text; however, little is known about the accuracy of the data obtained through these methods. Purpose To evaluate the possibility of employing NLP techniques to obtain the risk-factor data needed to calculate the CHA2DS2-VASc score and to detect the antithrombotic medication prescribed in a population of atrial fibrillation (AF) patients on a cardiology ward. Methods An automatic tool for disease and drug recognition based on regular-expression rules was designed through cooperation between physicians and IT specialists. Records of 194 AF patients discharged from a cardiology ward were manually reviewed by a physician-annotator as a comparator for the automatic approach. Results The median CHA2DS2-VASc score calculated by the automatic method was 3 points (IQR 2–4) versus 3 points (IQR 2–4) for the manual method (p=0.66), with high agreement between the scores calculated by the two methods (Kendall's W=0.979; p<0.001). In terms of anticoagulant recognition, the automatic tool misclassified the prescribed drug in 4 cases. Conclusion NLP-based techniques are a promising tool for obtaining structured data for research purposes from electronic health records in Polish. Tight cooperation between physicians and IT specialists is crucial for establishing accurate recognition patterns. Funding Acknowledgement Type of funding sources: None.
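
The recognition approach, regular-expression rules over discharge text, can be sketched as below. The study's actual patterns were developed for Polish records with clinician input; these English patterns and risk-factor names are illustrative only.

```python
import re

# Hypothetical rule set; the real tool used Polish-language patterns
# designed jointly by physicians and IT specialists.
RULES = {
    "hypertension": re.compile(r"\bhypertens", re.I),
    "diabetes": re.compile(r"\bdiabet", re.I),
    "stroke_tia": re.compile(r"\b(stroke|TIA|transient isch)", re.I),
}

def extract_risk_factors(note: str) -> set[str]:
    """Return the risk-factor labels whose pattern matches the note."""
    return {name for name, pattern in RULES.items() if pattern.search(note)}

print(extract_risk_factors("History of hypertension and prior stroke."))
# -> {'hypertension', 'stroke_tia'}
```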


Author(s):  
Josephine Lukito ◽  
Prathusha Sarma ◽  
Jordan Foley ◽  
Aman Abhishek ◽  
Erik Bucy ◽  
...  

Live-tweeting has emerged as a popular hybrid media activity during broadcast media events. Through second screens, users are able to engage with one another and react in real time to the broadcast content. These reactions are dynamic: they ebb and flow throughout the media event as users respond to and converse about different memorable moments. Using the first 2016 U.S. presidential debate between Hillary Clinton and Donald Trump as a case, this paper employs a temporal method for identifying resonant moments on social media during televised events, combining time series analysis, qualitative (human-in-the-loop) evaluation, and a novel natural language processing tool that identifies discursive shifts before and after resonant moments. The analysis finds key differences in social media discourse about the two candidates. Notably, Trump received substantially more coverage than Clinton throughout the debate. However, a more in-depth analysis of the candidates' resonant moments reveals that discourse about Trump tended to be more critical than discourse associated with Clinton's resonant moments.
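
The abstract does not specify how candidate moments are flagged before human review; one plausible automatic first pass, flagging per-minute tweet volumes far above the series mean, is sketched below on synthetic data. This is a stand-in illustration, not the paper's method.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic per-minute tweet counts for a 90-minute debate, with three
# injected spikes standing in for resonant moments.
rng = np.random.default_rng(1)
counts = rng.poisson(200, size=90).astype(float)
counts[[12, 47, 63]] += 900

# Flag minutes whose volume exceeds the mean by three standard deviations;
# flagged minutes would then go to qualitative (human-in-the-loop) review.
peaks, _ = find_peaks(counts, height=counts.mean() + 3 * counts.std())
print("candidate resonant minutes:", peaks)  # -> [12 47 63]
```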


2021 ◽  
Vol 7 ◽  
pp. e408 ◽  
Author(s):  
Ching-Ru Ko ◽  
Hsien-Tsung Chang

Investing in stocks is an important financial-management tool for modern people, and how to forecast stock prices has become an important issue. In recent years, deep learning methods have successfully solved many forecasting problems. In this paper, we use multiple factors for the stock price forecast: news articles and PTT forum discussions serve as the fundamental analysis, and historical stock transaction information serves as the technical analysis. The state-of-the-art natural language processing tool BERT is used to recognize the sentiment of text, and a long short-term memory (LSTM) neural network, which is well suited to analyzing time series data, is applied to forecast the stock price from the historical transaction information and text sentiments. According to the experimental results, our proposed models improve the average root mean square error (RMSE) by 12.05%.
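
The sentiment-recognition step can be sketched with the Hugging Face transformers pipeline, as below. The default English sentiment model is an assumption; the paper works with Chinese-language news and PTT posts and its own BERT setup, which is not reproduced here.

```python
from transformers import pipeline

# Generic pre-trained sentiment model; a stand-in for the paper's use of
# BERT over financial news and PTT forum text.
sentiment = pipeline("sentiment-analysis")

posts = [
    "The company beat earnings expectations this quarter.",
    "Regulators opened an investigation into the firm.",
]
for post in posts:
    print(post, "->", sentiment(post)[0])
```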


GovdeTurk is a tool for stemming, morphological labeling and verb negation for the Turkish language. We designed comprehensive finite automata to represent Turkish grammar rules. Based on these automata, GovdeTurk finds the stem of a word by removing its inflectional suffixes with a longest-match strategy, and Levenshtein distance is used to correct spelling errors that may occur during suffix removal. Morphological labeling identifies the functionality of a given token. Nine dictionaries, one for each specific word type, are constructed and used in the stemming and morphological labeling. A verb negation module is developed for lexicon-based sentiment analysis. GovdeTurk was tested on a dataset of one million words, and the results were compared with Zemberek and the Turkish Snowball algorithm. While the closest competitor, Zemberek, achieves 80% accuracy in the stemming step, GovdeTurk achieves 97.3%; its morphological labeling accuracy is 93.6%. With these results, GovdeTurk outperforms its competitors.
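
The spelling-correction step relies on Levenshtein distance, which has a standard dynamic-programming formulation; a compact sketch follows (the Turkish example pair is illustrative, not from the paper).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitaplar", "kitap"))  # -> 3 (strip the plural suffix -lar)
```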

