A Natural Language Processing Tool for Large-Scale Data Extraction from Echocardiography Reports

PLoS ONE ◽  
2016 ◽  
Vol 11 (4) ◽  
pp. e0153749 ◽  
Author(s):  
Chinmoy Nath ◽  
Mazen S. Albaghdadi ◽  
Siddhartha R. Jonnalagadda
2020 ◽  
Author(s):  
Esra Kahya Özyirmidokuz ◽  
Kumru Uyar ◽  
Raian Ali ◽  
Eduard Alexandru Stoica ◽  
Betül Karakaş

BACKGROUND Measuring online Turkish happiness requires a Turkish happiness dictionary that reflects Turkish norms and social values culturally and linguistically, rather than relying on a translation-oriented method. Analyzing data while neglecting cultural characteristics is not reliable: the Turkish translation of an English word in the Affective Norms for English Words (ANEW) dictionary does not convey the same feeling as a native Turkish word. In addition, existing emotional dictionaries were not developed specifically for social networks with emoticons. OBJECTIVE This research presents the Turkish Happiness Index (THI), a set of psychological normative happiness scores for measuring the average happiness level of large-scale unstructured Turkish online data. A well-being informatics analysis was also conducted using the THI. METHODS The Turkish Happiness Index was generated entirely from social networks. 20,000 words were extracted from social networks with web text mining, and natural language processing algorithms were applied. After data reduction, a quantitative research methodology was applied: the happiness scores were based on 667 participants’ subjective happiness levels and their ratings of the 1,874 Turkish words. An alexithymia scale was also used to assess the emotional awareness of the participants. The words were evaluated on the valence dimension using the Self-Assessment Manikin on an online platform. NLP was then used to measure the online happiness of the data. Data were collected from Facebook with the negative #war and positive #family hashtags over a period of one month using a third-party software tool. Natural language processing algorithms, including tokenization, transformation, filtering, and stemming, were applied after converting the data to documents. The happiness levels of the documents associated with each hashtag were determined using the Turkish Happiness Index dictionary. RESULTS The THI, which contains 345 words and their happiness scores in Turkish, was developed; it is given in Appendix 1. We also compared words across dictionaries to understand the cultural differences. CONCLUSIONS The THI provides researchers with standard materials through which they can automatically measure the online happiness of large-scale Turkish data. The THI can also be used in real-time big data analytics.
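A minimal sketch of the dictionary-based scoring step described above, in Python. The words, scores, and function name are invented for illustration; the actual THI contains 345 words with empirically derived valence scores (Appendix 1), and the full pipeline also includes transformation, filtering, and stemming, which are omitted here.

```python
import re

# Hypothetical excerpt of a THI-style dictionary mapping a word to its
# happiness (valence) score; the real scores come from the published THI.
THI = {
    "aile": 7.9,   # "family" -- invented score
    "mutlu": 8.3,  # "happy"  -- invented score
    "savas": 2.1,  # "war"    -- invented score
}

def happiness_score(document):
    """Tokenize a document and average the THI scores of matched words.

    Stemming and filtering from the full pipeline are omitted in this sketch.
    """
    tokens = re.findall(r"\w+", document.lower())
    scores = [THI[t] for t in tokens if t in THI]
    return sum(scores) / len(scores) if scores else None

print(happiness_score("mutlu aile"))  # -> (8.3 + 7.9) / 2 = 8.1
```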


2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the reliability of scores made between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected the measures obtained by hand-scoring, and further that USE outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, which shows the efficacy of our approach for assessing narrative recall in large-scale individual difference analyses. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analyses in research using naturalistic stimuli.
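A minimal sketch of USE-based similarity scoring, assuming the publicly released Universal Sentence Encoder on TensorFlow Hub; the paper's actual segmentation and rater-comparison pipeline is not reproduced here, and the example sentences are invented.

```python
import numpy as np
import tensorflow_hub as hub

# Load the publicly available Universal Sentence Encoder (v4).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def similarity(recall, reference):
    """Cosine similarity between a recall and a reference description."""
    a, b = embed([recall, reference]).numpy()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example: score a participant's recall of a magic-trick clip
# against an experimenter-written reference description.
print(similarity(
    "The magician hid the coin and it reappeared in the other hand.",
    "A coin vanishes from one hand and is revealed in the magician's other hand.",
))
```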


2020 ◽  
Vol 2 (4) ◽  
pp. 209-215
Author(s):  
Eriss Eisa Babikir Adam

Computer systems are developing models for speech synthesis across various aspects of natural language processing. Speech synthesis has been explored through articulatory, formant, and concatenative synthesis. These techniques introduce more aperiodic distortion and give exponentially increasing error rates as the system runs. Recently, advances in speech synthesis have moved strongly toward deep learning in order to achieve better performance, since leveraging large-scale data yields effective feature representations for speech synthesis. The main objective of this research article is to implement deep learning techniques for speech synthesis and to compare their performance, in terms of aperiodic distortion, with prior algorithms in natural language processing.


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets, with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the resulting Data Lake Ecosystem Workflow, focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


10.29007/pc58 ◽  
2018 ◽  
Author(s):  
Julia Lavid ◽  
Marta Carretero ◽  
Juan Rafael Zamorano

In this paper we set forth an annotation model for dynamic modality in English and Spanish, given its relevance not only for contrastive linguistic purposes but also for practical annotation tasks in the Natural Language Processing (NLP) community. An annotation scheme is proposed which captures both the functional-semantic meanings and the language-specific realisations of dynamic meanings in both languages. The scheme is validated through a reliability study performed on a randomly selected set of one hundred and twenty sentences from the MULTINOT corpus, resulting in a high degree of inter-annotator agreement. We discuss our main findings and pay particular attention to difficult cases, which are currently being used to develop detailed guidelines for the large-scale annotation of dynamic modality in English and Spanish.
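A minimal sketch of the kind of inter-annotator reliability check described above, using Cohen's kappa from scikit-learn. The label set and annotations are invented for illustration; the paper's actual categories and agreement statistic may differ.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical dynamic-modality labels assigned by two annotators
# to the same five sentences (e.g., ability, volition, necessity).
annotator_1 = ["ability", "volition", "ability", "necessity", "volition"]
annotator_2 = ["ability", "volition", "necessity", "necessity", "volition"]

# Kappa corrects raw percent agreement for agreement expected by chance.
kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```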

