Skrelin Method for Constructing Formants for Studying Phonetic Characteristics of Vowels

This article presents the results of applying method for obtaining formant components of vowel phonemes for the corpus of professional reading in Russian. In this paper, a review of existing areas of development of methods for obtaining formant characteristics of vowels for different languages was made. A review was also made of the extent to which formant picture patterns are used in speech technologies and natural language processing. On the corpus of professional reading CORPRES, data was obtained on formant components for 351929 realizations of vowel phonemes on the material of 8 speakers. The data obtained are grouped in accordance with the symbols in the real transcription, which was performed by phoneticians within the framework of segmenting the corpus. The formant planes represent the distribution of allophones of vowels for all speakers according to the two first formants. The variability of formant characteristics in the corpus for pre-tonic and post-tonic allophones are presented for one male speaker. The article also presents the results testifying the difference between the rounded unstressed /i/ and /a/, which are perceived by both naive speakers and expert phoneticians as /u/. As an experimental material, the recordings of reading by one male announcer of specially selected sentences, which took into account various linguistic factors, were used. Analysis of the data of the formant components of these vowels showed that the values of the first formant of these vowels are close to the values of the stressed vowel /u/ for this speaker. The closure of these vowels corresponds to the closure of /u/. The second formant values in the vowels [u], which were to be realized as [i] and [a] are different. They are more advanced in comparison with /u/.

Download Full-text

Exact Expected Average Precision of the Random Baseline for System Evaluation

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2015-0007 ◽

2015 ◽

Vol 103 (1) ◽

pp. 131-138 ◽

Cited By ~ 2

Author(s):

Yves Bestgen

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

System Evaluation ◽

Average Precision ◽

The Difference

Abstract Average precision (AP) is one of the most widely used metrics in information retrieval and natural language processing research. It is usually thought that the expected AP of a system that ranks documents randomly is equal to the proportion of relevant documents in the collection. This paper shows that this value is only approximate, and provides a procedure for efficiently computing the exact value. An analysis of the difference between the approximate and the exact value shows that the discrepancy is large when the collection contains few documents, but becomes very small when it contains at least 600 documents.

Download Full-text

Forecasting Net Income Estimate and Stock Price Using Text Mining from Economic Reports

Information ◽

10.3390/info11060292 ◽

2020 ◽

Vol 11 (6) ◽

pp. 292 ◽

Cited By ~ 1

Author(s):

Masahiro Suzuki ◽

Hiroki Sakaji ◽

Kiyoshi Izumi ◽

Hiroyasu Matsushima ◽

Yasushi Ishikawa

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

Stock Prices ◽

Stock Price ◽

Net Income ◽

The Difference

This paper proposes and analyzes a methodology of forecasting movements of the analysts’ net income estimates and those of stock prices. We achieve this by applying natural language processing and neural networks in the context of analyst reports. In the pre-experiment, we applied our method to extract opinion sentences from the analyst report while classifying the remaining parts as non-opinion sentences. Then, we performed two additional experiments. First, we employed our proposed method for forecasting the movements of analysts’ net income estimates by inputting the opinion and non-opinion sentences into separate neural networks. Besides the reports, we inputted the trend of the net income estimate to the networks. Second, we employed our proposed method for forecasting the movements of stock prices. Consequently, we found differences between security firms, which depend on whether analysts’ net income estimates tend to be forecasted by opinions or facts in the context of analyst reports. Furthermore, the trend of the net income estimate was found to be effective for the forecast as well as an analyst report. However, in experiments of forecasting movements of stock prices, the difference between opinion sentences and non-opinion sentences was not effective.

Download Full-text

XenC: An Open-Source Tool for Data Selection in Natural Language Processing

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2013-0013 ◽

2013 ◽

Vol 100 (1) ◽

pp. 73-82 ◽

Cited By ~ 8

Author(s):

Anthony Rousseau

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Open Source ◽

Language Processing ◽

Statistical Machine Translation ◽

Data Selection ◽

Specific Domain ◽

Open Source Tool ◽

The Difference ◽

Selection Of

Abstract In this paper we describe XenC, an open-source tool for data selection aimed at Natural Language Processing (NLP) in general and Statistical Machine Translation (SMT) or Automatic Speech Recognition (ASR) in particular. Usually, when building a SMT or ASR system, the considered task is related to a specific domain of application, like news articles or scientific talks for instance. The goal of XenC is to allow selection of relevant data regarding the considered task, which will be used to build the statistical models for such a system. It is done by computing the difference between cross-entropy scores of sentences from a large out-of-domain corpus and sentences from a corpus considered as in-domain for the task. Written in C++, this tool can operate on monolingual or bilingual data and is language-independent. XenC, now part of the LIUM toolchain for SMT, is actively developed since December 2011 and used in many MT projects.

Download Full-text

An Improved Semidiscrete Matrix Decomposition and its Application in Chinese Information Retrieval

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.241-244.3121 ◽

2012 ◽

Vol 241-244 ◽

pp. 3121-3124 ◽

Cited By ~ 1

Author(s):

Yang Luo

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Matrix Decomposition ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Storage Space ◽

Important Direction ◽

The Difference

Information retrieval is an important direction in the area of natural language processing .This paper introduced semidiscrete matrix decomposition in latent semantic indexing. We aimed at it’s disadvantage in storage space and presented SSDD,then we compare the difference of SVD and SDD and SSDD in performance

Download Full-text

Analysis on the Language Independent and Dependent Aspects of Deep Learning based Question Answering Systems

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4816.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2057-2062

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Computational Models ◽

Question Answering ◽

Natural Languages ◽

Learning Network ◽

The Difference ◽

Deep Learning Network

Natural languages are ambiguous and computers are not capable of understanding the natural languages in the way people really understand them. Natural Language Processing (NLP) is concerned with the development of computational models based on the aspects of human language processing. Question Answering (QA) system is a field of Natural Language Processing that provides precise answer for the user question which is given in natural language. In this work, a MemN2N model based question answering system is implemented and its performance is evaluated with a complex question answering tasks using bAbI dataset of three different language text corpuses. The scope of this work is to understand the language independent and dependant aspects of a deep learning network. For this, we will study the performance of the deep learning network by training and testing it with different kinds of question answering tasks with different languages and also try to understand the difference in performance with respect to the languages

Download Full-text

Natural Language Processing Systems for Diagnosing and Determining Level of Lung Cancer: A Systematic Review

Frontiers in Health Informatics ◽

10.30699/fhi.v10i1.264 ◽

2021 ◽

Vol 10 (1) ◽

pp. 68

Author(s):

Mahdieh Montazeri ◽

Ali Afraz ◽

Raheleh Mahboob Farimani ◽

Fahimeh Ghasemian

Keyword(s):

Lung Cancer ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Language ◽

Lung Cancer Patients ◽

The Difference ◽

Combination Of Methods ◽

Manual Extraction

Introduction: Lung cancer is the second most common cancer for men and women. Using natural language processing to automatically extract information from text, lead to decrease labor of manual extraction from large volume of text material and save time. The aim of this study is to systematically review of studies which reviewed NLP methods in diagnosing and staging lung cancer.Material and Methods: PubMed, Scopus, Web of science, Embase was searched for English language articles that reported diagnosing and staging methods in lung cancer Using NLP until DEC 2019. Two reviewers independently assessed original papers to determine eligibility for inclusion in the review.Results: Of 119 studies, 7 studies were included. Three studies developed a NLP algorithm to scan radiology notes and determine the presence or absence of nodules to identify patients with incident lung nodules for treatment or follow-up. Two studies used NLP to transform the report text, including identification of UMLS terms and detection of negated findings to classifying reports, also one of them used an SVM-based text classification system for staging lung cancer patients. All studies reported various performance measures based on the difference between combination of methods. Most of studies have reported sensitivity and specificity of the NLP algorithm for identifying the presence of lung nodules.Conclusion: Evaluation of studies in diagnosing and staging methods in lung cancer using NLP shows there is a number of studies on diagnosing lung cancer but there are a few works on staging that. In some studies, combination of methods was considered and NLP isolated was not sufficient for capturing satisfying results. There are potentials to improve studies by adding other data sources, further refinement and subsequent validation.

Download Full-text

Natural Language Processing and Enhanced Clinical Decision Making Radiology and VINCI

PsycEXTRA Dataset ◽

10.1037/e615572012-015 ◽

2012 ◽

Author(s):

Eliot Siegel

Keyword(s):

Decision Making ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Decision Making ◽

Clinical Decision

Download Full-text

Natural Language Processing in the Clinical Setting

PsycEXTRA Dataset ◽

10.1037/e615572012-013 ◽

2012 ◽

Author(s):

Thomas H. Payne

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Setting

Download Full-text

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven methods. Then we apply a state of the art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Download Full-text