Can syllabification improve pronunciation by analogy of English?

2006 ◽  
Vol 13 (1) ◽  
pp. 1-24 ◽  
Author(s):  
YANNICK MARCHAND ◽  
ROBERT I. DAMPER

In spite of difficulty in defining the syllable unequivocally, and controversy over its role in theories of spoken and written language processing, the syllable is a potentially useful unit in several practical tasks which arise in computational linguistics and speech technology. For instance, syllable structure might embody valuable information for building word models in automatic speech recognition, and concatenative speech synthesis might use syllables or demisyllables as basic units. In this paper, we first present an algorithm for determining syllable boundaries in the orthographic form of unknown words that works by analogical reasoning from a database or corpus of known syllabifications. We call this syllabification by analogy (SbA). It is similarly motivated to our existing pronunciation by analogy (PbA) which predicts pronunciations for unknown words (specified by their spellings) by inference from a dictionary of known word spellings and corresponding pronunciations. We show that including perfect (according to the corpus) syllable boundary information in the orthographic input can dramatically improve the performance of pronunciation by analogy of English words, but such information would not be available to a practical system. So we next investigate combining automatically-inferred syllabification and pronunciation in two different ways: the series model in which syllabification is followed sequentially by pronunciation generation; and the parallel model in which syllabification and pronunciation are simultaneously inferred. Unfortunately, neither improves performance over PbA without syllabification. Possible reasons for this failure are explored via an analysis of syllabification and pronunciation errors.

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Dia AbuZeina ◽  
Taqieddin Mostafa Abdalbaset

Part-of-speech (PoS) tagging is a core component of many natural language processing (NLP) applications. PoS taggers serve as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine translation, and speech synthesis. In this paper, we examine the performance of a modern standard Arabic (MSA) based tagger on classical (i.e., traditional or historical) Arabic. Specifically, we employed the Stanford Arabic model tagger to evaluate the imperative verbs in the Holy Quran. The Stanford tagger distinguishes 29 tags; however, this work experimentally evaluates just one of them: VB, the imperative verb. The testing set contains 741 imperative verbs, which appear in 1,848 positions in the Holy Quran. Despite the previously reported accuracy of the Arabic model of the Stanford tagger, which is 96.26% for all tags and 80.14% for unknown words, the experimental results show that this accuracy is only 7.28% for the imperative verbs. This result motivates further research into why the tagging is severely inaccurate for classical Arabic. The performance decline may indicate the need to distinguish between training data for classical Arabic and MSA in NLP tasks.
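The evaluation described above amounts to measuring accuracy restricted to one tag. A minimal sketch of that computation, with illustrative data and function names (not the Stanford tagger's actual API):

```python
# Per-tag accuracy: of the positions whose gold label is `tag`,
# what fraction did the tagger label correctly?
def tag_accuracy(gold, predicted, tag):
    positions = [i for i, g in enumerate(gold) if g == tag]
    if not positions:
        return 0.0
    correct = sum(1 for i in positions if predicted[i] == tag)
    return correct / len(positions)

# Toy example: 3 gold VB positions, only 1 tagged VB.
gold      = ["VB", "NN", "VB", "VB", "DT"]
predicted = ["VB", "NN", "NN", "JJ", "DT"]
print(tag_accuracy(gold, predicted, "VB"))  # 0.3333...
```

In the paper's setting, `gold` would mark the 1,848 Quranic positions of the 741 imperative verbs, and the reported 7.28% is this ratio.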


2020 ◽  
Author(s):  
Mario Crespo Miguel

Computational linguistics is the scientific study of language from a computational perspective. Its aim is to provide computational models of natural language processing (NLP) and incorporate them into practical applications such as speech synthesis, speech recognition, automatic translation and many others where automatic processing of language is required. The use of good linguistic resources is crucial for the development of computational linguistics systems. Real-world applications need resources which systematize the way linguistic information is structured in a certain language. There is a continuous effort to increase the number of linguistic resources available to the linguistics and NLP community. Most existing linguistic resources have been created for English, mainly because most modern approaches to computational lexical semantics emerged in the United States. This situation is changing over time, and some of these projects have subsequently been extended to other languages; however, in all cases, much time and effort must be invested in creating such resources. Because of this, one of the main purposes of this work is to investigate the possibility of extending these resources to other languages such as Spanish. In this work, we introduce some of the most important resources devoted to lexical semantics, such as WordNet or FrameNet, and those focusing on Spanish, such as 3LB-LEX or Adesse. Of these, this project focuses on FrameNet, which aims to document the range of semantic and syntactic combinatory possibilities of words in English. Words are grouped according to the different frames or situations evoked by their meaning. If we focus on a particular topic domain like medicine and try to describe it in terms of FrameNet, we would probably obtain frames representing it such as CURE, formed by words like cure.v, heal.v or palliative.a, or MEDICAL CONDITIONS, with lexical units such as arthritis.n, asphyxia.n or asthma.n.
The purpose of this work is to develop an automatic means of selecting frames from a particular domain and to translate them into Spanish. As we have stated, we will focus on medicine. The selection of the medical frames will be corpus-based; that is, we will extract all the frames that are statistically significant in a representative corpus. We will discuss why a corpus-based approach is a reliable and unbiased way of dealing with this task. We will present an automatic method for the selection of FrameNet frames and, in order to make sure that the results obtained are coherent, we will contrast them with a previous manual selection, or benchmark. Outcomes will be analysed using the F-score, a measure widely used in this type of application. We obtained a 0.87 F-score against our benchmark, which demonstrates the applicability of this type of automatic approach. The second part of the book is devoted to the translation of this selection into Spanish. The translation will be made using EuroWordNet, an extension of the Princeton WordNet for some European languages. We will explore different ways to link the different units of our medical FrameNet selection to a certain WordNet synset, or set of words that have similar meanings. Matching the frame units to a specific synset in EuroWordNet allows us both to translate them into Spanish and to add new terms provided by WordNet into FrameNet. The results show that the translation can be done quite accurately (95.6%). We hope this work can add new insight to the field of natural language processing.
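The F-score used for the evaluation above is the standard harmonic mean of precision and recall. A minimal sketch, with illustrative counts (the abstract does not give the underlying precision and recall):

```python
# F-beta score over true-positive / false-positive / false-negative counts.
# With beta = 1 this is the usual F1, the harmonic mean of precision and recall.
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. 20 frames correctly selected, 3 spurious, 3 missed from the benchmark:
print(round(f_score(20, 3, 3), 2))  # 0.87
```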


1999 ◽  
Vol 5 (1) ◽  
pp. 95-112 ◽  
Author(s):  
THOMAS BUB ◽  
JOHANNES SCHWINN

Verbmobil represents a new generation of speech-to-speech translation systems in which spontaneously spoken language, speaker independence and adaptability, and the combination of deep and shallow approaches to the analysis and transfer problems are the main features. The project brought together researchers from the fields of signal processing, computational linguistics and artificial intelligence. Verbmobil goes beyond the state of the art in each of these areas, but its main achievement is their seamless integration. The first project phase (1993–1996) has been followed by the second project phase (1997–2000), which aims at applying the results to further languages and at integrating innovative telecooperation techniques. Quite apart from the speech and language processing issues, the size and complexity of the project represent an extreme challenge to the areas of project management and software engineering:
- 50 researchers from 29 organizations at different sites in different countries are involved in the software development process;
- to reuse existing software, hardware, knowledge and experience, only a few technical restrictions could be imposed on the partners.
In this article we describe the Verbmobil prototype system from a software-engineering perspective. We discuss:
- the modularized functional architecture;
- the flexible and extensible software architecture which reflects that functional architecture;
- the evolutionary process of system integration;
- the communication-based organizational structure of the project;
- the evaluation of the system operational by the end of the first project phase.


1995 ◽  
Vol 18 (2) ◽  
pp. 141-158 ◽  
Author(s):  
Marshall H. Raskind ◽  
Eleanor Higgins

This study investigated the effects of speech synthesis on the proofreading efficiency of postsecondary students with learning disabilities. Subjects proofread self-generated written language samples under three conditions: (a) using a speech synthesis system that simultaneously highlighted and “spoke” words on a computer monitor, (b) having the text read aloud to them by another person, and (c) receiving no assistance. Using the speech synthesis system enabled subjects to detect a significantly higher percentage of total errors than either of the other two proofreading conditions. In addition, subjects were able to locate a significantly higher percentage of capitalization, spelling, usage, and typographical errors under the speech synthesis condition. However, the condition of having the text read aloud by another person significantly outperformed the others in locating “grammar-mechanical” errors. Results are discussed with regard to the underlying reasons for the overall superior performance of the speech synthesis system and the implications of using speech synthesis as a compensatory writing aid for postsecondary students with learning disabilities.


Author(s):  
Mans Hulden

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.
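A deterministic finite automaton of the kind the chapter introduces can be sketched as a transition table plus an accepting set. This toy example (the representation is illustrative, not from the chapter) accepts strings over {a, b} ending in "ab":

```python
# A DFA as (transition table, start state, accepting states).
# States track the suffix seen so far: q2 means the input ends in "ab".
def make_dfa():
    transitions = {
        ("q0", "a"): "q1", ("q0", "b"): "q0",
        ("q1", "a"): "q1", ("q1", "b"): "q2",
        ("q2", "a"): "q1", ("q2", "b"): "q0",
    }
    return transitions, "q0", {"q2"}

def accepts(dfa, string):
    transitions, state, accepting = dfa
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in accepting

dfa = make_dfa()
print(accepts(dfa, "aab"))  # True
print(accepts(dfa, "aba"))  # False
```

A transducer extends the same idea by attaching an output symbol (and, in the probabilistic case, a weight) to each transition, which is what makes the formalism useful for phonology and morphology.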


Webology ◽  
2021 ◽  
Vol 18 (1) ◽  
pp. 389-405
Author(s):  
Rahmad Agus Dwianto ◽  
Achmad Nurmandi ◽  
Salahudin Salahudin

As Covid-19 spreads to other nations and governments attempt to minimize its effects by introducing countermeasures, individuals have often used social media outlets to share their opinions on the measures themselves, the leaders implementing them, and the ways in which their lives are shifting. Sentiment analysis refers to the application of natural language processing, computational linguistics, and text analytics to source materials in order to identify and classify subjective opinions. This research uses a sentiment case study of Trump's and Jokowi's policies because the two leaders handled Covid-19 in similar ways, and compliance with health protocols remains low in both Indonesia and the US. The data collection period of September 21 to October 21, 2020 was chosen because during that period the top five trending topics on Twitter included #covid19, #jokowi, #miglobal, #trump, and #donaldtrump, making it the most appropriate window for collecting data on the handling of Covid-19 by Jokowi and Trump. The results show that both Jokowi and Trump drew more negative than positive sentiment during the period; Trump had issued a controversial statement regarding the handling of Covid-19. This research is limited to the sentiment generated by the policies conveyed by the US and Indonesian governments via the @jokowi and @realDonaldTrump Twitter accounts. The dataset presented in this research was collected and analyzed using Brand24, an automated sentiment analysis tool. Further research can increase the scope of the data, extend the timeframe for data collection, and develop tools for analyzing sentiment.


Author(s):  
Vinod Kumar Mishra ◽  
Himanshu Tiruwa

Sentiment analysis is the part of computational linguistics concerned with extracting sentiment and emotion from text. It is also considered a task of natural language processing and data mining. Sentiment analysis mainly concentrates on identifying whether a given text is subjective or objective and, if it is subjective, whether it is negative, positive, or neutral. This chapter provides an overview of aspect-based sentiment analysis, together with current and future trends of research in the area. The chapter also presents an aspect-based sentiment analysis of online customer reviews of the Nokia 6600. To perform aspect-based classification, we use a lexical approach on the Eclipse platform which classifies each review as positive, negative, or neutral on the basis of the features of the product. SentiWordNet is used as a lexical resource to calculate the overall sentiment score of each sentence, a PoS tagger is used for part-of-speech tagging, a frequency-based method is used for extracting the aspects/features, and negation handling is used to improve the accuracy of the system.
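The lexical approach described above can be sketched as summing per-word sentiment scores and flipping the sign after a negation word. The tiny lexicon below stands in for SentiWordNet; all words, scores, and the negation-scope rule are illustrative assumptions, not the chapter's actual system:

```python
# Toy stand-in for a SentiWordNet-style lexicon: word -> sentiment score.
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.7, "poor": -0.5}
NEGATIONS = {"not", "no", "never"}

def sentence_score(tokens):
    score, negate = 0.0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        s = LEXICON.get(tok, 0.0)
        score += -s if negate else s
        if tok in LEXICON:
            negate = False  # simple rule: negation scope ends at the next sentiment word
    return score

def classify(sentence):
    s = sentence_score(sentence.lower().split())
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(classify("the battery is not good"))  # negative
print(classify("great camera"))             # positive
```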


Author(s):  
Ayush Srivastav ◽  
Hera Khan ◽  
Amit Kumar Mishra

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. The most popular models used over time for Natural Language Processing tasks are discussed, along with their applications. The chapter begins with the fundamental concepts of regex and tokenization. It provides insight into text preprocessing and its methodologies, such as stemming, lemmatization, and stop word removal, followed by part-of-speech tagging and Named Entity Recognition. Further, the chapter elaborates on the concept of word embedding, its various types, and some common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing is provided next, followed by Neural Networks and their advanced forms, such as Recursive Neural Networks and Seq2seq models, that are used in Computational Linguistics. A brief description of chatbots and Memory Networks concludes the chapter.
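The preprocessing steps named above (regex tokenization, stop word removal, stemming) can be sketched in a few lines. The stop word list and the crude suffix-stripping stemmer below are illustrative assumptions, not those of any particular library:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and"}

def tokenize(text):
    # Regex tokenization: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Crude suffix stripping; a real stemmer (e.g. Porter) has many more rules.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The models are processing the tagged sentences"))
# ['model', 'process', 'tagg', 'sentence']
```

Note the over-stripping of "tagged" to "tagg": this is the kind of error that motivates lemmatization, which maps tokens to dictionary forms instead of chopping suffixes.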

