Handbook of Research on Natural Language Processing and Smart Service Systems - Advances in Computational Intelligence and Robotics
Latest Publications


TOTAL DOCUMENTS: 22 (FIVE YEARS: 22)

H-INDEX: 0 (FIVE YEARS: 0)

Published By IGI Global

ISBN: 9781799847304, 9781799847311

Author(s):  
Karina Castro-Pérez ◽  
José Luis Sánchez-Cervantes ◽  
María del Pilar Salas-Zárate ◽  
Maritza Bustos-López ◽  
Lisbeth Rodríguez-Mazahua

In recent years, the application of opinion mining has grown alongside the boom of social media and blogs on the web. These sources generate a large volume of unstructured data, so manual review is not feasible. For this reason, it has become necessary to apply web scraping and opinion mining techniques, two primary processes that help to obtain and summarize the data. Among its various areas of application, opinion mining stands out for its essential contribution to healthcare, especially pharmacovigilance, because it allows finding adverse drug events omitted by pharmaceutical companies. This chapter proposes a hybrid approach that combines semantics and machine learning in an opinion-mining analysis system, applying natural language processing techniques to detect the polarity of opinions about drugs for chronic-degenerative diseases, as found in blogs and specialized websites in the Spanish language.
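
As an illustration only (none of the chapter's actual resources, lexicons, or models are reproduced here), a hybrid polarity classifier of the kind described above might combine a small Spanish polarity lexicon (the semantic component) with TF-IDF features and a linear model (the machine learning component). Every opinion, lexicon entry, and parameter in this sketch is hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical Spanish drug-opinion snippets with polarity labels (1 = positive, 0 = negative).
opinions = [
    "este medicamento me ayudó mucho con el dolor",
    "me provocó náuseas y mareos terribles",
    "excelente control de la glucosa sin efectos secundarios",
    "dejé de tomarlo porque empeoró mi presión",
]
labels = np.array([1, 0, 1, 0])

# Tiny illustrative polarity lexicon standing in for the semantic component.
lexicon = {"ayudó": 1, "excelente": 1, "náuseas": -1, "mareos": -1, "empeoró": -1}

def lexicon_score(text):
    # Sum of lexicon polarities for the tokens in the text.
    return sum(lexicon.get(tok, 0) for tok in text.lower().split())

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(opinions).toarray()
X = np.hstack([X_tfidf, [[lexicon_score(t)] for t in opinions]])

clf = LogisticRegression().fit(X, labels)

new = "me causó mareos toda la semana"
x_new = np.hstack([vectorizer.transform([new]).toarray(), [[lexicon_score(new)]]])
print(clf.predict(x_new))  # likely [0] (negative) on this toy data
```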


Author(s):  
Joaquín Pérez Ortega ◽  
Nelva Nely Almanza Ortega ◽  
Andrea Vega Villalobos ◽  
Marco A. Aguirre L. ◽  
Crispín Zavala Díaz ◽  
...  

In recent years, the amount of natural language text in digital format has increased dramatically. To obtain useful information from such a large volume of data, new specialized techniques and efficient algorithms are required. Text mining consists of extracting meaningful patterns from texts; one of its basic approaches is clustering. The most widely used clustering algorithm is k-means. This chapter proposes an improvement to the k-means algorithm in the convergence step: the process stops whenever the number of objects that change their assigned cluster in the current iteration is larger than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. It is remarkable that, in general, better results are obtained as the volume of text increases, particularly for texts within big data environments.
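
As an illustration only (not the authors' implementation), the sketch below shows standard Euclidean k-means with the stopping rule described above: iteration halts either at classical convergence or as soon as the number of reassigned objects exceeds the number reassigned in the previous iteration. Function and parameter names are made up for this example.

```python
import numpy as np

def kmeans_early_stop(X, k, max_iter=100, seed=0):
    """k-means with the convergence rule described above: stop when the number
    of points changing cluster in the current iteration exceeds the number that
    changed in the previous iteration (or when no point changes at all)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    prev_changes = np.inf
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changes = int((new_labels != labels).sum())
        labels = new_labels
        # Update centers (keep the old center if a cluster becomes empty).
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
        if changes == 0 or changes > prev_changes:
            break  # classical convergence, or the proposed early-stop rule
        prev_changes = changes
    return labels, centers
```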


Author(s):  
Alexander Gelbukh ◽  
José A. Martínez F. ◽  
Andres Verastegui ◽  
Alberto Ochoa

In this chapter, an exhaustive parser is presented. The parser was developed to be used in a natural language interface to databases (NLIDB) project. The chapter includes a brief description of state-of-the-art NLIDBs, including the methods used and the performance of some interfaces. Some of the general problems of natural language interfaces to databases are also explained. The exhaustive parser was developed with the aim of improving the overall performance of the interface; therefore, the interface is also briefly described. The chapter also presents the drawbacks discovered during the experimental tests of the parser, which show that it is unsuitable for improving NLIDB performance.


Author(s):  
Juan Javier González-Barbosa ◽  
Juan Frausto Solís ◽  
Juan Paulo Sánchez-Hernández ◽  
Julia Patricia Sanchez-Solís

Databases and corpora are essential resources for evaluating the performance of natural language interfaces to databases (NLIDBs). The Geobase database and the Geoquery corpus (Geoquery250 and Geoquery880) are among the most commonly used. In this chapter, the authors analyze both resources to offer two enhanced resources: 1) N-Geobase, a relational database, and 2) the Geoquery270 corpus. The former was built by following the standard normalization procedure, so N-Geobase has a schema similar to that of enterprise databases. Geoquery270 consists of 270 queries selected from Geoquery880; it preserves the same kinds of natural language problems as Geoquery880 but poses more challenging issues for an NLIDB than Geoquery250. To evaluate the new resources, the authors compared the performance of an NLIDB using Geoquery270 and Geoquery250. The results indicate that Geoquery270 is the harder corpus and Geoquery250 the easier one. Consequently, this chapter offers a broader range of resources to NLIDB designers.


Author(s):  
Sanah Nashir Sayyed ◽  
Namrata Mahender C.

Summarization is the process of selecting representative data to produce a reduced version of the given data with minimal loss of information; it can be applied to text, images, videos, and speech data. The chapter deals not only with the concepts of text summarization (types, stages, issues, and criteria) but also with its applications. The two main categories of approaches generally used in text summarization (i.e., abstractive and extractive) are discussed. Abstractive techniques use linguistic methods to interpret the text; they produce understandable and semantically equivalent sentences of shorter length. Extractive techniques mostly rely on statistical methods for extracting essential sentences from the given text. In addition, the authors explore the SACAS model to exemplify the process of summarization. The SACAS system analyzed 50 stories, and its evaluation is presented in terms of a new measurement based on question-answering MOS, which is also introduced in this chapter.
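
As a purely illustrative sketch (not the SACAS system), the following shows the kind of statistical scoring used by extractive techniques: sentences are scored by the normalized frequencies of their words, and the top-scoring ones are returned in their original order. The stopword set and sentence splitter are deliberately simplistic placeholders.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=3, stopwords=frozenset()):
    """Score sentences by normalized word frequency and keep the top-scoring ones."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"\w+", text.lower())
    freq = Counter(w for w in words if w not in stopwords)
    if not freq:
        return text
    top = freq.most_common(1)[0][1]
    scored = []
    for i, sentence in enumerate(sentences):
        tokens = re.findall(r"\w+", sentence.lower())
        score = sum(freq[t] / top for t in tokens if t in freq)
        scored.append((score / (len(tokens) or 1), i))
    # Keep the n best sentences, then restore their original order.
    keep = sorted(sorted(scored, reverse=True)[:n_sentences], key=lambda pair: pair[1])
    return " ".join(sentences[i] for _, i in keep)
```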


Author(s):  
Rodolfo A. Pazos-Rangel ◽  
Gilberto Rivera ◽  
José A. Martínez F. ◽  
Juana Gaspar ◽  
Rogelio Florencia-Juárez

This chapter is an update of a previous publication. Specifically, it aims to describe the most significant advances in NLIDBs of this decade. Unlike many surveys on NLIDBs, the NLIDBs discussed in this chapter are selected according to three relevance criteria: performance (i.e., the percentage of correctly answered queries), the soundness of the experimental evaluation, and the number of citations. To this end, the chapter also includes a brief review of the most widely used performance measures and query corpora for testing NLIDBs.


Author(s):  
Rafael Jiménez ◽  
Vicente García ◽  
Abraham López ◽  
Alejandra Mendoza Carreón ◽  
Alan Ponce

The Autonomous University of Ciudad Juárez (UACJ) performs an instructor evaluation each semester to find strengths, weaknesses, and areas of opportunity in the teaching process. In this chapter, the authors show how opinion mining can be useful for labeling student comments as positive or negative. For this purpose, a database was created using real opinions obtained from five professors of the UACJ over the last four years, covering a total of 20 subjects. Natural language processing techniques were used on the database to normalize its data. Experimental results using 1-NN and Bagging classifiers show that it is possible to automatically label positive and negative comments with an accuracy of 80.13%.
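
As an illustration only (the chapter's actual features, preprocessing, and dataset are not reproduced here), a minimal scikit-learn sketch of labeling comments with 1-NN and Bagging over TF-IDF features could look like the following; the sample comments are hypothetical.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical normalized student comments and polarity labels (1 = positive, 0 = negative).
comments = [
    "explica muy bien los temas de la clase",
    "nunca llega a tiempo y no resuelve dudas",
    "excelente profesor, muy atento con los alumnos",
    "las clases son confusas y mal preparadas",
]
labels = [1, 0, 1, 0]

models = {
    "1-NN": make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1)),
    "Bagging": make_pipeline(TfidfVectorizer(), BaggingClassifier(n_estimators=10, random_state=0)),
}

for name, model in models.items():
    model.fit(comments, labels)
    print(name, model.predict(["muy buen maestro, explica con paciencia"]))
```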


Author(s):  
Irvin Raul Lopez Contreras ◽  
Alejandra Mendoza Carreón ◽  
Jorge Rodas-Osollo ◽  
Martiza Concepción Varela

The quantity of information in the world is increasing rapidly every day. In some situations this becomes an obstacle, and text summarization addresses this kind of problem: it is used to minimize the time that people spend searching for information on the web and in large collections of digital documents. In this chapter, three algorithms are compared, all of them extractive text summarization algorithms. Popular libraries that influence the performance of this kind of algorithm were used. It was necessary to configure and modify these methods so that they work for the Spanish language instead of the language they were originally designed for. The authors use metrics found in the literature to evaluate the quality and performance of these algorithms.
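
The metrics themselves are not listed in this abstract, but n-gram overlap measures of the ROUGE family are a common choice in the extractive summarization literature. As a hedged illustration (not necessarily one of the metrics the authors used), a minimal ROUGE-1 recall computation could look like this:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

# Toy example with made-up Spanish summaries.
print(rouge1_recall("el resumen generado automáticamente",
                    "el resumen de referencia generado por una persona"))
```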


Author(s):  
Carlos Manuel Ramirez López ◽  
Martín Montes Rivera ◽  
Alberto Ochoa ◽  
Julio César Ponce Gallegos ◽  
José Eder Guzmán Mendoza

This research presents an application of Empirical Bayesian Kriging, a geostatistical interpolation method. The case study concerns suicide prevention. The dataset is composed of more than one million records obtained from the report database of the Emergency Service 911 of the Mexican state of Aguascalientes. The purpose is to obtain prediction surfaces, probability surfaces, and standard error of prediction surfaces for completed suicide cases. Here, the variations in the environment of suicide cases are relative to and dependent on economic, social, and cultural phenomena.
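
Empirical Bayesian Kriging automates variogram estimation through repeated simulations and is typically run in GIS software rather than written from scratch. As a rough stand-in for the interpolation step only (not the chapter's workflow), the sketch below implements plain ordinary kriging with a fixed spherical variogram, showing how a prediction surface and its kriging variance are obtained from point observations. All parameter values are placeholders.

```python
import numpy as np

def spherical_variogram(h, sill=1.0, rng=1.0, nugget=0.0):
    """Spherical semivariogram model; sill, range, and nugget are placeholders."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    g = np.where(h >= rng, sill, g)
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(xy_obs, z_obs, xy_new, **vparams):
    """Predict z at new locations; returns predictions and kriging variances."""
    n = len(z_obs)
    d = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_variogram(d, **vparams)
    A[-1, -1] = 0.0
    preds, variances = [], []
    for p in np.atleast_2d(xy_new):
        b = np.ones(n + 1)
        b[:n] = spherical_variogram(np.linalg.norm(xy_obs - p, axis=1), **vparams)
        w = np.linalg.solve(A, b)    # kriging weights plus Lagrange multiplier
        preds.append(w[:n] @ z_obs)  # interpolated value
        variances.append(w @ b)      # kriging variance (basis of the standard error surface)
    return np.array(preds), np.array(variances)
```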


Author(s):  
Alonso García ◽  
Martha Victoria González ◽  
Francisco López-Orozco ◽  
Lucero Zamora

Recent technological advances have allowed the development of numerous natural language processing applications with which users frequently interact. When interacting with this type of application, users often seek economy of words, which promotes the use of pronouns and thereby highlights the well-known anaphora problem. This chapter describes a proposal for handling pronominal anaphora in the Spanish language. A set of rules (based on the EAGLES standard) was designed to identify the referents of personal pronouns through the structure of the grammatical tags of the words. The proposed algorithm uses the online Freeling service to perform tokenization and tagging tasks. Its performance was compared with an online version of Freeling, and the proposed algorithm showed better performance.
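
As a hypothetical illustration of the rule-based idea (not the authors' rule set), the sketch below walks backward from each personal pronoun and picks the nearest preceding noun whose EAGLES-style tag agrees in gender and number. The tag strings and the character positions used for gender and number are assumptions loosely modeled on the Freeling tagset for common nouns (e.g., NCMS000) and personal pronouns; the actual rules and tags may differ.

```python
# Hypothetical tagged sentence: (word, simplified EAGLES-style tag) pairs.
# Assumed positions: nouns (N...) carry gender at index 2 and number at index 3;
# personal pronouns (PP...) carry gender at index 3 and number at index 4.
tagged = [
    ("María", "NP00000"),    # proper noun (placeholder tag)
    ("compró", "VMIS3S0"),
    ("un", "DI0MS0"),
    ("libro", "NCMS000"),    # common noun, masculine singular
    ("y", "CC"),
    ("lo", "PP3MSA00"),      # personal pronoun, masculine singular
    ("leyó", "VMIS3S0"),
]

def gender_number(tag):
    if tag.startswith("PP") and len(tag) >= 5:
        return tag[3], tag[4]
    if tag.startswith("N") and len(tag) >= 4:
        return tag[2], tag[3]
    return None

def resolve_pronouns(tagged_words):
    """Link each personal pronoun to the nearest preceding noun with matching gender and number."""
    links = {}
    for i, (word, tag) in enumerate(tagged_words):
        if not tag.startswith("PP"):
            continue
        target = gender_number(tag)
        for j in range(i - 1, -1, -1):
            cand_word, cand_tag = tagged_words[j]
            if cand_tag.startswith("N") and gender_number(cand_tag) == target:
                links[word] = cand_word
                break
    return links

print(resolve_pronouns(tagged))  # expected: {'lo': 'libro'}
```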

