Information Extraction from Research Papers Based on Statistical Methods

Author(s):  
Selvani Deepthi Kavila ◽  
D. Fathima Rani
Foods ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 577
Author(s):  
Olga Escuredo ◽  
M. Carmen Seijo

This Special Issue contains innovative research papers on the characterization, chemical composition and physical properties of honey. This information is very useful for preventing fraud and guaranteeing the authenticity of this food product. Knowledge of the particularities of honey is increasingly demanded by beekeepers and consumers, and also by labs that typify honeys according to their botanical origin and check their quality. Melissopalynological, sensorial and physicochemical techniques are being used to study the characteristics of honey samples from different plant sources and geographical areas. Combining these analytical techniques with mathematical and statistical methods, or chemometrics, allows researchers to identify a set of variables or individual parameters that define independent samples, providing a practical way to classify honey according to its geographical or botanical origin.


2014 ◽  
Vol 32 (2) ◽  
pp. 276-284 ◽  
Author(s):  
Ping Bao ◽  
Suoling Zhu

Purpose – The purpose of this paper is to present a system for recognition of location names in ancient books written in languages, such as Chinese, in which proper names are not signaled by an initial capital letter. Design/methodology/approach – Rule-based and statistical methods were combined to develop a set of rules for identification of product-related location names in the local chronicles of Guangdong. A name recognition system, with functions of document management, information extraction and storage, rule management, location name recognition, and inquiry and statistics, was developed using Microsoft's .NET framework, SQL Server 2005, ADO.NET and XML. The system was evaluated with precision ratio, recall ratio and the comprehensive index, F. Findings – The system was quite successful at recognizing product-related location names (F was 71.8 percent), demonstrating the potential for application of automatic named entity recognition techniques in digital collation of ancient books such as local chronicles. Research limitations/implications – Results suffered from limitations in initial digitization of the text. Statistical methods, such as the hidden Markov model, should be combined with an extended set of recognition rules to improve recognition scores and system efficiency. Practical implications – Electronic access to local chronicles by location name saves time for chorographers and provides researchers with new opportunities. Social implications – Named entity recognition brings previously isolated ancient documents together in a knowledge base of scholarly and cultural value. Originality/value – Automatic name recognition can be implemented in information extraction from ancient books in languages other than English. The system described here can also be adapted to modern texts and other named entities.
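The precision ratio, recall ratio and comprehensive index F used in the evaluation above can be computed from counts of correctly recognized, spurious and missed location names. The following is a minimal sketch, assuming F is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name and counts are hypothetical.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from raw recognition counts."""
    predicted = true_positives + false_positives   # names the system reported
    actual = true_positives + false_negatives      # names present in the text
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only; they are not taken from the paper.
p, r, f = precision_recall_f(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why a single F value is often reported as the summary score.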


2015 ◽  
Vol 6 ◽  
pp. 1872-1882 ◽  
Author(s):  
Thaer M Dieb ◽  
Masaharu Yoshioka ◽  
Shinjiro Hara ◽  
Marcus C Newton

To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. Under loose agreement, where a term that overlaps a correct term of the same information category counts as correctly identified (in many cases, appropriate head nouns such as temperature or pressure match between the two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). For other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%), although precision is better (75–97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.
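The difference between exact matching and the loose agreement described above can be illustrated with a small scorer. This is only a sketch under stated assumptions: terms are represented as hypothetical `(start, end, category)` character spans, "loose" is taken to mean any overlap within the same category, and the category labels and spans are invented for illustration. It is not the evaluation code used for NaDevEx.

```python
def spans_overlap(a, b):
    """True if half-open character spans (start, end) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def score(gold, predicted, loose=False):
    """Return (precision, recall) for lists of (start, end, category) terms."""
    def matches(p, g):
        if p[2] != g[2]:                      # categories must agree in both modes
            return False
        if loose:
            return spans_overlap(p[:2], g[:2])
        return p[:2] == g[:2]                 # strict: identical boundaries

    correct_predictions = sum(any(matches(p, g) for g in gold) for p in predicted)
    recovered_gold = sum(any(matches(p, g) for p in predicted) for g in gold)
    precision = correct_predictions / len(predicted) if predicted else 0.0
    recall = recovered_gold / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations; labels and offsets are made up for this example.
gold = [(0, 18, "Parameter"), (25, 31, "SourceMaterial"), (40, 48, "Parameter")]
predicted = [(7, 18, "Parameter"), (25, 31, "SourceMaterial")]

strict = score(gold, predicted)               # only the exact span counts
loose = score(gold, predicted, loose=True)    # the partial span now counts too
print("strict:", strict)
print("loose:", loose)
```

In this toy case the partially matching term is rejected under strict scoring and accepted under loose scoring, so both precision and recall rise, which mirrors the direction of the change reported in the abstract.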


2018 ◽  
Vol 2 (2) ◽  
pp. 103-120
Author(s):  
Jin Zhang ◽  
Yanyan Wang ◽  
Yuehua Zhao ◽  
Xin Cai

Research methods play an extremely important role in studies. Statistical methods are fundamental and vital for quantitative research. The authors of this paper investigated the research papers that used statistical methods, including parametric inferential statistical methods, nonparametric inferential statistical methods, predictive statistical correlation methods, and predictive statistical regression methods, in library and information science, and examined the connections and interactions between statistical methods and their application areas, including information creation, information selection and control, information organization, information retrieval, information dissemination, and information use. Both an inferential statistical method and a graphic clustering visualization method were employed to explore the relationships between statistical methods and application areas and reveal the hidden interaction patterns. As a result, 1821 research papers employing statistical methods were identified among the papers published in six major library and information science journals from 1999 to 2017. The findings showed that application areas affected the types of statistical methods utilized. Studies in information organization and information retrieval tended to employ parametric and nonparametric inferential methods, while correlation and regression methods were applied more in studies of information use, information dissemination, information creation, and information selection and control. These findings help researchers better understand the statistical method orientation of library and information science studies and assist educators in the field to develop applicable quantitative research methodology courses.
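One common way to test whether the type of statistical method depends on the application area is a chi-square test of independence on a contingency table of paper counts. The abstract does not name the inferential method the authors used, so the choice of test here is an assumption, and the counts below are hypothetical rather than taken from the study.

```python
def chi_square_independence(table):
    """Return Pearson's chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if method type and application area were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical paper counts, NOT data from the study.
# Rows: inferential methods, correlation/regression methods.
# Columns: information retrieval, information use.
counts = [[30, 10],
          [10, 30]]
chi2, dof = chi_square_independence(counts)
print(f"chi-square={chi2:.2f}, dof={dof}")
```

With one degree of freedom, a statistic above the 0.05 critical value of 3.841 would suggest that method type and application area are not independent in this made-up table.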

