Rule based Text Extraction from a Bibliographic Database

2018 ◽  
Vol 38 (1) ◽  
pp. 5
Author(s):  
Veena Makhija ◽  
Swapnil Ahuja

The emergent concept of ‘Big Data’ has shifted the paradigm from information retrieval to information extraction techniques. Information extraction techniques enable corpus analysis to draw useful interpretations and possible applications. The selection of an appropriate information extraction technique depends upon the type of data being dealt with and its possible applications. In an R&D environment, published information is considered an authenticated benchmark for studying and analysing the growth pattern in a field of science, medicine, or business. This paper proposes a rule-based information extraction process applied to selected data extracted from a bibliographic database of published R&D papers. The aim of the study is to build up a database on relevant concepts, clean the retrieved data, and automate the process of information retrieval in the local database. For this purpose, concept-based ‘subject profiles’ in the area of advanced semiconductors, as well as rules for text extraction from metadata retrieved from the bibliographic database, were developed. This subset was used as an input to the knowledge domain to support R&D in the area of ‘advanced semiconductor materials and devices’ and to provide information services on the Intranet. The study found that concept-based pattern matching on the downloaded datasets yielded better results than using the controlled vocabulary of the source database.
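
As a rough illustration of the concept-based pattern matching described in this abstract, the Python sketch below applies a hypothetical subject profile (a few regular-expression rules per concept) to the title and abstract fields of downloaded records. The profile contents, the record field names, and the helpers `match_concepts` and `build_subset` are assumptions made for illustration; they are not taken from the paper.

```python
import re

# Hypothetical subject profile: concept name -> list of regex rules.
# Phrase rules are case-insensitive; abbreviation rules are case-sensitive
# so that, for example, "SiC" does not fire on the letters "sic" in other words.
SUBJECT_PROFILE = {
    "gallium nitride": [r"(?i)\bgallium\s+nitride\b", r"\bGaN\b"],
    "silicon carbide": [r"(?i)\bsilicon\s+carbide\b", r"\bSiC\b"],
}

# Compile every rule once so that matching many records stays cheap.
COMPILED = {
    concept: [re.compile(pattern) for pattern in patterns]
    for concept, patterns in SUBJECT_PROFILE.items()
}


def match_concepts(record):
    """Return the sorted concepts whose rules fire on the title or abstract."""
    text = " ".join(record.get(field, "") for field in ("title", "abstract"))
    return sorted(
        concept
        for concept, rules in COMPILED.items()
        if any(rule.search(text) for rule in rules)
    )


def build_subset(records):
    """Keep records that match at least one concept, tagged with the matches."""
    subset = []
    for record in records:
        concepts = match_concepts(record)
        if concepts:
            subset.append({**record, "concepts": concepts})
    return subset


# Invented sample records standing in for a downloaded result set.
records = [
    {"title": "Growth of GaN layers on sapphire",
     "abstract": "We study gallium nitride films."},
    {"title": "Music and sleep",
     "abstract": "A survey of students."},
    {"title": "Power devices",
     "abstract": "Silicon carbide MOSFETs are reviewed."},
]

subset = build_subset(records)
for record in subset:
    print(record["title"], "->", record["concepts"])
```

Keeping the rules in a plain dictionary means a subject profile can be extended without touching the matching code, which is one plausible way to maintain such profiles locally.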


Advances in internet technology have improved the speed and accuracy of data retrieval over the internet. Retrieving data over the internet requires an automatic process of information extraction and query retrieval. Information extraction applies a predefined structure of concepts to a particular domain of knowledge. The process proceeds in two steps: preprocessing and post-processing of data. The preprocessing step uses the glowworm optimization algorithm. The glowworm algorithm is a family of kits that gives a better selection of information under similarity constraints. The selection of similarity is based on the process of lubrification. The glowworm optimization removes unwanted noise from the data and filters it. Information is extracted using ensemble-based information extraction, which proceeds with a constraint function called mapper constraints. The mapper constraints map the ontology process onto a guided domain ontology. The ensemble-based information extraction process uses machine learning to bind the process together. The goal of this work is the development of an ontology-based information extraction (OBIE) system for different fields of data retrieval, such as news agencies, the hotel industry, and sports. The proposed model combines ontology, part-of-speech (POS) and language processing tools, and a constraints-based mapper with a domain ontology.



2014 ◽  
Vol 22 (1) ◽  
pp. 1-40 ◽  
Author(s):  
PETER KLUEGL ◽  
MARTIN TOEPFER ◽  
PHILIP-DANIEL BECK ◽  
GEORG FETTE ◽  
FRANK PUPPE

Rule-based information extraction is an important approach for processing the increasingly available amount of unstructured data. The manual creation of rule-based applications is a time-consuming and tedious task, which requires qualified knowledge engineers. The costs of this process can be reduced by providing a suitable rule language and extensive tooling support. This paper presents UIMA Ruta, a tool for rule-based information extraction and text processing applications. The system was designed with focus on rapid development. The rule language and its matching paradigm facilitate the quick specification of comprehensible extraction knowledge. They support a compact representation while still providing a high level of expressiveness. These advantages are supplemented by the development environment UIMA Ruta Workbench. It provides, in addition to extensive editing support, essential assistance for explanation of rule execution, introspection, automatic validation, and rule induction. UIMA Ruta is a useful tool for academia and industry due to its open source license. We compare UIMA Ruta to related rule-based systems especially concerning the compactness of the rule representation, the expressiveness, and the provided tooling support. The competitiveness of the runtime performance is shown in relation to a popular and freely available system. A selection of case studies implemented with UIMA Ruta illustrates the usefulness of the system in real-world scenarios.



2020 ◽  
pp. 102986492097216
Author(s):  
Gaelen Thomas Dickson ◽  
Emery Schubert

Background: Music is thought to be beneficial as a sleep aid. However, little research has explicitly investigated the specific characteristics of music that aid sleep and some researchers assume that music described as generically sedative (slow, with low rhythmic activity) is necessarily conducive to sleep, without directly interrogating this assumption. This study aimed to ascertain the features of music that aid sleep. Method: As part of an online survey, 161 students reported the pieces of music they had used to aid sleep, successfully or unsuccessfully. The participants reported 167 pieces, some more often than others. Nine features of the pieces were analyzed using a combination of music information retrieval methods and aural analysis. Results: Of the pieces reported by participants, 78% were successful in aiding sleep. The features they had in common were that (a) their main frequency register was middle range frequencies; (b) their tempo was medium; (c) their articulation was legato; (d) they were in the major mode, and (e) lyrics were present. They differed from pieces that were unsuccessful in aiding sleep in that (a) their main frequency register was lower; (b) their articulation was legato, and (c) they excluded high rhythmic activity. Conclusion: Music that aids sleep is not necessarily sedative music, as defined in the literature, but some features of sedative music are associated with aiding sleep. In the present study, we identified the specific features of music that were reported to have been successful and unsuccessful in aiding sleep. The identification of these features has important implications for the selection of pieces of music used in research on sleep.



Metabolites ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 240
Author(s):  
Alison Woodward ◽  
Alina Pandele ◽  
Salah Abdelrazig ◽  
Catherine A. Ortori ◽  
Iqbal Khan ◽  
...  

The integration of untargeted metabolomics and transcriptomics from the same population of cells or tissue enhances the confidence in the identified metabolic pathways and understanding of the enzyme–metabolite relationship. Here, we optimised a simultaneous extraction method of metabolites/lipids and RNA from ependymoma cells (BXD-1425). Relative to established RNA (mirVana kit) or metabolite (sequential solvent addition and shaking) single extraction methods, four dual-extraction techniques were evaluated and compared (methanol:water:chloroform ratios): cryomill/mirVana (1:1:2); cryomill-wash/Econospin (5:1:2); rotation/phenol-chloroform (9:10:1); sequential/mirVana (1:1:3). All methods extracted the same metabolites, yet rotation/phenol-chloroform did not extract lipids. Cryomill/mirVana and sequential/mirVana recovered the highest amounts of RNA, at 70 and 68% of that recovered with the mirVana kit alone. Sequential/mirVana, involving RNA extraction from the interphase of our established sequential solvent addition and shaking metabolomics-lipidomics extraction method, was the most efficient approach overall. Sequential/mirVana was applied to study (a) the biological effect caused by acute serum starvation in BXD-1425 cells and (b) primary ependymoma tumour tissue. We found (a) 64 differentially abundant metabolites and 28 differentially expressed metabolic genes, discovering four gene-metabolite interactions, and (b) all metabolites and 62% of lipids were above the limit of detection, and RNA yield was sufficient for transcriptomics, in just 10 mg of tissue.



1984 ◽  
Vol 8 (2) ◽  
pp. 63-66 ◽  
Author(s):  
C.P.R. Dubois

The controlled vocabulary versus the free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online data bases and case studies. This is supported by two case studies on the Coffeeline data base. The differences and values of the two approaches are explored considering thesauri as semantic maps. It is suggested that the most appropriate evaluatory technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments. Such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed and suggestions are made for methods to secure revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is mentioned, although such development must be based on firm research of user behaviour and needs.
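
The contrast between the two approaches can be made concrete with a small Python sketch. It treats a thesaurus as a map from entry terms to a preferred descriptor and compares that with plain substring matching on free text. The thesaurus entries, the sample documents, and the function names are invented for illustration; they do not come from the Coffeeline data base or from the review itself.

```python
# Hypothetical thesaurus: entry term (lowercase) -> preferred descriptor.
THESAURUS = {
    "coffee rust": "Hemileia vastatrix",
    "leaf rust": "Hemileia vastatrix",
    "hemileia vastatrix": "Hemileia vastatrix",
}

# Invented documents, each with free text and indexer-assigned descriptors.
DOCUMENTS = [
    {"id": 1, "text": "Control of leaf rust in arabica plantations",
     "descriptors": {"Hemileia vastatrix"}},
    {"id": 2, "text": "Coffee rust outbreaks in Central America",
     "descriptors": {"Hemileia vastatrix"}},
    {"id": 3, "text": "Roasting profiles and cup quality",
     "descriptors": {"Roasting"}},
]


def free_text_search(query, docs):
    """Return ids of documents whose text contains the query string."""
    needle = query.lower()
    return [doc["id"] for doc in docs if needle in doc["text"].lower()]


def controlled_search(query, docs):
    """Map the query to a descriptor, then return ids indexed under it."""
    descriptor = THESAURUS.get(query.lower())
    if descriptor is None:
        return []  # the query term is not covered by the thesaurus
    return [doc["id"] for doc in docs if descriptor in doc["descriptors"]]


free_hits = free_text_search("coffee rust", DOCUMENTS)
controlled_hits = controlled_search("coffee rust", DOCUMENTS)
print("free text:", free_hits)
print("controlled vocabulary:", controlled_hits)
```

In this toy data the controlled search also retrieves the document that says "leaf rust", while a query outside the thesaurus returns nothing at all. That is the kind of trade-off the review suggests should be studied in real search environments rather than settled by a sketch like this.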


