DOMAIN SPECIFIC KEY FEATURE EXTRACTION USING KNOWLEDGE GRAPH MINING

2020
Vol 15
pp. 1-22
Author(s):
Mohit Kumar Barai
Subhasis Sanyal
In the field of text mining, many novel feature extraction approaches have been proposed. This paper is based on a novel feature extraction algorithm. To formulate the approach, weighted graph mining is used to ensure both the effectiveness of feature extraction and computational efficiency; only the most effective graphs, those representing the maximum number of triangles under a predefined relational criterion, are considered. The proposed technique amalgamates the relations between the words surrounding an aspect of the product and the lexicon-based connections among those words, which together create a relational triangle. The element covered by the maximum number of triangles is counted as a prime feature. In analyses of domain-specific data, the proposed algorithm performs more than three times better than TF-IDF on a limited set of data.
Keywords: feature extraction, natural language processing, product review, text processing, knowledge graph.
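The triangle-based ranking described above can be sketched as follows. The window-based co-occurrence graph and the mini-corpus of reviews are illustrative assumptions, not the paper's actual lexicon-based relational criterion:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mini-corpus of product-review sentences; window-based
# co-occurrence stands in for the paper's relational criterion.
reviews = [
    "battery life is great and battery charges fast",
    "great battery and great screen",
    "screen is bright and battery life is long",
]

def build_graph(sentences, window=3):
    """Connect words that co-occur within a sliding window."""
    adj = defaultdict(set)
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            for v in words[i + 1 : i + window]:
                if v != w:
                    adj[w].add(v)
                    adj[v].add(w)
    return adj

def triangle_count(adj, node):
    """Number of triangles in the graph that include `node`."""
    nbrs = adj[node]
    return sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])

adj = build_graph(reviews)
scores = {w: triangle_count(adj, w) for w in adj}
top = max(scores, key=scores.get)  # word covered by the most triangles
```

The word participating in the most triangles is then treated as a candidate prime feature, mirroring the "maximum number of triangles covering an element" criterion.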

Author(s):
Jagadish S Kallimani
V. K. Ananthashayana
Debjani Goswami

Text-to-speech synthesis is a complex combination of language processing, signal processing and computer science. Ubiquitous computing (ubicomp) is a post-desktop model of human-computer interaction in which information processing has been thoroughly integrated into everyday objects and activities. Speech synthesis is the generation of synthesized speech from text. This chapter deals with the development of a text-to-speech (TTS) synthesis system for an Indian regional language, taking Bengali as the language. It highlights various methods that may be used for speech synthesis and provides an overview of the problems and difficulties in Bengali text-to-speech conversion. Variations in the prosody of the speech (parameters such as volume, pitch, intonation and amplitude) yield the emotional aspects (anger, happiness, normal), which are applied to the developed TTS system.
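A minimal sketch of how emotion-dependent prosody control might look. The baseline values and scaling factors below are invented for illustration and do not come from the chapter:

```python
# Hypothetical prosody presets: scaling factors applied to a neutral
# baseline of pitch (Hz), volume (dB) and speaking rate (syllables/s).
BASELINE = {"pitch": 120.0, "volume": 60.0, "rate": 4.0}

EMOTION_SCALES = {
    "normal": {"pitch": 1.0, "volume": 1.0, "rate": 1.0},
    "happy":  {"pitch": 1.3, "volume": 1.1, "rate": 1.15},
    "anger":  {"pitch": 1.2, "volume": 1.3, "rate": 1.25},
}

def prosody_for(emotion):
    """Return adjusted prosody parameters for a target emotion."""
    scales = EMOTION_SCALES.get(emotion, EMOTION_SCALES["normal"])
    return {k: round(BASELINE[k] * scales[k], 2) for k in BASELINE}

params = prosody_for("happy")  # raised pitch, volume and rate
```

A real TTS back end would feed such parameters into its signal-processing stage; here they simply illustrate that each emotion maps to a distinct prosody configuration.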


Author(s):
Michael Stewart
Wei Liu

Knowledge Graph Construction (KGC) from text unlocks information held within unstructured text and is critical to a wide range of downstream applications. General approaches to KGC from text rely heavily on the existence of knowledge bases, yet most domains do not have an external knowledge base readily available. In many situations this results in information loss, as a wealth of key information is held within "non-entities". Domain-specific approaches to KGC typically adopt unsupervised pipelines, using carefully crafted linguistic and statistical patterns to extract co-occurring noun phrases as triples, essentially constructing text graphs rather than true knowledge graphs. In this research, for the first time, and in the same flavour as Collobert et al.'s seminal 2011 work "Natural language processing (almost) from scratch", we propose a Seq2KG model that attempts to achieve "knowledge graph construction (almost) from scratch". The end-to-end Sequence-to-Knowledge-Graph (Seq2KG) neural model jointly learns to generate triples and to resolve entity types as a multi-label classification task through deep neural networks. In addition, a novel evaluation metric that takes both semantic and structural closeness into account is developed for measuring the performance of triple extraction. We show that our end-to-end Seq2KG model performs on par with a state-of-the-art rule-based system that outperformed other neural models and won first prize in the first Knowledge Graph Contest in 2019. A new annotation scheme and three high-quality, manually annotated datasets are made available to help promote this direction of research.
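A toy illustration of scoring triple extraction by combining structural slot alignment with token-level semantic overlap. This is an assumption-laden stand-in, not the paper's actual metric, and the example triples are invented:

```python
def token_jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def triple_score(pred, gold):
    """Average per-slot similarity between two (head, relation, tail) triples."""
    return sum(token_jaccard(p, g) for p, g in zip(pred, gold)) / 3

def best_match_f1(preds, golds):
    """Greedy best-match F1 of predicted triples against gold triples."""
    if not preds or not golds:
        return 0.0
    precision = sum(max(triple_score(p, g) for g in golds) for p in preds) / len(preds)
    recall = sum(max(triple_score(g, p) for p in preds) for g in golds) / len(golds)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [("Acme Corp", "acquired", "Widget Ltd")]
pred = [("Acme Corp", "acquired", "Widget Limited")]
score = best_match_f1(pred, gold)  # partial credit for the near-match tail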


Author(s):
Neha Garg
Kamlesh Sharma

Sentiment analysis (SA) is an enduring area of research, especially in the field of text analysis. Text pre-processing is an important step in performing SA accurately. This paper presents a text-processing model for SA using natural language processing techniques on Twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extraction, and then categorizing the data according to the SA techniques. Keeping the focus on Twitter, the data is extracted in a domain-specific manner. In the data-cleaning phase, noisy data, missing data, punctuation, tags and emoticons are considered. For pre-processing, tokenization is performed, followed by stop-word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. The accuracy of classification techniques improves after applying text pre-processing, and dimensionality is reduced. The proposed corpus can be utilized in the areas of market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as a baseline for applying predictive analysis, machine learning and deep learning algorithms, which can be extended according to the problem definition.
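The cleaning, tokenization and stop-word-removal steps above can be sketched like this. The regular expressions and the stop-word list are simplified illustrations, not the paper's resources:

```python
import re

# Illustrative subset of a stop-word list, not a full resource.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "in", "this"}

def clean(tweet):
    """Strip URLs, mentions/hashtags and punctuation (incl. emoticons)."""
    tweet = re.sub(r"https?://\S+", "", tweet)   # URLs
    tweet = re.sub(r"[@#]\w+", "", tweet)        # mentions and hashtags
    tweet = re.sub(r"[^\w\s]", "", tweet)        # punctuation / emoticons
    return tweet.lower()

def preprocess(tweet):
    """Cleaning -> tokenization -> stop-word removal (SWR)."""
    tokens = clean(tweet).split()
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("Loving the new phone! :) @BrandX #happy http://t.co/x")
# -> ["loving", "new", "phone"]
```

Each step shrinks the token set, which is exactly the dimensionality reduction the paper attributes to pre-processing.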


2011
Vol 33 (7)
pp. 1625-1631
Author(s):
Lin Lian
Guo-hui Li
Hai-tao Wang
Hao Tian
Shu-kui Xu

Entropy
2021
Vol 23 (6)
pp. 664
Author(s):
Nikos Kanakaris
Nikolaos Giarelis
Ilias Siachos
Nikos Karacapilidis

We consider the prediction of future research collaborations as a link prediction problem applied to a scientific knowledge graph. To the best of our knowledge, this is the first work on the prediction of future research collaborations that combines structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing techniques. Our work: (i) investigates whether the integration of unstructured textual data into a single knowledge graph affects the performance of a link prediction model, (ii) studies the effect of previously proposed graph-kernel-based approaches on the performance of an ML model, as far as the link prediction problem is concerned, and (iii) proposes a three-phase pipeline that enables the exploitation of structural and textual information, as well as of pre-trained word embeddings. We benchmark the proposed approach against classical link prediction algorithms using accuracy, recall, and precision as our performance metrics. Finally, we empirically test our approach through various feature combinations with respect to the link prediction problem. Our experiments with the new COVID-19 Open Research Dataset demonstrate a significant improvement in the aforementioned performance metrics in the prediction of future research collaborations.
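The blending of structural and textual information can be illustrated with a toy link score for a pair of researchers. The co-author sets, keyword sets and equal weights are hypothetical, and real pipelines would use graph kernels and word embeddings rather than raw set overlap:

```python
# Hypothetical knowledge-graph neighbourhoods and textual profiles.
coauthors = {
    "A": {"B", "C", "D"},
    "E": {"C", "D", "F"},
}
keywords = {
    "A": {"graph", "nlp", "embeddings"},
    "E": {"graph", "nlp", "kernels"},
}

def jaccard(x, y):
    """Set-overlap similarity."""
    return len(x & y) / len(x | y) if x | y else 0.0

def link_score(u, v, w_struct=0.5, w_text=0.5):
    """Weighted blend of a structural and a textual similarity feature."""
    return (w_struct * jaccard(coauthors[u], coauthors[v])
            + w_text * jaccard(keywords[u], keywords[v]))

score = link_score("A", "E")  # candidate collaboration strength
```

A downstream classifier would consume such blended features for every candidate author pair and threshold them into predicted links.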


2021
Vol 21 (1)
Author(s):
Pilar López-Úbeda
Alexandra Pomares-Quimbaya
Manuel Carlos Díaz-Galiano
Stefan Schulz

Abstract
Background: Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS Metathesaurus or SNOMED CT, are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most terms are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong.
Results: This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms, classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries to generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected, and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called the Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining improvements of 6 percentage points in the F-measure compared to the baseline using a Multilayer Perceptron, thus supporting the hypothesis that a specialized term set improves NLP tasks.
Conclusion: The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent, broad-scope vocabulary.
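The per-sub-domain term weighting might be sketched as below. The two tiny corpora and the relevance/discrimination formulas are simplified assumptions, not the published SCOVACLIS metrics:

```python
from collections import Counter

# Toy sub-domain corpora (tokenized Spanish abstract fragments, invented).
corpora = {
    "cardiology": "infarto miocardio ecg infarto arritmia".split(),
    "dermatology": "melanoma piel lesion melanoma ecg".split(),
}

counts = {d: Counter(toks) for d, toks in corpora.items()}
total = Counter()
for c in counts.values():
    total.update(c)

def term_weight(term, domain):
    """Relevance (in-domain frequency) times discriminatory power
    (share of the term's occurrences that fall in this sub-domain)."""
    relevance = counts[domain][term] / sum(counts[domain].values())
    discrimination = counts[domain][term] / total[term] if total[term] else 0.0
    return relevance * discrimination

w = term_weight("infarto", "cardiology")
```

A term like "infarto" that occurs only in cardiology scores higher than "ecg", which is spread across sub-domains, which is the intuition behind weighting terms per clinical specialty.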

