Keyword Extraction from Tweets using Graph-Based Methods

Twitter is a microblogging service that generates a huge amount of textual content daily. To mine this content, methods based on text mining, natural language processing, and information retrieval are usually applied. In classic text mining approaches, documents are represented using the well-known vector space model, which results in sparse matrices that are computationally costly to handle. This paper presents a technique to extract keywords from collections of Twitter messages that represents texts as graphs and assigns relevance values to the vertices based on graph centrality measures. The proposed approach, called TKG, relies on three phases: text pre-processing, graph building, and keyword extraction. The first experiment applies TKG to a text from Time magazine and compares its performance with TF-IDF [1] and KEA [6], using human classifications as benchmarks. A further experiment ran the algorithms on sets of tweets of increasing size, recording and comparing their computational times. The results showed that building the graph with an all-neighbors edging scheme invariably provided superior performance, and that weighting edges by the inverse co-occurrence frequency was superior in most cases. TKG was also faster than TF-IDF and KEA for all its variations, except the one weighting edges by the inverse co-occurrence frequency. Overall, TKG is a novel and robust proposal for extracting keywords from texts, particularly from short messages such as tweets.
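The core idea of ranking words by centrality in a co-occurrence graph can be sketched in a few lines. This is a toy illustration, not the authors' TKG implementation: it uses all-neighbors edging within each tweet and ranks words by weighted degree centrality, one of several centrality measures such a method could use.

```python
from collections import defaultdict
from itertools import combinations

def graph_keywords(tweets, top_k=3):
    """Toy graph-based keyword ranking: link every pair of words that
    co-occur in a tweet (all-neighbors edging), weight edges by their
    co-occurrence count, and score each vertex by the sum of the
    weights of its incident edges (weighted degree centrality)."""
    weight = defaultdict(int)  # edge (u, v) -> co-occurrence count
    for tweet in tweets:
        words = set(tweet.lower().split())
        for u, v in combinations(sorted(words), 2):
            weight[(u, v)] += 1
    centrality = defaultdict(int)  # vertex -> weighted degree
    for (u, v), w in weight.items():
        centrality[u] += w
        centrality[v] += w
    ranked = sorted(centrality.items(), key=lambda kv: -kv[1])
    return [word for word, _ in ranked[:top_k]]

tweets = ["graph methods extract keywords",
          "keywords from tweets",
          "graph centrality ranks keywords"]
print(graph_keywords(tweets))  # "keywords" is the most central word
```

A real implementation would add stopword removal and stemming in the pre-processing phase, and could swap weighted degree for closeness or eigenvector centrality.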

Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1520 ◽  
Author(s):  
Qian Zhang ◽  
Yeqi Liu ◽  
Chuanyang Gong ◽  
Yingyi Chen ◽  
Huihui Yu

Deep Learning (DL) is the state-of-the-art machine learning technology, which shows superior performance in computer vision, bioinformatics, natural language processing, and other areas. Especially as a modern image processing technology, DL has been successfully applied in various tasks, such as object detection, semantic segmentation, and scene analysis. However, as dense scenes become more common in real applications, their analysis becomes particularly challenging due to severe occlusions and the small size of objects. To overcome these problems, DL has recently been applied increasingly to dense scenes, including dense agricultural scenes. The purpose of this review is to explore the applications of DL for dense scene analysis in agriculture. To better elaborate the topic, we first describe the types of dense scenes in agriculture, as well as their challenges. Next, we introduce various popular deep neural networks used in these dense scenes. Then, the applications of these structures in various agricultural tasks are comprehensively introduced, including recognition and classification, detection, and counting and yield estimation. Finally, the surveyed DL applications, their limitations, and future work for the analysis of dense images in agriculture are summarized.


Automatic extraction of terms from a document is essential in the current digital era to summarize documents. For instance, instead of going through a full document, a reader can use the author's keywords to partially grasp its discussion. However, the author's keywords are not sufficient to identify the whole concept of the document; hence automatic term extraction methods are necessary. The major categories of automatic extraction approaches include Natural Language Processing, statistical approaches, graph-based approaches, nature-inspired algorithmic approaches, etc. Even though numerous approaches are available, exact automatic keyword extraction remains a major challenge. In this paper, a comparative analysis of keyword extraction between standard statistical approaches and graph-based approaches has been conducted. In the standard statistical approaches, terms are extracted on the basis of raw counts, whereas in the graph-based approaches, documents are automatically converted into graphs and centrality measures are applied during the keyword extraction process. The results of both approaches were compared and analyzed.
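The statistical side of such a comparison can be as simple as ranking terms by raw frequency. The sketch below is a minimal illustrative baseline (the stopword list and tokenizer are assumptions, not the paper's setup); a graph-based counterpart would instead rank the vertices of a co-occurrence graph by a centrality measure.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "of", "and", "a", "in", "to", "is"})

def frequency_keywords(text, top_k=3):
    """Minimal statistical baseline: rank terms purely by how often
    they occur, after dropping a small stopword list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

doc = ("Keyword extraction identifies keyword candidates; "
       "statistical extraction counts terms.")
print(frequency_keywords(doc))  # "keyword" and "extraction" rank highest
```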


2020 ◽  
Author(s):  
Amita Jain ◽  
Kanika Mittal ◽  
Kunwar Singh Vaisla

Abstract Keyword extraction is one of the most important aspects of text mining. Keywords help in identifying the document context. Many researchers have contributed their work to keyword extraction. They proposed approaches based on the frequency of occurrence, the position of words or the similarity between two terms. However, these approaches have shown shortcomings. In this paper, we propose a method that tries to overcome some of these shortcomings and present a new algorithm whose efficiency has been evaluated against widely used benchmarks. It is found from the analysis of standard datasets that the position of a word in the document plays an important role in the identification of keywords. In this paper, a fuzzy logic-based automatic keyword extraction (FLAKE) method is proposed. FLAKE assigns weights to the keywords by considering the relative position of each word in the entire document as well as in the sentence, coupled with the total occurrences of that word in the document. Based on these weights, candidate keywords are selected. Using WordNet, a fuzzy graph is constructed whose nodes represent candidate keywords. At this point, the most important nodes (based on fuzzy graph centrality measures) are identified and selected as final keywords. The experiments conducted on various datasets show that the proposed approach outperforms other keyword extraction methodologies by enhancing precision and recall.
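The intuition that position and frequency combine into a candidate weight can be illustrated with a small sketch. The scoring formula below is a hypothetical simplification for illustration only, not FLAKE's actual fuzzy membership functions: it rewards words that first appear early in the document and occur often.

```python
def positional_weight(word, tokens):
    """Illustrative position-plus-frequency score (not FLAKE's fuzzy
    rules): an earlier first occurrence and a higher total frequency
    both increase the weight of a candidate keyword."""
    first = tokens.index(word)                   # index of first occurrence
    freq = tokens.count(word)                    # total occurrences
    position_score = 1.0 - first / len(tokens)   # 1.0 at the very start
    return position_score * freq

tokens = "fuzzy logic ranks keywords and fuzzy graphs rank nodes".split()
print(positional_weight("fuzzy", tokens))  # early and frequent -> 2.0
print(positional_weight("nodes", tokens))  # late and rare -> small weight
```

In the actual method, such weights would feed into fuzzy membership functions, and the surviving candidates would become nodes of the WordNet-based fuzzy graph.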


2021 ◽  
pp. 1-13
Author(s):  
Lamiae Benhayoun ◽  
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the gaps in skills between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools' websites and scraping of a job advertisement website, followed by analysis based on a text mining approach with Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI. Characterization of three classes of skills for the AI market: technical, soft and interdisciplinary. Skill gaps concern some professional certifications, the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing, whose results provide a better understanding of the AI capability components at the individual and organizational levels. A study that can help shape educational programs to respond to AI market requirements.


2021 ◽  
Vol 1955 (1) ◽  
pp. 012072
Author(s):  
Ruiheng Li ◽  
Xuan Zhang ◽  
Chengdong Li ◽  
Zhongju Zheng ◽  
Zihang Zhou ◽  
...  

2021 ◽  
Vol 4 (3) ◽  
pp. 50
Author(s):  
Preeti Warrier ◽  
Pritesh Shah

The control of power converters is difficult due to their non-linear nature and, hence, the quest for smart and efficient controllers is continuous and ongoing. Fractional-order controllers have demonstrated superior performance in power electronic systems in recent years. However, it is a challenge to attain optimal parameters of the fractional-order controller for such types of systems. This article describes the optimal design of a fractional-order PID (FOPID) controller for a buck converter using the cohort intelligence (CI) optimization approach. CI is an artificial intelligence-based, socio-inspired meta-heuristic algorithm, inspired by the behavior of a group of candidates called a cohort. The FOPID controller parameters are designed for the minimization of various performance indices, with more emphasis on the integral squared error (ISE) performance index. The FOPID controller shows faster transient and dynamic response characteristics in comparison to the conventional PID controller. Comparison of the proposed method with different optimization techniques like the GA, PSO, ABC, and SA shows good results in less computational time. Hence, the CI method can be effectively used for the optimal tuning of FOPID controllers, as it gives results comparable to other optimization algorithms at a much faster rate. Such controllers can be optimized for multiple objectives and used in the control of various power converters, giving rise to more efficient systems catering to Industry 4.0 standards.


Author(s):  
Yonatan Belinkov ◽  
James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.


2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but there are few studies regarding the automatic extraction of smoking classification from unstructured bilingual electronic health records (EHR). OBJECTIVE We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). METHODS With acronym replacement and the Python package Soynlp, we normalized 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smoker, past smoker, never smoker, and unknown. Subsequently, Shifted Positive Pointwise Mutual Information (SPPMI) was used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status were identified. RESULTS Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improved keyword extraction precision by as much as 20.0%. The extracted keywords were used in classifying the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improved the F1 score by as much as 1.8% compared to unigram and bigram bag-of-words features. CONCLUSIONS Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be easily acquired and used for clinical practice and research.
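The SPPMI-plus-cosine-similarity pipeline can be sketched compactly. This is a toy illustration on made-up English snippets, not the study's Soynlp-based pipeline: SPPMI(w, c) = max(PMI(w, c) − log k, 0), with co-occurrence counted once per document, and each word represented by its sparse row of the SPPMI matrix.

```python
import math
from collections import Counter
from itertools import combinations

def sppmi_vectors(docs, shift=1.0):
    """Toy Shifted Positive PMI word vectors: count word/context
    co-occurrence per document, compute PMI, subtract log(shift),
    and clip negative values to zero."""
    pair = Counter()
    for doc in docs:
        for u, v in combinations(sorted(set(doc.lower().split())), 2):
            pair[(u, v)] += 1
            pair[(v, u)] += 1
    total = sum(pair.values())
    word = Counter()
    for (u, _v), c in pair.items():
        word[u] += c
    vecs = {w: {} for w in word}
    for (u, v), c in pair.items():
        pmi = math.log((c * total) / (word[u] * word[v]))
        val = max(pmi - math.log(shift), 0.0)
        if val > 0:
            vecs[u][v] = val
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["smokes cigarettes daily", "quit smoking cigarettes", "never smoked"]
v = sppmi_vectors(docs)
print(cosine(v["smokes"], v["quit"]))   # > 0: both share the context "cigarettes"
print(cosine(v["smokes"], v["never"]))  # 0.0: no shared context
```

In the study, keywords whose vectors were close under this similarity would be grouped as denoting the same smoking status before feeding the SVM classifier.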

