Keyword Extraction from Tweets using Graph-Based Methods

Twitter is a microblogging service that generates a huge amount of textual content daily. To mine this content, methods based on text mining, natural language processing, and information retrieval are usually applied. In classic text mining approaches, documents are represented using the well-known vector space model, which results in sparse matrices that are computationally costly to handle. This paper presents a technique to extract keywords from collections of Twitter messages that represents texts as graphs and assigns relevance values to the vertices based on graph centrality measures. The proposed approach, called TKG, relies on three phases: text pre-processing, graph building, and keyword extraction. The first experiment applies TKG to a text from Time magazine and compares its performance with TF-IDF [1] and KEA [6], using human classifications as benchmarks. A further experiment ran the algorithms on sets of tweets of increasing size, recording and comparing their computational times. The results showed that building the graph with an all-neighbors edging scheme invariably provided superior performance, and that weighting edges by the inverse co-occurrence frequency was superior in most cases. TKG was also faster than TF-IDF and KEA for all its variations, except the one weighting edges by the inverse co-occurrence frequency. Overall, TKG is a novel and robust proposal for extracting keywords from texts, particularly from short messages such as tweets.
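The core idea of ranking words by centrality in a co-occurrence graph can be sketched in a few lines. This is a toy illustration, not the authors' TKG implementation: it uses all-neighbors edging within each tweet and ranks words by weighted degree centrality, one of several centrality measures such a method could use.

```python
from collections import defaultdict
from itertools import combinations

def graph_keywords(tweets, top_k=3):
    """Toy graph-based keyword ranking: link every pair of words that
    co-occur in a tweet (all-neighbors edging), weight edges by their
    co-occurrence count, and score each vertex by the sum of the
    weights of its incident edges (weighted degree centrality)."""
    weight = defaultdict(int)  # edge (u, v) -> co-occurrence count
    for tweet in tweets:
        words = set(tweet.lower().split())
        for u, v in combinations(sorted(words), 2):
            weight[(u, v)] += 1
    centrality = defaultdict(int)  # vertex -> weighted degree
    for (u, v), w in weight.items():
        centrality[u] += w
        centrality[v] += w
    ranked = sorted(centrality.items(), key=lambda kv: -kv[1])
    return [word for word, _ in ranked[:top_k]]

tweets = ["graph methods extract keywords",
          "keywords from tweets",
          "graph centrality ranks keywords"]
print(graph_keywords(tweets))  # "keywords" is the most central word
```

A real implementation would add stopword removal and stemming in the pre-processing phase, and could swap weighted degree for closeness or eigenvector centrality.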

Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1520 ◽  
Author(s):  
Qian Zhang ◽  
Yeqi Liu ◽  
Chuanyang Gong ◽  
Yingyi Chen ◽  
Huihui Yu

Deep Learning (DL) is the state-of-the-art machine learning technology, which shows superior performance in computer vision, bioinformatics, natural language processing, and other areas. Especially as a modern image processing technology, DL has been successfully applied in various tasks, such as object detection, semantic segmentation, and scene analysis. However, as dense scenes become more common in real applications, their analysis becomes particularly challenging due to severe occlusions and the small size of objects. To overcome these problems, DL has recently been applied increasingly to dense scenes, including dense agricultural scenes. The purpose of this review is to explore the applications of DL for dense scene analysis in agriculture. To better elaborate the topic, we first describe the types of dense scenes in agriculture, as well as their challenges. Next, we introduce various popular deep neural networks used in these dense scenes. Then, the applications of these structures in various agricultural tasks are comprehensively introduced, including recognition and classification, detection, and counting and yield estimation. Finally, the surveyed DL applications, their limitations, and future work for the analysis of dense images in agriculture are summarized.


Automatic extraction of terms from a document is essential in the current digital era to summarize documents. For instance, instead of going through a full document, a reader can use the author's keywords to partially grasp its discussion. However, the author's keywords are not sufficient to identify the whole concept of the document; hence automatic term extraction methods are necessary. The major categories of automatic extraction approaches include Natural Language Processing, statistical approaches, graph-based approaches, nature-inspired algorithmic approaches, etc. Even though numerous approaches are available, exact automatic keyword extraction remains a major challenge. In this paper, a comparative analysis of keyword extraction between standard statistical approaches and graph-based approaches has been conducted. In the standard statistical approaches, terms are extracted on the basis of raw counts, whereas in the graph-based approaches, documents are automatically converted into graphs and centrality measures are applied during the keyword extraction process. The results of both approaches were compared and analyzed.
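The statistical side of such a comparison can be as simple as ranking terms by raw frequency. The sketch below is a minimal illustrative baseline (the stopword list and tokenizer are assumptions, not the paper's setup); a graph-based counterpart would instead rank the vertices of a co-occurrence graph by a centrality measure.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "of", "and", "a", "in", "to", "is"})

def frequency_keywords(text, top_k=3):
    """Minimal statistical baseline: rank terms purely by how often
    they occur, after dropping a small stopword list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

doc = ("Keyword extraction identifies keyword candidates; "
       "statistical extraction counts terms.")
print(frequency_keywords(doc))  # "keyword" and "extraction" rank highest
```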


2020 ◽  
Author(s):  
Amita Jain ◽  
Kanika Mittal ◽  
Kunwar Singh Vaisla

Abstract Keyword extraction is one of the most important aspects of text mining. Keywords help in identifying the document context. Many researchers have contributed their work to keyword extraction. They proposed approaches based on the frequency of occurrence, the position of words or the similarity between two terms. However, these approaches have shown shortcomings. In this paper, we propose a method that tries to overcome some of these shortcomings and present a new algorithm whose efficiency has been evaluated against widely used benchmarks. It is found from the analysis of standard datasets that the position of a word in the document plays an important role in the identification of keywords. In this paper, a fuzzy logic-based automatic keyword extraction (FLAKE) method is proposed. FLAKE assigns weights to the keywords by considering the relative position of each word in the entire document as well as in the sentence, coupled with the total occurrences of that word in the document. Based on these weights, candidate keywords are selected. Using WordNet, a fuzzy graph is constructed whose nodes represent candidate keywords. At this point, the most important nodes (based on fuzzy graph centrality measures) are identified and selected as final keywords. The experiments conducted on various datasets show that the proposed approach outperforms other keyword extraction methodologies by enhancing precision and recall.
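The intuition that position and frequency combine into a candidate weight can be illustrated with a small sketch. The scoring formula below is a hypothetical simplification for illustration only, not FLAKE's actual fuzzy membership functions: it rewards words that first appear early in the document and occur often.

```python
def positional_weight(word, tokens):
    """Illustrative position-plus-frequency score (not FLAKE's fuzzy
    rules): an earlier first occurrence and a higher total frequency
    both increase the weight of a candidate keyword."""
    first = tokens.index(word)                   # index of first occurrence
    freq = tokens.count(word)                    # total occurrences
    position_score = 1.0 - first / len(tokens)   # 1.0 at the very start
    return position_score * freq

tokens = "fuzzy logic ranks keywords and fuzzy graphs rank nodes".split()
print(positional_weight("fuzzy", tokens))  # early and frequent -> 2.0
print(positional_weight("nodes", tokens))  # late and rare -> small weight
```

In the actual method, such weights would feed into fuzzy membership functions, and the surviving candidates would become nodes of the WordNet-based fuzzy graph.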


2021 ◽  
pp. 1-13
Author(s):  
Lamiae Benhayoun ◽  
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the gaps in skills between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools' websites and scraping of a job advertisement website, followed by analysis based on a text mining approach with Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI. Characterization of three classes of skills for the AI market: technical, soft and interdisciplinary. Skill gaps concern some professional certifications, the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing, whose results provide a better understanding of the AI capability components at the individual and organizational levels. A study that can help shape educational programs to respond to AI market requirements.


2021 ◽  
Vol 1955 (1) ◽  
pp. 012072
Author(s):  
Ruiheng Li ◽  
Xuan Zhang ◽  
Chengdong Li ◽  
Zhongju Zheng ◽  
Zihang Zhou ◽  
...  

2021 ◽  
Vol 4 (3) ◽  
pp. 50
Author(s):  
Preeti Warrier ◽  
Pritesh Shah

The control of power converters is difficult due to their non-linear nature and, hence, the quest for smart and efficient controllers is continuous and ongoing. Fractional-order controllers have demonstrated superior performance in power electronic systems in recent years. However, it is a challenge to attain optimal parameters of the fractional-order controller for such types of systems. This article describes the optimal design of a fractional-order PID (FOPID) controller for a buck converter using the cohort intelligence (CI) optimization approach. CI is an artificial intelligence-based, socio-inspired meta-heuristic algorithm, inspired by the behavior of a group of candidates called a cohort. The FOPID controller parameters are designed for the minimization of various performance indices, with more emphasis on the integral squared error (ISE) performance index. The FOPID controller shows faster transient and dynamic response characteristics in comparison to the conventional PID controller. Comparison of the proposed method with different optimization techniques like the GA, PSO, ABC, and SA shows good results in less computational time. Hence, the CI method can be effectively used for the optimal tuning of FOPID controllers, as it gives results comparable to other optimization algorithms at a much faster rate. Such controllers can be optimized for multiple objectives and used in the control of various power converters, giving rise to more efficient systems catering to Industry 4.0 standards.


Author(s):  
Yonatan Belinkov ◽  
James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.


2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but there are few studies regarding the automatic extraction of smoking classification from unstructured bilingual electronic health records (EHR). OBJECTIVE We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). METHODS With acronym replacement and the Python package Soynlp, we normalized 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smoker, past smoker, never smoker, and unknown. Subsequently, Shifted Positive Pointwise Mutual Information (SPPMI) was used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status were identified. RESULTS Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improved keyword extraction precision by as much as 20.0%. The extracted keywords were used in classifying the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improved the F1 score by as much as 1.8% compared to unigram and bigram bag-of-words features. CONCLUSIONS Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be easily acquired and used for clinical practice and research.
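The SPPMI-plus-cosine-similarity pipeline can be sketched compactly. This is a toy illustration on made-up English snippets, not the study's Soynlp-based pipeline: SPPMI(w, c) = max(PMI(w, c) − log k, 0), with co-occurrence counted once per document, and each word represented by its sparse row of the SPPMI matrix.

```python
import math
from collections import Counter
from itertools import combinations

def sppmi_vectors(docs, shift=1.0):
    """Toy Shifted Positive PMI word vectors: count word/context
    co-occurrence per document, compute PMI, subtract log(shift),
    and clip negative values to zero."""
    pair = Counter()
    for doc in docs:
        for u, v in combinations(sorted(set(doc.lower().split())), 2):
            pair[(u, v)] += 1
            pair[(v, u)] += 1
    total = sum(pair.values())
    word = Counter()
    for (u, _v), c in pair.items():
        word[u] += c
    vecs = {w: {} for w in word}
    for (u, v), c in pair.items():
        pmi = math.log((c * total) / (word[u] * word[v]))
        val = max(pmi - math.log(shift), 0.0)
        if val > 0:
            vecs[u][v] = val
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["smokes cigarettes daily", "quit smoking cigarettes", "never smoked"]
v = sppmi_vectors(docs)
print(cosine(v["smokes"], v["quit"]))   # > 0: both share the context "cigarettes"
print(cosine(v["smokes"], v["never"]))  # 0.0: no shared context
```

In the study, keywords whose vectors were close under this similarity would be grouped as denoting the same smoking status before feeding the SVM classifier.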

