FLAKE: Fuzzy Graph Centrality-based Automatic Keyword Extraction

2020 ◽  
Author(s):  
Amita Jain ◽  
Kanika Mittal ◽  
Kunwar Singh Vaisla

Abstract Keyword extraction is one of the most important aspects of text mining. Keywords help in identifying the context of a document. Many researchers have contributed to keyword extraction, proposing approaches based on the frequency of occurrence, the position of words, or the similarity between two terms. However, these approaches have shown shortcomings. In this paper, we propose a method that tries to overcome some of these shortcomings and present a new algorithm whose efficiency has been evaluated against widely used benchmarks. Analysis of standard datasets shows that the position of a word in the document plays an important role in identifying keywords. In this paper, a fuzzy logic-based automatic keyword extraction (FLAKE) method is proposed. FLAKE assigns weights to words by considering the relative position of each word in the entire document as well as in the sentence, coupled with the total number of occurrences of that word in the document. Based on these weights, candidate keywords are selected. Using WordNet, a fuzzy graph is constructed whose nodes represent candidate keywords. The most important nodes (based on fuzzy graph centrality measures) are then identified and selected as final keywords. Experiments conducted on various datasets show that the proposed approach outperforms other keyword extraction methodologies by enhancing precision and recall.
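The two stages the abstract describes can be sketched in a few lines of Python: a position-plus-frequency weighting of candidate words, followed by a centrality ranking over a graph of candidates. The scoring formula and the use of plain weighted degree centrality are illustrative assumptions only; FLAKE's actual fuzzy membership functions and its WordNet-based fuzzy graph construction are not reproduced here.

```python
from collections import Counter, defaultdict

def candidate_weights(sentences):
    """Score each word by frequency combined with positional bonuses,
    loosely in the spirit of FLAKE's position-plus-frequency weighting.
    `sentences` is a list of tokenised sentences."""
    freq = Counter()
    pos_bonus = defaultdict(float)
    total = sum(len(s) for s in sentences)
    for si, words in enumerate(sentences):
        for wi, w in enumerate(words):
            freq[w] += 1
            # earlier in the document and earlier in the sentence -> higher bonus
            doc_pos = 1.0 - si / max(len(sentences), 1)
            sent_pos = 1.0 - wi / max(len(words), 1)
            pos_bonus[w] = max(pos_bonus[w], 0.5 * (doc_pos + sent_pos))
    return {w: (freq[w] / total) * pos_bonus[w] for w in freq}

def degree_centrality(edges):
    """Weighted degree centrality on an undirected graph given as
    (u, v, weight) triples; a simple stand-in for the fuzzy-graph
    centrality step that picks the final keywords."""
    score = defaultdict(float)
    for u, v, w in edges:
        score[u] += w
        score[v] += w
    return dict(score)
```

Candidate words above a weight threshold would become nodes, with edge weights taken from semantic relatedness; the highest-centrality nodes are the final keywords.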

Social media web sites such as Twitter, a microblogging service, generate a huge amount of textual content daily. To analyse this content, methods based on text mining, natural language processing, and information retrieval are usually applied. In classic text mining approaches, documents are represented using the well-known vector space model, which results in sparse matrices that are computationally costly to handle. This work presents a technique to extract keywords from collections of Twitter messages based on representing texts with a graph structure, from which relevance values are assigned to the vertices using graph centrality measures. The proposed approach, called TKG, relies on three phases: text pre-processing, graph building, and keyword extraction. The first experiment applies TKG to a text from Time magazine and compares its performance with TF-IDF [1] and KEA [6], using human classifications as benchmarks. In further experiments, sets of tweets of increasing size were used, and the computational time necessary to run the algorithms was recorded and compared. The results showed that building the graph with an all-neighbors edging scheme invariably provided superior performance, and that weighting edges by the inverse co-occurrence frequency was superior in most cases. TKG also proved faster than TF-IDF and KEA for all its variations, except for the weighting scheme based on the inverse co-occurrence frequency. One possible future work is to apply other centrality measures. TKG is a novel and robust proposal to extract keywords from texts, particularly from short messages such as tweets.
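The all-neighbors edging scheme with inverse co-occurrence weighting can be sketched as follows. Tokenisation is assumed already done, and ranking vertices by the sum of inverse edge weights is a simplified stand-in for the paper's centrality-based extraction phase, not TKG's exact formulation.

```python
from collections import defaultdict
from itertools import combinations

def tkg_keywords(tweets, top_k=3):
    """Toy sketch of TKG on pre-tokenised tweets: link every pair of
    words co-occurring in a tweet (the all-neighbors edging scheme),
    weight each edge by the inverse co-occurrence frequency, then rank
    vertices so that those touching 'close' (frequently co-occurring)
    edges score higher."""
    cooc = defaultdict(int)
    for words in tweets:
        for u, v in combinations(sorted(set(words)), 2):
            cooc[(u, v)] += 1
    score = defaultdict(float)
    for (u, v), count in cooc.items():
        weight = 1.0 / count      # inverse co-occurrence frequency
        score[u] += 1.0 / weight  # low weight = close = higher score
        score[v] += 1.0 / weight
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]
```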


2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Takayasu Fushimi ◽  
Seiya Okubo ◽  
Kazumi Saito

Abstract In this study, we propose novel centrality measures that consider multiple perspectives of nodes or node groups, based on the facility location problem on a spatial network. Conventional centrality exclusively quantifies global properties of each node in a network, such as closeness and betweenness, and extracts nodes with high scores as important nodes. In the context of facility placement on a network, it is desirable to place facilities at nodes with high accessibility from residents, that is, nodes with a high closeness centrality score. It is natural to think that such a property of a node changes when the situation changes. For example, when there are no existing facilities, the demand of residents is expected to be satisfied by opening a new facility at the node with the highest accessibility; when some facilities already exist, however, it is necessary to open a new facility at some distance from them. Furthermore, the concept of closeness differs depending on the relationship with existing facilities, such as cooperative versus competitive relationships. Therefore, we extend the concept of centrality to consider situations where one or more nodes, each belonging to one of several groups, have already been selected. In this study, we propose two measures based on closeness centrality and betweenness centrality as behaviour models of people on a spatial network. Experimental evaluations using actual urban street network data confirm that the proposed measures, which introduce the viewpoint of each group, identify different important nodes for each group's perspective and predict new store locations more accurately.
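The situation-dependent closeness idea can be illustrated as a greedy facility-placement step: given facilities already opened, pick the node that most reduces the total distance from every node to its nearest facility. The unweighted hop-distance model and the total-cost objective are simplifying assumptions for illustration, not the paper's exact measures.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src on an unweighted network given as an
    adjacency dict {node: [neighbours]}."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def best_new_facility(adj, existing):
    """Pick the node that most reduces the total distance from every
    node to its nearest facility, given facilities already placed at
    `existing`. With no existing facilities this greedy step reduces
    to ordinary closeness centrality."""
    dists = {f: bfs_dist(adj, f) for f in existing}
    def total_cost(facilities):
        return sum(min(dists[f].get(v, len(adj)) for f in facilities)
                   for v in adj)
    best, best_cost = None, None
    for cand in adj:
        if cand in existing:
            continue
        dists[cand] = bfs_dist(adj, cand)
        cost = total_cost(list(existing) + [cand])
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best
```

On a path network a-b-c-d-e with an existing facility at a, the greedy step prefers d: it serves the far end without duplicating a's coverage.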


2019 ◽  
Vol 18 (04) ◽  
pp. 1950050
Author(s):  
Anand Gupta ◽  
Manpreet Kaur

Outdegree Centrality (OC) is a graph-based centrality measure that captures the local connectedness of a node in a graph. The measure has been used in the literature to highlight key sentences in a graph-based optimisation method for summarisation. It is observed in the resulting summaries that OC tends to be biased towards selecting introductory sentences of the document, producing only generic summaries. Different graph centrality measures lead to different interpretations of a summary. Therefore, the authors propose to use another, more suitable centrality measure in order to generate a specific rather than a generic summary. Such a summary is expected to be highly informative, covering all the subtopics of the source document. This requirement has led the authors to use the Laplacian Centrality (LC) measure to find the significance of the nodes. The essence of this measure lies in highlighting central nodes from subgraphs which contribute non-uniformly towards the common goal of the graph. The modified method has shown significant improvement in the informativeness and coherence of summaries and outperformed state-of-the-art results.
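For an unweighted graph, Laplacian Centrality can be computed as the relative drop in Laplacian energy (the sum of d² + d over all nodes, equal to the sum of squared Laplacian eigenvalues) when a node is removed. The sketch below uses this standard unweighted definition; it does not reproduce the sentence-graph weighting used in the summarisation method itself.

```python
def laplacian_energy(adj):
    """Laplacian energy of an unweighted graph {node: [neighbours]}:
    the sum over nodes of degree^2 + degree."""
    return sum(len(nbrs) ** 2 + len(nbrs) for nbrs in adj.values())

def laplacian_centrality(adj, v):
    """Relative drop in Laplacian energy when v is removed; higher
    means the node contributes more to the graph's structure."""
    energy = laplacian_energy(adj)
    sub = {u: [x for x in nbrs if x != v]
           for u, nbrs in adj.items() if u != v}
    return (energy - laplacian_energy(sub)) / energy
```

On a path a-b-c, removing the middle node b destroys all structure (centrality 1.0), while removing an endpoint leaves an edge intact (centrality 0.6), which matches the intuition that LC rewards nodes central to subgraphs.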


Author(s):  
OURDIA BOUIDGHAGHEN ◽  
MOHAND BOUGHANEM ◽  
HENRI PRADE ◽  
IHAB MALLAK

The paper presents a preliminary investigation of potential methods for extracting semantic views of text contents in the form of structured sets of words, which go beyond standard statistical indexing. The aim is to build fuzzily weighted structured images of semantic contents. A preliminary step consists in identifying the different types of relations (is-a, part-of, related-to, synonymy, domain, glossary relations) that exist between the words of a text, using a general ontology such as WordNet. Taking advantage of these relations, different types of fuzzy clusters of words can then be built. Moreover, apart from its frequency of occurrence, the importance of a word may also be evaluated through some estimate of its specificity. A degree of "centrality" is also computed for each word in a cluster. The size of the clusters and the frequency, specificity, and centrality of their words are indications that enable us to build a fuzzy set of sets of words that progressively "emerge" from a text as being representative of its contents. The ideas advocated in the paper and their potential usefulness are illustrated on a running example and in two experiments. It is expected that obtaining a better representation of the semantic contents of texts may help, in particular, to give a potential reader indications of what the text is about.
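One plausible reading of the per-cluster "centrality" degree is the mean fuzzy relation strength between a word and the other cluster members. The sketch below assumes a hypothetical `sim` table of relation strengths in [0, 1] (which in the paper's setting would be derived from WordNet relations); the exact definition used by the authors may differ.

```python
def cluster_centrality(sim, words):
    """Degree of 'centrality' of each word inside a fuzzy cluster,
    taken here as its mean similarity to the other members. `sim`
    maps unordered word pairs to hypothetical fuzzy relation
    strengths in [0, 1]; missing pairs default to 0."""
    def s(a, b):
        return sim.get((a, b), sim.get((b, a), 0.0))
    return {w: sum(s(w, u) for u in words if u != w) / (len(words) - 1)
            for w in words}
```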


Author(s):  
ANTONIO MORILLAS ◽  
LUIS ROBLES ◽  
BARBARA DIAZ

In inter-industry studies, technical coefficients have been analyzed with different methods in order to identify those coefficients that can be considered important for an economy. Many criticisms have been levelled at these procedures, the most notable being their lack of connection with the values of the absolute flows behind the coefficients. In our approach, we define the importance of a technical coefficient as a fuzzy concept, and the grade of importance takes those absolute flows into account. This grade can be regarded as a membership function, which is used to define a fuzzy graph associated with the I-O matrix. We apply this new procedure to the Spanish 2000 I-O matrix and compare our results to those reached by classical methods.
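As a concrete illustration of the idea, a technical coefficient a_ij = z_ij / x_j can be combined with the relative size of the absolute flow z_ij to yield a membership grade for "important coefficient". The particular product form below is only one plausible choice; the paper's actual membership function is not given in the abstract.

```python
def importance_grades(Z, x):
    """Illustrative membership grades for 'important coefficient':
    the technical coefficient a_ij = z_ij / x_j is scaled by the
    relative size of the absolute flow z_ij, so large coefficients
    backed by large flows get grades near 1. Z is a square flow
    matrix; x gives each sector's total output."""
    n = len(Z)
    max_a = max(Z[i][j] / x[j] for i in range(n) for j in range(n))
    max_z = max(max(row) for row in Z)
    grades = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a = Z[i][j] / x[j]
            grades[i][j] = (a / max_a) * (Z[i][j] / max_z)
    return grades
```

The resulting grades can serve directly as edge memberships of the fuzzy graph associated with the I-O matrix.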

