A News Event Clustering Algorithm Based on Semantic Relationship Graph

Traditional document clustering algorithms consider text-based features such as unique word count, concept count, etc. to cluster documents. Meanwhile, event mining is the extraction of specific events, their related sub-events, and the associated semantic relations from documents. This work discusses an approach to event mining through clustering. The Universal Networking Language (UNL)-based subgraph, a semantic representation of the document, is used as the input for clustering. Our research focuses on exploring the use of three different feature sets for event clustering and comparing the approaches used for specific event mining. In our previous work, the clustering algorithm used UNL-based event semantics to represent event context for clustering. However, this approach resulted in different events with similar semantics being clustered together. Hence, instead of considering only UNL event semantics, we assigned additional weights to the similarity between event contexts using event-related attributes such as time, place, and persons. Although this places specific events in a single cluster, sub-events related to those events do not necessarily fall in the same cluster. Therefore, to improve cluster quality, connective terms between two sentences and their representation as UNL subgraphs were also considered for similarity determination. By combining UNL semantics, event-specific argument similarity, and connective term concepts between sentences, we were able to obtain clusters for specific events and their sub-events. We used 112 000 Tamil documents from the Forum for Information Retrieval Evaluation data corpus and achieved good results. We also compared our approach with the previous state-of-the-art approach on the Reuters RCV1 corpus and achieved a 30% improvement in precision.
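
The combination of the three feature sets described above can be illustrated with a weighted similarity score. The following is a minimal sketch, not the authors' implementation: the `EventContext` structure, the use of Jaccard overlap as a stand-in for UNL subgraph similarity, and the weights `w_sem`, `w_arg`, and `w_conn` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventContext:
    """Hypothetical container; `concepts` stands in for UNL subgraph concepts."""
    concepts: set
    time: str = ""
    place: str = ""
    persons: set = field(default_factory=set)
    connectives: set = field(default_factory=set)


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def argument_similarity(x, y):
    """Average agreement on the event attributes time, place, and persons."""
    scores = [
        1.0 if x.time and x.time == y.time else 0.0,
        1.0 if x.place and x.place == y.place else 0.0,
        jaccard(x.persons, y.persons),
    ]
    return sum(scores) / len(scores)


def combined_similarity(x, y, w_sem=0.5, w_arg=0.3, w_conn=0.2):
    """Weighted sum of the three signals; the weights are illustrative only."""
    return (
        w_sem * jaccard(x.concepts, y.concepts)
        + w_arg * argument_similarity(x, y)
        + w_conn * jaccard(x.connectives, y.connectives)
    )
```

Under these assumptions, two contexts that share the same concepts but differ in time, place, and persons score lower than two contexts that agree on everything, which is the behavior the abstract attributes to the added attribute weights.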

A Graph-Based Biomedical Literature Clustering Approach Utilizing Term's Global and Local Importance Information

Strategic Advancements in Utilizing Data Mining and Warehousing Technologies ◽

10.4018/978-1-60566-717-1.ch008 ◽

2011 ◽

pp. 133-150

Author(s):

Zhang Xiaodan ◽

Hu Xiaohua ◽

Xia Jiali ◽

Zhou Xiaohua ◽

Achananuparp Palakorn

Keyword(s):

Clustering Algorithm ◽

Document Clustering ◽

Biomedical Literature ◽

Semantic Relationship ◽

Clustering Method ◽

Graph Representations ◽

The Core ◽

Document Cluster ◽

Clustering Approach ◽

Global And Local

In this article, we present a graph-based knowledge representation for biomedical digital library literature clustering. An efficient clustering method is developed to identify the ontology-enriched k-highest density term subgraphs that capture the core semantic relationship information about each document cluster. The distance between each document and the k term graph clusters is calculated. A document is then assigned to the closest term cluster. The extensive experimental results on two PubMed document sets (Disease10 and OHSUMED23) show that our approach is comparable to spherical k-means. The contributions of our approach are the following: (1) we provide two corpus-level graph representations to improve document clustering, a term co-occurrence graph and an abstract-title graph; (2) we develop an efficient and effective document clustering algorithm by identifying k distinguishable class-specific core term subgraphs using terms’ global and local importance information; and (3) the identified term clusters give a meaningful explanation for the document clustering results.
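
The assignment step (compute the distance from each document to the k term clusters, then pick the closest) can be sketched as follows. This is a rough illustration under assumptions: the abstract does not state the distance function, so cosine distance over term weights is used here, and the cluster contents are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_document(doc_terms, term_clusters):
    """Return the index of the term cluster closest to the document.

    doc_terms: list of terms in the document.
    term_clusters: list of dicts mapping term -> weight (assumed format).
    """
    doc = Counter(doc_terms)
    distances = [1.0 - cosine(doc, cluster) for cluster in term_clusters]
    return min(range(len(term_clusters)), key=distances.__getitem__)


# Hypothetical core term clusters for two classes.
clusters = [
    {"insulin": 1.0, "glucose": 0.8},
    {"tumor": 1.0, "chemotherapy": 0.7},
]
print(assign_document(["glucose", "insulin", "patient"], clusters))
```

Because each cluster is a weighted set of terms, the terms with the largest weights in the winning cluster also serve as a readable explanation of why the document landed there.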

Ensembles of Text and Time-Series Models for Automatic Generation of Financial Trading Signals from Social Media Content

Journal of Intelligent Systems ◽

10.1515/jisys-2017-0567 ◽

2018 ◽

Vol 29 (1) ◽

pp. 753-772 ◽

Cited By ~ 1

Author(s):

Omar A. Bari ◽

Arvin Agah

Keyword(s):

Social Media ◽

Stock Prices ◽

Clustering Algorithm ◽

Event Studies ◽

Time Series Models ◽

Media Content ◽

Trading Decisions ◽

Event Clustering ◽

Sharpe Ratios ◽

News Headlines

Event studies in finance have focused on traditional news headlines to assess the impact an event has on a traded company. The increased proliferation of news and information produced by social media content has disrupted this trend. Although researchers have begun to identify trading opportunities from social media platforms, such as Twitter, almost all techniques use a general sentiment from large collections of tweets. Though useful, general sentiment does not provide an opportunity to indicate specific events worthy of affecting stock prices. This work presents an event clustering algorithm, utilizing natural language processing techniques to generate newsworthy events from Twitter, which have the potential to influence stock prices in the same manner as traditional news headlines. The event clustering method addresses the effects of pre-news and lagged news, two peculiarities that appear when connecting trading and news, regardless of the medium. Pre-news signifies a finding where stock prices move in advance of a news release. Lagged news refers to follow-up or late-arriving news, adding redundancy in making trading decisions. For events generated by the proposed clustering algorithm, we incorporate event studies and machine learning to produce an actionable system that can guide trading decisions. The recommended prediction algorithms provide investing strategies with profitable risk-adjusted returns. The suggested language models present annualized Sharpe ratios (risk-adjusted returns) in the 5–11 range, while time-series models produce Sharpe ratios in the 2–3 range (without transaction costs). The distribution of returns confirms the encouraging Sharpe ratios by identifying most outliers as positive gains. Additionally, machine learning metrics of precision, recall, and accuracy are discussed alongside financial metrics in hopes of bridging the gap between academia and industry in the field of computational finance.
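
For readers unfamiliar with the metric, an annualized Sharpe ratio is commonly computed as the mean excess return divided by its standard deviation, scaled by the square root of the number of periods per year. The sketch below assumes daily returns, 252 trading days per year, and a zero risk-free rate; the abstract does not say which conventions the authors used, and the sample returns are made up.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is per period; both defaults are assumptions.
    Requires at least two returns so the sample deviation is defined.
    """
    excess = [r - risk_free_rate for r in returns]
    deviation = statistics.stdev(excess)
    if deviation == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return statistics.mean(excess) / deviation * math.sqrt(periods_per_year)


# Made-up daily returns, for illustration only.
daily_returns = [0.010, -0.005, 0.007, 0.003, -0.002]
print(round(annualized_sharpe(daily_returns), 2))
```

Note that this calculation ignores transaction costs, as do the figures quoted in the abstract.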

Early detection of emergency events from social media: A new text clustering approach

10.21203/rs.3.rs-322787/v1 ◽

2021 ◽

Author(s):

Lida Huang ◽

Panpan Shi ◽

Haichao Zhu ◽

Tao Chen

Keyword(s):

Social Media ◽

Early Detection ◽

Clustering Algorithm ◽

Integrated Approach ◽

Practical Applications ◽

Emergency Event ◽

Attribute Information ◽

Event Clustering ◽

Clustering Approach ◽

Emergency Events

Emergency events need early detection, quick response, and accurate recovery. In the era of big data, social media users can be seen as social sensors for monitoring emergency events in real time. This paper proposes an integrated approach for the early detection of all four kinds of emergency events: natural disasters, man-made accidents, public health events, and social security events. First, the BERT-Att-BiLSTM model is used to detect emergency-related posts among massive amounts of irrelevant data. Then, the 3W attribute information (What, Where, and When) of the emergency event is extracted. With the 3W attribute information, we create an unsupervised dynamic event clustering algorithm based on text similarity and combine it with a supervised logistic regression model to cluster posts into different events. Experiments on Sina Weibo data demonstrate the superiority of the proposed framework. Case studies on real emergency events show that the proposed framework performs well and is highly timely. Practical applications of the framework are also discussed, followed by some future directions for improvement.
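
The dynamic clustering step can be pictured as a single pass over incoming posts, where each post either joins the most similar existing event or starts a new one. The sketch below is only an illustration under assumptions: the `Post` structure, the equal weighting of the three attributes, the 6-hour window, and the 0.6 threshold are hypothetical, and the supervised logistic regression component mentioned in the abstract is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Post:
    what: frozenset   # keywords describing what happened
    where: str        # extracted location
    when: datetime    # extracted time


def similarity(p, q, window=timedelta(hours=6)):
    """Average of What overlap, Where match, and When proximity."""
    union = p.what | q.what
    what_sim = len(p.what & q.what) / len(union) if union else 0.0
    where_sim = 1.0 if p.where == q.where else 0.0
    when_sim = 1.0 if abs(p.when - q.when) <= window else 0.0
    return (what_sim + where_sim + when_sim) / 3


def cluster_stream(posts, threshold=0.6):
    """Single-pass clustering: join the best-matching event or open a new one."""
    events = []
    for post in posts:
        best_event, best_sim = None, 0.0
        for event in events:
            sim = max(similarity(post, member) for member in event)
            if sim > best_sim:
                best_event, best_sim = event, sim
        if best_event is not None and best_sim >= threshold:
            best_event.append(post)
        else:
            events.append([post])
    return events
```

A single-pass design suits early detection because events are available as soon as the first matching posts arrive, without waiting for a batch to complete.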

Distributed Entropy Energy-Efficient Clustering algorithm for cluster head selection (DEEEC)

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189135 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8139-8147

Author(s):

Ranganathan Arun ◽

Rangaswamy Balamurugan

Keyword(s):

Energy Efficient ◽

Clustering Algorithm ◽

Cluster Head ◽

Residual Energy ◽

Energy Utilization ◽

Sensor Nodes ◽

Second Stage ◽

Energy Efficient Clustering ◽

Two Stages ◽

CH Selection

In Wireless Sensor Networks (WSNs), the energy available to sensor nodes is limited. To extend the lifetime of a WSN, it is essential to minimize energy consumption. Selecting a group head, or Cluster Head (CH), with higher energy to aggregate data is a well-established way to extend WSN lifetime. Both intra-cluster and inter-cluster communication depend on the CH, so the energy level of the CH determines the lifetime of its cluster. When designing clustering algorithms, the difficult task is to determine the energy consumption of heterogeneous WSNs (HWSNs). The proposed work, based on Chaotic Firefly Algorithm CH (CFACH) selection, is named the “Novel Distributed Entropy Energy-Efficient Clustering Algorithm” (DEEEC) for HWSNs. The proposed DEEEC algorithm for CH selection has two main stages. In the first stage, temporary CHs and their entropy values are identified using a measure relating residual energy to initial energy. In addition, the rotating epoch and its entropy value are estimated automatically by the sensor nodes. In the second stage, any cluster member with larger residual energy replaces the temporary CH in the final set. Through these two stages, nodes with higher energy have a higher probability of becoming CHs. The DEEEC algorithm is simulated in MATLAB. The simulation results show that DEEEC improves energy consumption and network lifetime compared with the traditional clustering protocols currently used in HWSNs.
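
The two-stage idea can be sketched loosely as follows. The abstract does not give the formulas, so everything here is an assumption: stage one picks the node with the highest residual-to-initial energy ratio as the temporary CH, stage two replaces it when another member has strictly more residual energy, and `energy_entropy` is one possible reading of the entropy value (Shannon entropy of the normalized residual energies). The Chaotic Firefly component is not modeled.

```python
import math


def energy_entropy(residuals):
    """Shannon entropy (bits) of residual energies normalized to sum to 1."""
    total = sum(residuals)
    if total <= 0:
        return 0.0
    probs = [e / total for e in residuals if e > 0]
    return -sum(p * math.log2(p) for p in probs)


def select_cluster_heads(nodes, clusters):
    """Pick one CH per cluster in two stages.

    nodes: dict mapping node id -> (residual_energy, initial_energy).
    clusters: list of lists of node ids (assumed to be given).
    """
    heads = []
    for members in clusters:
        # Stage 1 (assumed): temporary CH has the best residual/initial ratio.
        temporary = max(members, key=lambda n: nodes[n][0] / nodes[n][1])
        # Stage 2 (assumed): a member with more residual energy takes over.
        strongest = max(members, key=lambda n: nodes[n][0])
        if nodes[strongest][0] > nodes[temporary][0]:
            heads.append(strongest)
        else:
            heads.append(temporary)
    return heads


# Hypothetical heterogeneous network: (residual, initial) energy in joules.
nodes = {"a": (0.4, 0.5), "b": (0.9, 2.0), "c": (0.3, 0.5), "d": (1.5, 2.0)}
print(select_cluster_heads(nodes, [["a", "b"], ["c", "d"]]))
```

In the first cluster, node "a" wins stage one on ratio but is replaced by node "b", which holds more residual energy; this mirrors the replacement rule described for the second stage.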

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

POS Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed, and the choice among them depends on the application. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. A POS tagger is used for tagging and the Porter stemmer for stemming. The WordNet dictionary is used to determine similarity with the Jiang-Conrath and cosine similarity measures. Grouping is performed on the highest similarity value, using the mean as a threshold. This paper incorporates many parameters for finding similarity between words. To disambiguate words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, reaching 91.2%.
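
The grouping rule (merge on the highest similarity, with the mean as a threshold) can be sketched as a small agglomerative procedure. This is an illustration under assumptions, not the paper's pipeline: it uses bag-of-words cosine similarity only, average linkage between clusters, and the mean of all pairwise similarities as the threshold. The Jiang-Conrath measure, POS tagging, and stemming are left out.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def cluster_sentences(sentences):
    """Merge the most similar clusters while similarity stays above the mean."""
    n = len(sentences)
    if n < 2:
        return [[i] for i in range(n)]
    sims = {(i, j): cosine(sentences[i], sentences[j])
            for i, j in combinations(range(n), 2)}
    threshold = sum(sims.values()) / len(sims)  # mean pairwise similarity
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        total = sum(sims[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), best = max(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold or best == 0.0:
            break
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


sentences = ["bank loan interest", "bank loan rates",
             "river water flow", "river water level"]
print(cluster_sentences(sentences))
```

The returned lists contain sentence indices, so each cluster can be mapped back to its sentences for inspection.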

K-MEANS CLUSTERING ALGORITHM BASED CLASSIFICATION OF SOIL FERTILITY IN NORTH WEST NIGERIA

FUDMA Journal of Sciences ◽

10.33003/fjs-2020-0402-363 ◽

2020 ◽

Vol 4 (2) ◽

pp. 780-787

Author(s):

Ibrahim Hassan Hayatu ◽

Abdullahi Mohammed ◽

Barroon Ahmad Isma’eel ◽

Sahabi Yusuf Ali

Keyword(s):

Soil Fertility ◽

Crop Yield ◽

Clustering Algorithm ◽

Soil Samples ◽

North West ◽

R Programming ◽

Available Information ◽

Northwest Region ◽

The Relationship

Soil fertility determines a plant's development, which in turn guarantees food sufficiency and the security of lives and property through bumper harvests. Soil fertility varies by region and thereby determines which crops can be planted. However, there is no repository or other source of information about soil fertility for any region of Nigeria, especially the Northwest. The only available information is soil samples with their attributes, which give little or no information to the average farmer. This has affected crop yield in all regions, particularly the Northwest, resulting in lower food production. Therefore, this study aims to classify soil data by fertility in the Northwest region of Nigeria using R programming. Data were obtained from the Department of Soil Science at Ahmadu Bello University, Zaria. The data comprise 400 soil samples with 13 attributes. The relationship between soil attributes was observed based on the data. The K-means clustering algorithm was employed to analyze soil fertility clusters. Four clusters were identified, with cluster 1 having the highest fertility, followed by cluster 2, and fertility decreasing as the cluster number increases. Identifying the most fertile clusters will guide farmers on where best to concentrate when planting their crops in order to improve productivity and crop yield.
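
The study itself used R; the sketch below shows the same kind of K-means procedure (Lloyd's algorithm) in Python for illustration. It rests on assumptions: the sample values are invented two-attribute points rather than the 13-attribute soil data, and the deterministic farthest-point initialization is a convenience chosen here, not something the abstract describes.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Invented (attribute_1, attribute_2) values for four samples.
samples = [(0.1, 0.2), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9)]
labels, centroids = kmeans(samples, k=2)
print(labels)
```

With real soil data, attributes measured on different scales would normally be standardized before clustering so that no single attribute dominates the distance.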
