Data stream event prediction based on timing knowledge and state transitions

2020 ◽  
Vol 13 (10) ◽  
pp. 1779-1792
Author(s):  
Yan Li ◽  
Tingjian Ge ◽  
Cindy Chen

We study the practical problem of predicting upcoming events in data streams using a novel approach. Treating event time orders as relationship types between event entities, we build a dynamic knowledge graph and use it to predict future event timing. A unique aspect of this knowledge graph embedding approach is that we enhance conventional knowledge graphs with a notion of "states" (what we call ephemeral state nodes) to characterize the state of a data stream over time. We devise a complete set of methods for learning relevant events, building the event-order graph stream from the original data stream, embedding and prediction, and theoretically bounding the complexity. We evaluate our approach on four real-world stream datasets and find that our method achieves high precision and recall for event timing prediction, ranging between 0.7 and nearly 1, significantly outperforming baseline approaches. Moreover, thanks to our choice of an efficient translation-based embedding, the overall throughput that the stream system can handle, including continuous graph building, training, and event prediction, ranges from over one thousand to sixty thousand tuples per second even on a personal computer, which is especially important in resource-constrained environments such as edge computing.
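
The abstract does not give implementation details; the sketch below only illustrates the general translation-based (TransE-style) scoring idea it refers to, applied to event-order triples with an added state node. All names, dimensions, and the relation are invented for illustration and are not the authors' implementation.

```python
# Illustrative sketch of translation-based (TransE-style) scoring for
# event-order triples. Entities, the "ephemeral state" node, the relation,
# and the embedding dimension are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy vocabulary: two event entities plus one state node, and a single
# time-order relation ("followed_by").
entities = {"event_A": 0, "event_B": 1, "state_t": 2}
relations = {"followed_by": 0}

E = rng.normal(scale=0.1, size=(len(entities), dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(len(relations), dim))  # relation embeddings

def score(head, rel, tail):
    """TransE score: smaller ||h + r - t|| means the triple is more plausible."""
    h, r, t = E[entities[head]], R[relations[rel]], E[entities[tail]]
    return np.linalg.norm(h + r - t)

# Prediction amounts to ranking candidate tail events for (head, relation, ?).
candidates = ["event_B", "state_t"]
ranked = sorted(candidates, key=lambda c: score("event_A", "followed_by", c))
print(ranked)
```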

Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 998
Author(s):  
Peng Zhang ◽  
Yi Bu ◽  
Peng Jiang ◽  
Xiaowen Shi ◽  
Bing Lun ◽  
...  

This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is the Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combine the chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we perform entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) around angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and the IL-6 receptor; chloroquine and STAT1). The ego-centered network captures information related to COVID-19. We also find significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.
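
As a rough illustration of the kind of depth-limited path query the abstract describes, the sketch below finds all paths of depth at most three between two endpoints in a toy graph. The nodes and edges are placeholders, not entities or relations from the actual KG.

```python
# Minimal sketch of a depth-limited path query over a toy knowledge graph,
# in the spirit of the path analysis described (hydroxychloroquine to the
# IL-6 receptor). Nodes and edge labels are invented placeholders.
import networkx as nx

kg = nx.Graph()
kg.add_edge("hydroxychloroquine", "TLR9", relation="inhibits")
kg.add_edge("TLR9", "IL6", relation="regulates")
kg.add_edge("IL6", "IL-6 receptor", relation="binds")

# All simple paths with at most three edges between the two endpoints.
paths = nx.all_simple_paths(kg, "hydroxychloroquine", "IL-6 receptor", cutoff=3)
for p in paths:
    print(" -> ".join(p))
```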


2021 ◽  
pp. 1-21
Author(s):  
Wenguang Wang ◽  
Yonglin Xu ◽  
Chunhui Du ◽  
Yunwen Chen ◽  
Yijie Wang ◽  
...  

Abstract With the development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, knowledge graph technology has advanced rapidly in recent years. To better promote the development of knowledge graphs, especially for the Chinese language and the financial industry, we built a high-quality dataset named the financial research report knowledge graph (FR2KG) and organized the automated construction of financial knowledge graph evaluation task at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples, covering 10 entity types, 19 relationship types, and 6 attributes. Participants were required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs and introduce the methods used by the winners and the results of this evaluation.
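
For readers unfamiliar with the terminology, the sketch below shows one common way relationship triples and attribute triples, as counted in FR2KG, are represented. The example triples are invented for illustration and are not taken from FR2KG itself.

```python
# Sketch of a typical in-memory representation of relationship and attribute
# triples for a dataset like FR2KG. The concrete triples are hypothetical.
from collections import namedtuple

RelationTriple = namedtuple("RelationTriple", "head relation tail")
AttributeTriple = namedtuple("AttributeTriple", "entity attribute value")

relation_triples = [
    RelationTriple("CompanyA", "belongs_to_industry", "Semiconductors"),
    RelationTriple("ResearchReport1", "covers_company", "CompanyA"),
]
attribute_triples = [
    AttributeTriple("CompanyA", "stock_code", "000001"),
]

# An automatic constructor in the evaluation task would emit triples of this
# shape from raw research-report text.
print(len(relation_triples), "relation triples,", len(attribute_triples), "attribute triple")
```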


2019 ◽  
Vol 29 (1) ◽  
pp. 1441-1452 ◽  
Author(s):  
G.K. Shailaja ◽  
C.V. Guru Rao

Abstract Privacy-preserving data mining (PPDM) is an approach that has emerged to address privacy issues. The intention of PPDM is to build data-mining techniques without raising the risk of misuse of the data exploited to generate those schemes. Conventional works include numerous techniques, most of which apply some form of transformation to the original data to guarantee privacy preservation. However, these schemes are quite complex and memory intensive, which limits their adoption. Hence, this paper develops a novel PPDM technique that involves two phases, namely data sanitization and data restoration. Initially, the association rules are extracted from the database before proceeding with the two phases. In both the sanitization and restoration processes, key extraction plays a major role; the key is selected optimally using the Opposition Intensity-based Cuckoo Search Algorithm, a modified form of the Cuckoo Search Algorithm. Four criteria, namely hiding failure rate, information preservation rate, false rule generation, and degree of modification, are optimized by the adopted sanitization and restoration processes.
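
As a generic illustration of the data-sanitization idea behind association-rule hiding, the sketch below lowers the support of a sensitive rule by modifying transactions that support it. This is only a simple baseline under assumed thresholds; it is not the authors' key-based, Opposition Intensity-based Cuckoo Search method.

```python
# Generic association-rule hiding baseline: transactions supporting a
# sensitive rule are altered until the rule falls below its support threshold.
# NOT the paper's optimized key-based sanitization; illustration only.

def sanitize(transactions, sensitive_rule, min_support):
    """Drop the rule's consequent from supporting transactions until the
    rule's support falls below min_support."""
    antecedent, consequent = sensitive_rule
    result = [set(t) for t in transactions]
    supporting = [t for t in result if antecedent <= t and consequent <= t]
    allowed = int(min_support * len(result))
    for t in supporting[allowed:]:        # keep at most `allowed` supporters
        t -= consequent
    return result

transactions = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}]
rule = ({"bread"}, {"milk"})              # sensitive rule: bread -> milk
print(sanitize(transactions, rule, min_support=0.4))
```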


2020 ◽  
Vol 66 (259) ◽  
pp. 790-806
Author(s):  
Chris G. Carr ◽  
Joshua D. Carmichael ◽  
Erin C. Pettit ◽  
Martin Truffer

Abstract Glacial environments exhibit temporally variable microseismicity. To investigate how microseismicity influences event detection, we implement two noise-adaptive digital power detectors to process seismic data from Taylor Glacier, Antarctica. We add scaled icequake waveforms to the original data stream, run the detectors on the hybrid data stream to estimate reliable detection magnitudes, and compare these with analytical magnitudes predicted from an ice-crack source model. We find that detection capability is influenced by environmental microseismicity for seismic events whose source size is comparable to thermal penetration depths. When event counts and minimum detectable event sizes change in the same direction (i.e. both increase), we interpret measured seismicity changes as ‘true’ seismicity changes rather than as changes in detection capability. Generally, one detector (two degrees of freedom, 2dof) outperforms the other: it identifies more events, shows a more prominent summertime diurnal signal, and maintains a higher detection capability. We conclude that real physical processes are responsible for the summertime diurnal inter-detector difference. One detector (3dof) identifies this process as environmental microseismicity; the other (2dof) identifies it as elevated waveform activity. Our analysis provides an example of minimizing detection biases and estimating source sizes when interpreting temporal seismicity patterns to better infer glacial seismogenic processes.
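
The sketch below is a simplified stand-in for the noise-adaptive power-detection idea: the detection threshold tracks a running estimate of background (microseismic) noise power, and a synthetic "icequake" is injected into a noise trace, loosely mirroring the hybrid-stream test described. The window lengths, threshold factor, and synthetic signal are assumptions; this is not the study's 2dof/3dof detectors.

```python
# Simplified noise-adaptive power detector (illustration only, not the
# detectors used in the study): flag samples whose short-window power exceeds
# k times the running median of recent background power.
import numpy as np

def power_detect(signal, win=100, k=5.0):
    """Return a boolean mask of samples exceeding the adaptive threshold."""
    power = np.convolve(signal**2, np.ones(win) / win, mode="same")
    detections = np.zeros_like(power, dtype=bool)
    for i in range(win, len(power)):
        noise = np.median(power[max(0, i - 10 * win):i])  # adaptive noise level
        detections[i] = power[i] > k * noise
    return detections

rng = np.random.default_rng(1)
trace = rng.normal(scale=1.0, size=5000)
trace[3000:3050] += 10 * np.sin(np.linspace(0, 20 * np.pi, 50))  # injected "icequake"
print(np.flatnonzero(power_detect(trace))[:5])  # first detected sample indices
```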


2020 ◽  
Vol 2020 ◽  
pp. 1-16 ◽  
Author(s):  
Hua Chen ◽  
Chen Xiong ◽  
Jia-meng Xie ◽  
Ming Cai

With the rapid development of data acquisition technology, data acquisition departments can collect increasing amounts of data. Various data from government agencies are gradually becoming available to the public, including vehicle license plate recognition (VLPR) data. As a result, privacy protection is becoming increasingly significant. In this paper, an adversary model based on the passing time, color, type, and brand of VLPR data is proposed. Experimental analysis shows that the tracking probability of a vehicle’s trajectory can exceed 94% when the original data are used. To decrease the tracking probability, a novel approach called the (m, n)-bucket model based on time series is proposed, since previous works, such as those using generalization and bucketization models, cannot deal with data with multiple sensitive attributes (SAs) or data with time correlations. Meanwhile, a mathematical model is established to explain the privacy protection principle of the (m, n)-bucket model. By comparing the average calculated linking probability of all individuals with the actual linking probability, it is shown that the proposed mathematical model can well explain the privacy protection principle of the (m, n)-bucket model. Extensive experiments confirm that our technique can effectively prevent trajectory privacy disclosures.
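
The sketch below illustrates the general bucketization idea that the abstract contrasts with and builds on: sensitive attributes are shuffled within buckets of records, so an attacker can only link a plate to its attributes with probability roughly one over the bucket size. The record fields, bucket size, and shuffling scheme are assumptions for illustration; this does not reproduce the authors' (m, n)-bucket construction.

```python
# Generic bucketization illustration (not the paper's (m, n)-bucket model):
# shuffle sensitive attributes within each bucket so exact linking is lost.
import random

records = [
    {"plate": "A001", "time": "08:01", "color": "white", "brand": "VW"},
    {"plate": "A002", "time": "08:02", "color": "black", "brand": "BMW"},
    {"plate": "A003", "time": "08:03", "color": "white", "brand": "Audi"},
    {"plate": "A004", "time": "08:05", "color": "grey",  "brand": "VW"},
]

def bucketize(recs, bucket_size):
    """Publish records with sensitive attributes shuffled inside each bucket."""
    random.seed(0)
    buckets = [recs[i:i + bucket_size] for i in range(0, len(recs), bucket_size)]
    published = []
    for b in buckets:
        sensitive = [(r["color"], r["brand"]) for r in b]
        random.shuffle(sensitive)
        for r, (color, brand) in zip(b, sensitive):
            published.append({"time": r["time"], "color": color, "brand": brand})
    return published

# With buckets of size 2, the chance of correctly linking a plate to its
# sensitive values drops to roughly 1/2 instead of 1.
print(bucketize(records, bucket_size=2))
```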


2017 ◽  
Vol 24 (6) ◽  
pp. 674-685 ◽  
Author(s):  
Jorge M. Fernandes ◽  
Cristina Leston-Bandeira ◽  
Carsten Schwemmer

Do elected representatives have a time-constant representation focus, or do they adapt their focus depending on election proximity? In this article, we examine these overlooked theoretical and empirical puzzles by looking at how reelection-seeking actors adapt their legislative behavior according to the electoral cycle. In parliamentary democracies, representatives need to serve two competing principals: their party and their district. Our analysis hinges on how representatives make strategic use of parliamentary written questions in a highly party-constrained institutional context to improve their reselection and reelection prospects. Using an original data set of over 32,000 parliamentary questions tabled by Portuguese representatives from 2005 to 2015, we examine how time interacts with two key explanatory elements: electoral vulnerability and party size. Results show that representation focus is not static over time and, in addition, that electoral vulnerability and party size shape the strategic use of parliamentary questions.


2012 ◽  
Vol 220-223 ◽  
pp. 452-458
Author(s):  
Xian Xin Shi ◽  
Zhong Xiang Zhao ◽  
Chang Jian Zhu ◽  
Xiao Xiao Kong ◽  
Jun Fei Chai ◽  
...  

A cluster kernel semi-supervised support vector machine (CKS3VM) based on a spectral clustering algorithm is proposed and applied to winch fault classification in this paper. The spectral clustering method is used to re-represent the original data samples in an eigenvector space so that samples in the same cluster are gathered more closely together. A cluster kernel function is then constructed on this eigenvector space. Finally, a cluster kernel S3VM is designed that satisfies the cluster assumption of semi-supervised learning. Experiments on winch fault classification show that the novel approach achieves high classification accuracy.
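
The sketch below approximates the general idea of a cluster kernel: re-represent the data with a spectral embedding so that points in the same cluster move closer together, then train an SVM on a kernel computed in that space using only a few labels. The dataset, kernel choice, and labelling scheme are assumptions; this is not the authors' CKS3VM implementation.

```python
# Approximate cluster-kernel pipeline (illustration, not the paper's CKS3VM):
# spectral re-representation, RBF kernel in the embedded space, SVM on a few labels.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.manifold import SpectralEmbedding
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Spectral re-representation of all (labelled + unlabelled) samples.
Z = SpectralEmbedding(n_components=2, random_state=0).fit_transform(X)

# Cluster kernel: an RBF kernel evaluated in the embedded space.
K = rbf_kernel(Z, Z, gamma=10.0)

# Semi-supervised setting: pretend only every 20th label is known.
labelled = np.arange(0, 200, 20)
clf = SVC(kernel="precomputed").fit(K[np.ix_(labelled, labelled)], y[labelled])
pred = clf.predict(K[np.ix_(np.arange(200), labelled)])  # predict all samples
print(pred.shape)
```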


2013 ◽  
Vol 427-429 ◽  
pp. 2687-2690
Author(s):  
Yu Ting Zhang ◽  
Gui Fa Teng

Social networks provide users with a platform for interaction and information sharing. In real social activities, both individuals and businesses have to rely on certain relationships to live, work, or engage in commercial activities. This paper describes how relationships form between different actor clusters based on a shared actor in real social networks. The relationships, their types, and their attributes in real complex social networks are analyzed in detail. An index called Qinshudu, representing the degree of closeness between two actors in real complex social networks, is proposed, and its computation model, based on relationship types, attributes, and other factors, is given. A case study shows that Qinshudu is a reasonable and effective measure for use in complex social networks.
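
The abstract does not give the Qinshudu formula, so the sketch below only shows one plausible shape for such an index: a weighted sum over the relationship types connecting two actors. The weights, the clamp, and the input format are illustrative assumptions, not the paper's computation model.

```python
# Hedged sketch of a closeness index in the spirit of Qinshudu: weight each
# shared relationship type and sum. Weights and formula are assumptions only.

TYPE_WEIGHTS = {"family": 1.0, "colleague": 0.6, "classmate": 0.5, "online_friend": 0.2}

def closeness(relations_between):
    """relations_between: list of (relation_type, strength in [0, 1]) edges
    connecting two actors, possibly through shared clusters."""
    score = sum(TYPE_WEIGHTS.get(rtype, 0.1) * strength
                for rtype, strength in relations_between)
    return min(1.0, score)  # clamp so the index stays in [0, 1]

# Two actors connected as colleagues and former classmates.
print(closeness([("colleague", 0.8), ("classmate", 0.4)]))
```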

