Data stream event prediction based on timing knowledge and state transitions

2020 ◽  
Vol 13 (10) ◽  
pp. 1779-1792
Author(s):  
Yan Li ◽  
Tingjian Ge ◽  
Cindy Chen

We study the practical problem of predicting upcoming events in data streams using a novel approach. Treating event time orders as relationship types between event entities, we build a dynamic knowledge graph and use it to predict future event timing. A unique aspect of this knowledge graph embedding approach is that we enhance conventional knowledge graphs with a notion of "states" (what we call ephemeral state nodes) to characterize the state of a data stream over time. We devise a complete set of methods for learning relevant events, building the event-order graph stream from the original data stream, embedding and prediction, and theoretically bounding the complexity. We evaluate our approach on four real-world stream datasets and find that our method achieves high precision and recall for event timing prediction, ranging between 0.7 and nearly 1, significantly outperforming baseline approaches. Moreover, thanks to our choice of an efficient translation-based embedding, the overall throughput that the stream system can handle, including continuous graph building, training, and event prediction, ranges from over one thousand to sixty thousand tuples per second even on a personal computer, which is especially important in resource-constrained environments such as edge computing.
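
The abstract does not give implementation details; the sketch below only illustrates the general translation-based (TransE-style) scoring idea it refers to, applied to event-order triples with an added state node. All names, dimensions, and the relation are invented for illustration and are not the authors' implementation.

```python
# Illustrative sketch of translation-based (TransE-style) scoring for
# event-order triples. Entities, the "ephemeral state" node, the relation,
# and the embedding dimension are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy vocabulary: two event entities plus one state node, and a single
# time-order relation ("followed_by").
entities = {"event_A": 0, "event_B": 1, "state_t": 2}
relations = {"followed_by": 0}

E = rng.normal(scale=0.1, size=(len(entities), dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(len(relations), dim))  # relation embeddings

def score(head, rel, tail):
    """TransE score: smaller ||h + r - t|| means the triple is more plausible."""
    h, r, t = E[entities[head]], R[relations[rel]], E[entities[tail]]
    return np.linalg.norm(h + r - t)

# Prediction amounts to ranking candidate tail events for (head, relation, ?).
candidates = ["event_B", "state_t"]
ranked = sorted(candidates, key=lambda c: score("event_A", "followed_by", c))
print(ranked)
```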

Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 998
Author(s):  
Peng Zhang ◽  
Yi Bu ◽  
Peng Jiang ◽  
Xiaowen Shi ◽  
Bing Lun ◽  
...  

This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is the Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combine the chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we perform entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) around angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and the IL-6 receptor; chloroquine and STAT1). The ego-centered network captures information related to COVID-19. We also find significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.
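
As a rough illustration of the kind of depth-limited path query the abstract describes, the sketch below finds all paths of depth at most three between two endpoints in a toy graph. The nodes and edges are placeholders, not entities or relations from the actual KG.

```python
# Minimal sketch of a depth-limited path query over a toy knowledge graph,
# in the spirit of the path analysis described (hydroxychloroquine to the
# IL-6 receptor). Nodes and edge labels are invented placeholders.
import networkx as nx

kg = nx.Graph()
kg.add_edge("hydroxychloroquine", "TLR9", relation="inhibits")
kg.add_edge("TLR9", "IL6", relation="regulates")
kg.add_edge("IL6", "IL-6 receptor", relation="binds")

# All simple paths with at most three edges between the two endpoints.
paths = nx.all_simple_paths(kg, "hydroxychloroquine", "IL-6 receptor", cutoff=3)
for p in paths:
    print(" -> ".join(p))
```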


2021 ◽  
pp. 1-21
Author(s):  
Wenguang Wang ◽  
Yonglin Xu ◽  
Chunhui Du ◽  
Yunwen Chen ◽  
Yijie Wang ◽  
...  

Abstract With the development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, knowledge graph technology has advanced rapidly in recent years. To better promote the development of knowledge graphs, especially for the Chinese language and the financial industry, we built a high-quality dataset named the financial research report knowledge graph (FR2KG) and organized the automated construction of financial knowledge graph evaluation task at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples, covering 10 entity types, 19 relationship types, and 6 attributes. Participants were required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs and introduce the methods used by the winners and the results of this evaluation.
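
For readers unfamiliar with the terminology, the sketch below shows one common way relationship triples and attribute triples, as counted in FR2KG, are represented. The example triples are invented for illustration and are not taken from FR2KG itself.

```python
# Sketch of a typical in-memory representation of relationship and attribute
# triples for a dataset like FR2KG. The concrete triples are hypothetical.
from collections import namedtuple

RelationTriple = namedtuple("RelationTriple", "head relation tail")
AttributeTriple = namedtuple("AttributeTriple", "entity attribute value")

relation_triples = [
    RelationTriple("CompanyA", "belongs_to_industry", "Semiconductors"),
    RelationTriple("ResearchReport1", "covers_company", "CompanyA"),
]
attribute_triples = [
    AttributeTriple("CompanyA", "stock_code", "000001"),
]

# An automatic constructor in the evaluation task would emit triples of this
# shape from raw research-report text.
print(len(relation_triples), "relation triples,", len(attribute_triples), "attribute triple")
```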


2019 ◽  
Vol 29 (1) ◽  
pp. 1441-1452 ◽  
Author(s):  
G.K. Shailaja ◽  
C.V. Guru Rao

Abstract Privacy-preserving data mining (PPDM) is an approach that has emerged to address privacy issues. The intention of PPDM is to build data-mining techniques without raising the risk of misuse of the data exploited to generate those schemes. Conventional works include numerous techniques, most of which apply some form of transformation to the original data to guarantee privacy preservation. However, these schemes are quite complex and memory intensive, which limits their adoption. Hence, this paper develops a novel PPDM technique that involves two phases, namely data sanitization and data restoration. Initially, the association rules are extracted from the database before proceeding with the two phases. In both the sanitization and restoration processes, key extraction plays a major role; the key is selected optimally using the Opposition Intensity-based Cuckoo Search Algorithm, a modified form of the Cuckoo Search Algorithm. Four criteria, namely hiding failure rate, information preservation rate, false rule generation, and degree of modification, are optimized by the adopted sanitization and restoration processes.
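
As a generic illustration of the data-sanitization idea behind association-rule hiding, the sketch below lowers the support of a sensitive rule by modifying transactions that support it. This is only a simple baseline under assumed thresholds; it is not the authors' key-based, Opposition Intensity-based Cuckoo Search method.

```python
# Generic association-rule hiding baseline: transactions supporting a
# sensitive rule are altered until the rule falls below its support threshold.
# NOT the paper's optimized key-based sanitization; illustration only.

def sanitize(transactions, sensitive_rule, min_support):
    """Drop the rule's consequent from supporting transactions until the
    rule's support falls below min_support."""
    antecedent, consequent = sensitive_rule
    result = [set(t) for t in transactions]
    supporting = [t for t in result if antecedent <= t and consequent <= t]
    allowed = int(min_support * len(result))
    for t in supporting[allowed:]:        # keep at most `allowed` supporters
        t -= consequent
    return result

transactions = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}]
rule = ({"bread"}, {"milk"})              # sensitive rule: bread -> milk
print(sanitize(transactions, rule, min_support=0.4))
```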


2020 ◽  
Vol 66 (259) ◽  
pp. 790-806
Author(s):  
Chris G. Carr ◽  
Joshua D. Carmichael ◽  
Erin C. Pettit ◽  
Martin Truffer

Abstract Glacial environments exhibit temporally variable microseismicity. To investigate how microseismicity influences event detection, we implement two noise-adaptive digital power detectors to process seismic data from Taylor Glacier, Antarctica. We add scaled icequake waveforms to the original data stream, run the detectors on the hybrid data stream to estimate reliable detection magnitudes, and compare these with analytical magnitudes predicted from an ice-crack source model. We find that detection capability is influenced by environmental microseismicity for seismic events whose source size is comparable to thermal penetration depths. When event counts and minimum detectable event sizes change in the same direction (i.e. both increase), we interpret measured seismicity changes as ‘true’ seismicity changes rather than as changes in detection capability. Generally, one detector (two degrees of freedom, 2dof) outperforms the other: it identifies more events, shows a more prominent summertime diurnal signal, and maintains a higher detection capability. We conclude that real physical processes are responsible for the summertime diurnal inter-detector difference. One detector (3dof) identifies this process as environmental microseismicity; the other (2dof) identifies it as elevated waveform activity. Our analysis provides an example of minimizing detection biases and estimating source sizes when interpreting temporal seismicity patterns to better infer glacial seismogenic processes.
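
The sketch below is a simplified stand-in for the noise-adaptive power-detection idea: the detection threshold tracks a running estimate of background (microseismic) noise power, and a synthetic "icequake" is injected into a noise trace, loosely mirroring the hybrid-stream test described. The window lengths, threshold factor, and synthetic signal are assumptions; this is not the study's 2dof/3dof detectors.

```python
# Simplified noise-adaptive power detector (illustration only, not the
# detectors used in the study): flag samples whose short-window power exceeds
# k times the running median of recent background power.
import numpy as np

def power_detect(signal, win=100, k=5.0):
    """Return a boolean mask of samples exceeding the adaptive threshold."""
    power = np.convolve(signal**2, np.ones(win) / win, mode="same")
    detections = np.zeros_like(power, dtype=bool)
    for i in range(win, len(power)):
        noise = np.median(power[max(0, i - 10 * win):i])  # adaptive noise level
        detections[i] = power[i] > k * noise
    return detections

rng = np.random.default_rng(1)
trace = rng.normal(scale=1.0, size=5000)
trace[3000:3050] += 10 * np.sin(np.linspace(0, 20 * np.pi, 50))  # injected "icequake"
print(np.flatnonzero(power_detect(trace))[:5])  # first detected sample indices
```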


2020 ◽  
Vol 2020 ◽  
pp. 1-16 ◽  
Author(s):  
Hua Chen ◽  
Chen Xiong ◽  
Jia-meng Xie ◽  
Ming Cai

With the rapid development of data acquisition technology, data acquisition departments can collect increasing amounts of data. Various data from government agencies are gradually becoming available to the public, including vehicle license plate recognition (VLPR) data. As a result, privacy protection is becoming increasingly significant. In this paper, an adversary model based on the passing time, color, type, and brand of VLPR data is proposed. Experimental analysis shows that the tracking probability of a vehicle’s trajectory can exceed 94% when the original data are used. To decrease the tracking probability, a novel approach called the (m, n)-bucket model based on time series is proposed, since previous works, such as those using generalization and bucketization models, cannot deal with data with multiple sensitive attributes (SAs) or data with time correlations. Meanwhile, a mathematical model is established to explain the privacy protection principle of the (m, n)-bucket model. By comparing the average calculated linking probability of all individuals with the actual linking probability, it is shown that the proposed mathematical model can well explain the privacy protection principle of the (m, n)-bucket model. Extensive experiments confirm that our technique can effectively prevent trajectory privacy disclosures.
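
The sketch below illustrates the general bucketization idea that the abstract contrasts with and builds on: sensitive attributes are shuffled within buckets of records, so an attacker can only link a plate to its attributes with probability roughly one over the bucket size. The record fields, bucket size, and shuffling scheme are assumptions for illustration; this does not reproduce the authors' (m, n)-bucket construction.

```python
# Generic bucketization illustration (not the paper's (m, n)-bucket model):
# shuffle sensitive attributes within each bucket so exact linking is lost.
import random

records = [
    {"plate": "A001", "time": "08:01", "color": "white", "brand": "VW"},
    {"plate": "A002", "time": "08:02", "color": "black", "brand": "BMW"},
    {"plate": "A003", "time": "08:03", "color": "white", "brand": "Audi"},
    {"plate": "A004", "time": "08:05", "color": "grey",  "brand": "VW"},
]

def bucketize(recs, bucket_size):
    """Publish records with sensitive attributes shuffled inside each bucket."""
    random.seed(0)
    buckets = [recs[i:i + bucket_size] for i in range(0, len(recs), bucket_size)]
    published = []
    for b in buckets:
        sensitive = [(r["color"], r["brand"]) for r in b]
        random.shuffle(sensitive)
        for r, (color, brand) in zip(b, sensitive):
            published.append({"time": r["time"], "color": color, "brand": brand})
    return published

# With buckets of size 2, the chance of correctly linking a plate to its
# sensitive values drops to roughly 1/2 instead of 1.
print(bucketize(records, bucket_size=2))
```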


2017 ◽  
Vol 24 (6) ◽  
pp. 674-685 ◽  
Author(s):  
Jorge M. Fernandes ◽  
Cristina Leston-Bandeira ◽  
Carsten Schwemmer

Do elected representatives have a time-constant representation focus, or do they adapt their focus depending on election proximity? In this article, we examine these overlooked theoretical and empirical puzzles by looking at how reelection-seeking actors adapt their legislative behavior according to the electoral cycle. In parliamentary democracies, representatives need to serve two competing principals: their party and their district. Our analysis hinges on how representatives make strategic use of parliamentary written questions in a highly party-constrained institutional context to improve their reselection and reelection prospects. Using an original data set of over 32,000 parliamentary questions tabled by Portuguese representatives from 2005 to 2015, we examine how time interacts with two key explanatory elements: electoral vulnerability and party size. Results show that representation focus is not static over time and, in addition, that electoral vulnerability and party size shape the strategic use of parliamentary questions.


2012 ◽  
Vol 220-223 ◽  
pp. 452-458
Author(s):  
Xian Xin Shi ◽  
Zhong Xiang Zhao ◽  
Chang Jian Zhu ◽  
Xiao Xiao Kong ◽  
Jun Fei Chai ◽  
...  

A cluster kernel semi-supervised support vector machine (CKS3VM) based on a spectral clustering algorithm is proposed and applied to winch fault classification in this paper. The spectral clustering method is used to re-represent the original data samples in an eigenvector space so that samples in the same cluster are gathered more closely together. A cluster kernel function is then constructed on this eigenvector space. Finally, a cluster kernel S3VM is designed that satisfies the cluster assumption of semi-supervised learning. Experiments on winch fault classification show that the novel approach achieves high classification accuracy.
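
The sketch below approximates the general idea of a cluster kernel: re-represent the data with a spectral embedding so that points in the same cluster move closer together, then train an SVM on a kernel computed in that space using only a few labels. The dataset, kernel choice, and labelling scheme are assumptions; this is not the authors' CKS3VM implementation.

```python
# Approximate cluster-kernel pipeline (illustration, not the paper's CKS3VM):
# spectral re-representation, RBF kernel in the embedded space, SVM on a few labels.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.manifold import SpectralEmbedding
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Spectral re-representation of all (labelled + unlabelled) samples.
Z = SpectralEmbedding(n_components=2, random_state=0).fit_transform(X)

# Cluster kernel: an RBF kernel evaluated in the embedded space.
K = rbf_kernel(Z, Z, gamma=10.0)

# Semi-supervised setting: pretend only every 20th label is known.
labelled = np.arange(0, 200, 20)
clf = SVC(kernel="precomputed").fit(K[np.ix_(labelled, labelled)], y[labelled])
pred = clf.predict(K[np.ix_(np.arange(200), labelled)])  # predict all samples
print(pred.shape)
```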


2013 ◽  
Vol 427-429 ◽  
pp. 2687-2690
Author(s):  
Yu Ting Zhang ◽  
Gui Fa Teng

Social networks provide users with a platform for interaction and information sharing. In real social activities, both individuals and businesses have to rely on certain relationships to live, work, or engage in commercial activities. This paper describes how relationships form between different actor clusters based on a shared actor in real social networks. The relationships, their types, and their attributes in real complex social networks are analyzed in detail. An index called Qinshudu, representing the degree of closeness between two actors in real complex social networks, is proposed, and its computation model, based on relationship types, attributes, and other factors, is given. A case study shows that Qinshudu is a reasonable and effective measure for use in complex social networks.
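
The abstract does not give the Qinshudu formula, so the sketch below only shows one plausible shape for such an index: a weighted sum over the relationship types connecting two actors. The weights, the clamp, and the input format are illustrative assumptions, not the paper's computation model.

```python
# Hedged sketch of a closeness index in the spirit of Qinshudu: weight each
# shared relationship type and sum. Weights and formula are assumptions only.

TYPE_WEIGHTS = {"family": 1.0, "colleague": 0.6, "classmate": 0.5, "online_friend": 0.2}

def closeness(relations_between):
    """relations_between: list of (relation_type, strength in [0, 1]) edges
    connecting two actors, possibly through shared clusters."""
    score = sum(TYPE_WEIGHTS.get(rtype, 0.1) * strength
                for rtype, strength in relations_between)
    return min(1.0, score)  # clamp so the index stays in [0, 1]

# Two actors connected as colleagues and former classmates.
print(closeness([("colleague", 0.8), ("classmate", 0.4)]))
```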

