Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection with Mutual Information Estimation

LitRev is a novel, robust, data-driven approach developed for quick literature review on a particular topic of interest. The method identifies common biological phrases that follow a power-law distribution, as well as important phrases whose normalized pointwise mutual information score is greater than zero.

Spatial-temporal distribution of COVID-19 in China and its prediction: A data-driven modeling analysis

The Journal of Infection in Developing Countries ◽

10.3855/jidc.12585 ◽

2020 ◽

Vol 14 (03) ◽

pp. 246-253 ◽

Cited By ~ 32

Author(s):

Rui Huang ◽

Miao Liu ◽

Yongmei Ding

Keyword(s):

Logistic Model ◽

Temporal Distribution ◽

Hubei Province ◽

Central China ◽

Data Driven ◽

Statistical Tool ◽

Spatial Panel ◽

Modeling Analysis ◽

The Difference ◽

Data Driven Modeling

Currently, the outbreak of COVID-19 is spreading rapidly, especially in Wuhan city, and threatens 14 million people in central China. In the present study we applied the Moran index, a strong statistical tool, to the spatial panel to show that COVID-19 infection is spatially dependent and has mainly spread from Hubei Province in Central China to neighbouring areas. A logistic model was fitted to the trend of the available data and shows the difference between Hubei Province and the areas outside it. We also estimated the reproduction number R0 to lie in the range [2.23, 2.51] via an SEIR model. Measures to reduce or prevent the spread of the virus should be implemented, and we expect our data-driven modeling analysis to provide some insights for identifying and preparing for future virus control.
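The Moran index mentioned in this abstract can be illustrated with a minimal sketch of the standard global Moran's I, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2. The function name `morans_i`, the binary adjacency weights and the four-region example are assumptions made for illustration; the abstract does not say how the authors built their spatial weight matrix or panel.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for values observed at n regions.

    weights[i][j] is the spatial weight between regions i and j
    (here 1 for neighbours, 0 otherwise; the diagonal is ignored).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    total_weight = 0.0  # W: sum of all off-diagonal weights
    cross = 0.0         # sum over i != j of w_ij * dev_i * dev_j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            total_weight += weights[i][j]
            cross += weights[i][j] * dev[i] * dev[j]

    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four regions in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)  # similar values sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)    # neighbours always differ
```

A positive value, as for the smooth gradient, indicates that neighbouring regions carry similar values, which is the kind of spatial dependence the abstract reports; a negative value, as for the alternating pattern, indicates that neighbours tend to differ.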

Pipeline for a Data-driven Network of Linguistic Terms

10.3384/ecp184176 ◽

2021 ◽

Author(s):

Søren Wichmann

Keyword(s):

Mutual Information ◽

Data Driven ◽

Linguistic Terms ◽

Text Documents ◽

Page Rank ◽

Pointwise Mutual Information

The present work is aimed at (1) developing a search engine adapted to the large DReaM corpus of linguistic descriptive literature and (2) gaining insight into how a data-driven ontology of linguistic terminology might be built. Starting from close to 20,000 text documents from the literature of language descriptions, either born digital or scanned and OCR’d, we extract keywords and pass them through a pruning pipeline in which mainly keywords that can be considered linguistic terminology survive. Subsequently we quantify relations among those terms using Normalized Pointwise Mutual Information (NPMI) and use the resulting measures, in conjunction with Google PageRank (GPR), to build networks of linguistic terms.
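As a rough illustration of the NPMI measure named here, the sketch below computes NPMI(x, y) = PMI(x, y) / (-log p(x, y)) from document-level co-occurrence counts. The function name `npmi`, the representation of each document as a set of keywords, and the handling of the two edge cases are assumptions for this example and are not taken from the paper's pipeline.

```python
import math
from typing import Iterable, Set


def npmi(term_x: str, term_y: str, documents: Iterable[Set[str]]) -> float:
    """Normalized pointwise mutual information of two terms, in [-1, 1].

    Probabilities are estimated as the fraction of documents containing
    a term (or both terms).
    """
    docs = list(documents)
    n = len(docs)
    if n == 0:
        raise ValueError("need at least one document")

    count_x = sum(1 for d in docs if term_x in d)
    count_y = sum(1 for d in docs if term_y in d)
    count_xy = sum(1 for d in docs if term_x in d and term_y in d)

    if count_xy == 0:
        return -1.0  # convention used here: terms that never co-occur get -1
    if count_xy == n:
        return 0.0   # assumption: 0/0 case (both terms in every document) mapped to 0

    p_x, p_y, p_xy = count_x / n, count_y / n, count_xy / n
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


# Hypothetical keyword sets, one per document.
always_together = [{"ergative", "absolutive"}, {"ergative", "absolutive"}, {"tone"}, {"tone"}]
independent = [{"ergative", "tone"}, {"ergative"}, {"tone"}, set()]

score_together = npmi("ergative", "absolutive", always_together)
score_independent = npmi("ergative", "tone", independent)
score_never = npmi("ergative", "tone", always_together)
```

Terms that always co-occur score 1, terms that co-occur exactly as often as chance predicts score 0, and terms that never co-occur score -1, so a threshold at zero separates pairs that attract each other from pairs that do not.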

An Evaluation of COVID-19 in Italy: A data-driven modeling analysis

10.21203/rs.3.rs-28146/v1 ◽

2020 ◽

Author(s):

Yongmei Ding ◽

Liyuan Gao

Keyword(s):

Real Data ◽

Data Driven ◽

The Novel ◽

State Variables ◽

Infected People ◽

Modeling Analysis ◽

Travel Restrictions ◽

Novel Coronavirus ◽

Data Driven Modeling ◽

Global Travel

The novel coronavirus (COVID-19) that has been spreading worldwide since December 2019 has sickened millions of people, shut down major cities and some countries, and prompted unprecedented global travel restrictions. Real data-driven modeling is an effort to help evaluate and curb the spread of the novel virus. The effectiveness of lockdowns and of reduced contacts in Italy has been measured via our modified model, with the addition of auxiliary and state variables that represent contacts, contacts with infected people, conversion rate, and latent propagation. Results show a decrease in infected people due to stay-at-home orders and the tracing quarantine intervention. The effect of quarantine and centralized medical treatment was also measured through numerical modeling analysis.

Mutual information algorithms for optimal attribute selection in data driven partitions of databases

Evolving Systems ◽

10.1007/s12530-018-9237-9 ◽

2018 ◽

Vol 11 (3) ◽

pp. 517-529 ◽

Cited By ~ 1

Author(s):

Ioannis M. Stephanakis ◽

Theodoros Iliou ◽

George Anastassopoulos

Keyword(s):

Mutual Information ◽

Attribute Selection ◽

Data Driven

Mutual Information Based Feature Selection From Data Driven and Model Based Techniques for Fault Detection in Rolling Element Bearings

Volume 1: 23rd Biennial Conference on Mechanical Vibration and Noise, Parts A and B ◽

10.1115/detc2011-47822 ◽

2011 ◽

Cited By ~ 1

Author(s):

Karthik Kappaganthu ◽

C. Nataraj

Keyword(s):

Fault Detection ◽

Mutual Information ◽

Diagnostic System ◽

Classification Performance ◽

Data Driven ◽

Time Frequency ◽

Bearing Fault ◽

Model Based ◽

Rolling Element ◽

And Performance

This paper proposes a novel technique combining data-driven and model-based techniques to significantly improve performance in bearing fault diagnostics. Features that provide the best classification performance for the given data are selected from a combined set of data-driven and model-based features. Some common data-driven techniques from the time, frequency and time-frequency domains are considered. For model-based feature extraction, the recently developed cross-sample entropy is used. The ranking and performance of each of these feature sets are studied, both when used independently and when used together. A mutual-information-based technique is used for ranking and selection of the optimal feature set. Using this method, the contribution to performance and the redundancy of each of the data-driven and model-based features can be studied. This method can be used to design an effective diagnostic system for bearing fault detection.
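A minimal sketch of mutual-information-based feature ranking is given below. It uses the plug-in estimate of I(X; Y) on discrete values and ranks features by their mutual information with the class label. The names `mutual_information` and `rank_features`, the toy features, and the assumption that features have already been discretized are all illustrative; the abstract does not specify the estimator or the redundancy criterion the authors used.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(
    features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Return (feature name, MI with the labels), highest MI first."""
    scores = [(name, mutual_information(vals, labels)) for name, vals in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical, already-discretized features for four bearing recordings.
labels = [0, 0, 1, 1]  # 0 = healthy, 1 = faulty
features = {
    "noise_bin": [0, 1, 0, 1],      # unrelated to the label
    "entropy_bin": [5, 5, 9, 9],    # determines the label exactly
}
ranking = rank_features(features, labels)
```

This sketch scores relevance only. Studying redundancy between features, as the abstract describes, would additionally require the mutual information between pairs of features.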

Data-Driven Modeling & Analysis of Dynamic Wake for Wind Farm Control: A Comparison Study

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9327624 ◽

2020 ◽

Author(s):

Zhenyu Chen ◽

Bart M Doekemeijer ◽

Zhongwei Lin ◽

Zhen Xie ◽

Zongming Si ◽

...

Keyword(s):

Wind Farm ◽

Data Driven ◽

Comparison Study ◽

Modeling Analysis ◽

Data Driven Modeling

A Simple Method for Testing Independence of High-Dimensional Random Vectors

Austrian Journal of Statistics ◽

10.17713/ajs.v37i1.291 ◽

2016 ◽

Vol 37 (1) ◽

Author(s):

Gintautas Jakimauskas ◽

Marijus Radavičius ◽

Jurgis Sušinskas

Keyword(s):

Goodness Of Fit ◽

Classification Problem ◽

Data Driven ◽

Sequential Testing ◽

High Dimensional ◽

Computationally Efficient ◽

Random Vectors ◽

Simple Method ◽

Efficient Procedure ◽

Testing Independence

A simple, data-driven and computationally efficient procedure for testing independence of high-dimensional random vectors is proposed. The procedure is based on interpreting goodness-of-fit testing as a classification problem, a special sequential partition procedure, elements of sequential testing, resampling and randomization. Monte Carlo simulations are carried out to assess the performance of the procedure.
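The resampling and randomization ingredient can be illustrated with a generic permutation test of independence. This is not the authors' classification-based procedure: it is a deliberately simple stand-in that uses the absolute sample covariance as the test statistic, so it only detects linear dependence between two scalar variables. The names `abs_covariance` and `permutation_pvalue`, the number of permutations and the fixed seed are assumptions for the example.

```python
import random
from typing import Sequence


def abs_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute sample covariance, used here as a simple dependence statistic."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return abs(sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n)


def permutation_pvalue(
    xs: Sequence[float], ys: Sequence[float], n_permutations: int = 200, seed: int = 0
) -> float:
    """Permutation p-value for H0: X and Y are independent.

    Shuffling ys breaks any pairing with xs, so the shuffled statistics
    approximate the distribution of the statistic under independence.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    rng = random.Random(seed)
    observed = abs_covariance(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs_covariance(xs, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (1 + hits) / (1 + n_permutations)


# Hypothetical data: y equal to x (strong dependence) and a constant y.
x = [float(i) for i in range(30)]
p_dependent = permutation_pvalue(x, x)
p_constant = permutation_pvalue(x, [3.0] * 30)
```

A small p-value is evidence against independence. With a constant second variable every shuffled statistic ties the observed one, so the p-value is 1 and independence is not rejected.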

On data-driven histogram-based estimation for mutual information

2010 IEEE International Symposium on Information Theory ◽

10.1109/isit.2010.5513635 ◽

2010 ◽

Author(s):

Jorge Silva ◽

Shrikanth S. Narayanan

Keyword(s):

Mutual Information ◽

Data Driven
