Efficient Online Log Parsing with Log Punctuations Signature

2021 ◽  
Vol 11 (24) ◽  
pp. 11974
Author(s):  
Shijie Zhang ◽  
Gang Wu

Logs, which record system runtime information, are frequently used to ensure software system reliability. Log parsing is the first and foremost step of typical log analysis, and many data-driven methods have been proposed to automate it. Most existing log parsers work offline, requiring a time-consuming training process and retraining as the system upgrades. Meanwhile, the state-of-the-art online log parsers are tree-based and still have defects in robustness and efficiency. To overcome such limitations, we abandon the tree structure and propose a hash-like method. In this paper, we propose LogPunk, an efficient online log parsing method. The core of LogPunk is a novel log signature method based on log punctuations and length features. According to the signature, we can quickly find a small set of candidate templates. Further, the most suitable template is returned by traversing the candidate set with our log similarity function. We evaluated LogPunk on 16 public datasets from LogHub, comparing it with five other log parsers. LogPunk achieves the best parsing accuracy of 91.9%. Evaluation results also demonstrate its superiority in terms of robustness and efficiency.
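
The abstract outlines a hash-like lookup keyed on a punctuation-and-length signature followed by a similarity search over a small candidate set. The Python sketch below illustrates that idea; the signature definition, the token-overlap similarity, and the merge threshold are illustrative assumptions, not the authors' exact formulation.

```python
from collections import defaultdict
import string

PUNCT = set(string.punctuation)

def signature(tokens):
    """Signature from the punctuation characters and the message length (illustrative)."""
    puncts = "".join(sorted(c for tok in tokens for c in tok if c in PUNCT))
    return (puncts, len(tokens))

def similarity(tokens, template):
    """Fraction of positions where the token matches the template (or a wildcard)."""
    if len(tokens) != len(template):
        return 0.0
    same = sum(t == w or w == "<*>" for t, w in zip(tokens, template))
    return same / len(tokens)

class OnlineParser:
    def __init__(self, threshold=0.5):
        self.groups = defaultdict(list)   # signature -> list of templates (candidate sets)
        self.threshold = threshold

    def parse(self, line):
        tokens = line.split()
        candidates = self.groups[signature(tokens)]          # small candidate set
        best = max(candidates, key=lambda t: similarity(tokens, t), default=None)
        if best is not None and similarity(tokens, best) >= self.threshold:
            # Merge: replace disagreeing positions with a wildcard.
            merged = [w if w == t else "<*>" for w, t in zip(best, tokens)]
            candidates[candidates.index(best)] = merged
            return merged
        candidates.append(tokens)                            # no match: new template
        return tokens

parser = OnlineParser()
print(parser.parse("Connection from 10.0.0.1 closed"))
print(parser.parse("Connection from 10.0.0.2 closed"))   # -> ['Connection', 'from', '<*>', 'closed']
```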

2020 ◽  
Author(s):  
Konstantin-Klemens Lurz ◽  
Mohammad Bashiri ◽  
Konstantin Willeke ◽  
Akshay K. Jagadish ◽  
Eric Wang ◽  
...  

Abstract. Deep neural networks (DNNs) have set new standards at predicting responses of neural populations to visual input. Most such DNNs consist of a convolutional network (core) shared across all neurons which learns a representation of neural computation in visual cortex and a neuron-specific readout that linearly combines the relevant features in this representation. The goal of this paper is to test whether such a representation is indeed generally characteristic for visual cortex, i.e., generalizes between animals of a species, and what factors contribute to obtaining such a generalizing core. To push all non-linear computations into the core where the generalizing cortical features should be learned, we devise a novel readout that reduces the number of parameters per neuron in the readout by up to two orders of magnitude compared to the previous state-of-the-art. It does so by taking advantage of retinotopy and learns a Gaussian distribution over the neuron’s receptive field position. With this new readout we train our network on neural responses from mouse primary visual cortex (V1) and obtain a gain in performance of 7% compared to the previous state-of-the-art network. We then investigate whether the convolutional core indeed captures general cortical features by using the core in transfer learning to a different animal. When transferring a core trained on thousands of neurons from various animals and scans we exceed the performance of training directly on that animal by 12%, and outperform a commonly used VGG16 core pre-trained on ImageNet by 33%. In addition, transfer learning with our data-driven core is more data-efficient than direct training, achieving the same performance with only 40% of the data. Our model with its novel readout thus sets a new state-of-the-art for neural response prediction in mouse visual cortex from natural images, generalizes between animals, and captures better characteristic cortical features than current task-driven pre-training approaches such as VGG16.
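
The described readout learns, per neuron, a position on the core's feature map (exploiting retinotopy) and a small set of channel weights. Below is a minimal PyTorch sketch of such a Gaussian position readout; the module name, the use of grid_sample, and the training-time position sampling are assumptions about one plausible implementation, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianReadout(nn.Module):
    """Per neuron: a learned Gaussian over receptive-field position plus channel weights."""
    def __init__(self, n_channels, n_neurons):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_neurons, 2))         # RF position in [-1, 1]^2
        self.log_sigma = nn.Parameter(torch.zeros(n_neurons, 2))  # positional uncertainty
        self.weights = nn.Parameter(torch.randn(n_neurons, n_channels) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_neurons))

    def forward(self, core_features):                 # core_features: (batch, C, H, W)
        b = core_features.shape[0]
        if self.training:                             # sample positions (reparameterization)
            pos = self.mu + torch.randn_like(self.mu) * self.log_sigma.exp()
        else:                                         # use the mean position at test time
            pos = self.mu
        grid = pos.clamp(-1, 1).expand(b, -1, -1).unsqueeze(1)    # (batch, 1, n_neurons, 2)
        sampled = F.grid_sample(core_features, grid, align_corners=True)  # (batch, C, 1, n_neurons)
        feats = sampled.squeeze(2).permute(0, 2, 1)               # (batch, n_neurons, C)
        return (feats * self.weights).sum(-1) + self.bias         # predicted responses

readout = GaussianReadout(n_channels=64, n_neurons=1000)
responses = readout(torch.randn(8, 64, 36, 64))       # (8, 1000): one response per neuron
```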


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Simon Elias Bibri

Abstract. A new era is presently unfolding wherein both smart urbanism and sustainable urbanism processes and practices are becoming highly responsive to a form of data-driven urbanism under what has to be identified as data-driven smart sustainable urbanism. This flourishing field of research is profoundly interdisciplinary and transdisciplinary in nature. It operates out of the understanding that advances in knowledge necessitate pursuing multifaceted questions that can only be resolved from the vantage point of interdisciplinarity and transdisciplinarity. This implies that the research problems within the field of data-driven smart sustainable urbanism are inherently too complex and dynamic to be addressed by single disciplines. As this field is not a specific direction of research, it does not have a unitary disciplinary framework in terms of a uniform set of the academic and scientific disciplines from which the underlying theories can be drawn. These theories constitute a unified foundation for the practice of data-driven smart sustainable urbanism. Therefore, it is of significant importance to develop an interdisciplinary and transdisciplinary framework. In this regard, this paper identifies, describes, discusses, evaluates, and thematically organizes the core academic and scientific disciplines underlying the field of data-driven smart sustainable urbanism. This work provides an important lens through which to understand the set of established and emerging disciplines that have high integration, fusion, and application potential for informing the processes and practices of data-driven smart sustainable urbanism. As such, it provides fertile insights into the core foundational principles of data-driven smart sustainable urbanism as an applied domain in terms of its scientific, technological, and computational strands. The novelty of the proposed framework lies in its original contribution to the body of foundational knowledge of an emerging field of urban planning and development.


Energies ◽  
2021 ◽  
Vol 14 (9) ◽  
pp. 2371
Author(s):  
Matthieu Dubarry ◽  
David Beck

The development of data-driven methods for Li-ion battery diagnosis and prognosis is a growing field of research for the battery community. A major limitation is usually the size of the training datasets, which are typically not fully representative of the real usage of the cells. Synthetic datasets have been proposed to circumvent this issue. This publication provides improved datasets for three major battery chemistries: LiFePO4, Nickel Aluminum Cobalt Oxide, and Nickel Manganese Cobalt Oxide 811. These datasets can be used for statistical or deep learning methods. This work also provides a detailed statistical analysis of the datasets. Using the combined information of three learnable parameters, we demonstrate accurate diagnosis as well as early prognosis comparable with the state of the art, while providing physical interpretability.
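
As a rough illustration of how such synthetic datasets are typically used for data-driven diagnosis, the sketch below trains a regressor that maps voltage curves to three degradation parameters. The placeholder arrays, the interpretation of the three parameters (e.g., loss of lithium inventory and loss of active material), and the choice of a random forest are assumptions for illustration, not the models or data layout released with the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Stand-in for a synthetic dataset: each sample pairs a voltage curve resampled to a
# fixed grid (features) with three degradation parameters (targets).
rng = np.random.default_rng(0)
X = rng.random((1000, 100))          # placeholder voltage-vs-capacity curves
y = rng.random((1000, 3))            # placeholder degradation parameters

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Diagnosis: predict the three parameters for unseen curves (mean R^2 across outputs).
print(model.score(X_test, y_test))
```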


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
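
To make the graph construction concrete, here is a minimal PyTorch sketch that builds a cosine-similarity graph over a batch of intermediate representations and uses it for two of the listed cases (matching a teacher's geometry, and smoothing geometry across consecutive latent spaces); the normalization and loss choices are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def latent_geometry_graph(features):
    """Cosine-similarity graph over a batch of intermediate representations.

    features: (batch, ...) activations from one latent space, flattened per sample.
    Returns a (batch, batch) similarity matrix describing the batch geometry.
    """
    z = F.normalize(features.flatten(1), dim=1)
    return z @ z.t()

def geometry_distillation_loss(student_feats, teacher_feats):
    """Encourage the student's latent geometry to mimic the teacher's (case i)."""
    g_student = latent_geometry_graph(student_feats)
    g_teacher = latent_geometry_graph(teacher_feats).detach()
    return F.mse_loss(g_student, g_teacher)

def geometry_smoothness_loss(feats_layer_k, feats_layer_k_plus_1):
    """Penalize abrupt geometry changes between consecutive latent spaces (case iii)."""
    return F.mse_loss(latent_geometry_graph(feats_layer_k),
                      latent_geometry_graph(feats_layer_k_plus_1))

# Usage: add these terms to the task loss when processing a batch.
student = torch.randn(32, 256, requires_grad=True)
teacher = torch.randn(32, 512)
loss = geometry_distillation_loss(student, teacher)
loss.backward()
```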


Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2778
Author(s):  
Mohsen Azimi ◽  
Armin Eslamlou ◽  
Gokhan Pekcan

Data-driven methods in structural health monitoring (SHM) are gaining popularity due to recent technological advancements in sensors, as well as high-speed internet and cloud-based computation. Since the introduction of deep learning (DL) in civil engineering, particularly in SHM, this emerging and promising tool has attracted significant attention among researchers. The main goal of this paper is to review the latest publications in SHM using emerging DL-based methods and provide readers with an overall understanding of various SHM applications. After a brief introduction, an overview of various DL methods (e.g., deep neural networks, transfer learning, etc.) is presented. The procedures and applications of vibration-based and vision-based monitoring are discussed, along with some of the recent technologies used for SHM, such as sensors and unmanned aerial vehicles (UAVs). The review concludes with prospects and potential limitations of DL-based methods in SHM applications.


Author(s):  
Constantijn Kaland

ABSTRACT This paper reports an automatic data-driven analysis for describing prototypical intonation patterns, particularly suitable for initial stages of prosodic research and language description. The approach has several advantages over traditional ways to investigate intonation, such as its applicability to spontaneous speech, its language- and domain-independence, and its potential to reveal meaningful functions of intonation. These features make the approach particularly useful for language documentation, where the description of prosody is often lacking. The core of this approach is a cluster analysis on a time series of f0 measurements, and it consists of two scripts (Praat and R, available from https://constantijnkaland.github.io/contourclustering/). Graphical user interfaces can be used to perform the analyses on collected data ranging from spontaneous to highly controlled speech. There is limited need for manual annotation prior to analysis, and speaker variability can be accounted for. After cluster analysis, Praat TextGrids can be generated with the cluster number annotated for each individual contour. Although further confirmatory analysis is still required, the outcomes provide useful and unbiased directions for any investigation of prototypical f0 contours based on their acoustic form.
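
The published workflow uses Praat and R scripts (linked above); the Python sketch below only illustrates the core idea of clustering time-normalized, speaker-normalized f0 contours. The resampling length, z-scoring, and k-means settings are assumptions for illustration rather than the settings of the original scripts.

```python
import numpy as np
from scipy.interpolate import interp1d
from sklearn.cluster import KMeans

def time_normalize(f0, n_points=30):
    """Resample one f0 contour (Hz values over time) to a fixed number of points."""
    x = np.linspace(0, 1, len(f0))
    return interp1d(x, f0)(np.linspace(0, 1, n_points))

def speaker_normalize(contour):
    """Z-score within the contour to reduce speaker-specific pitch level and range."""
    return (contour - contour.mean()) / contour.std()

# contours: one array of f0 measurements per intonation unit (toy examples here)
contours = [np.array([180, 190, 210, 230, 220, 200]),
            np.array([220, 210, 200, 190, 170]),
            np.array([150, 160, 175, 200, 215, 225, 230])]

X = np.vstack([speaker_normalize(time_normalize(c)) for c in contours])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster number per contour; centroids approximate the prototypes
```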


2018 ◽  
Vol 48 (5) ◽  
pp. 637-647
Author(s):  
Rebecca Lemov

This article traces the rise of “predictive” attitudes to crime prevention. After a brief summary of the current spread of predictive policing based on person-centered and place-centered mathematical models, an episode in the scientific study of future crime is examined. At UCLA between 1969 and 1973, a well-funded “violence center” occasioned great hopes that the quotient of human “dangerousness”—potential violence against other humans—could be quantified and thereby controlled. At the core of the center, under the direction of interrogation expert and psychiatrist Louis Jolyon West, was a project to gather unprecedented amounts of behavioral data and centrally store it to identify emergent crime. Protesters correctly seized on the violence center as a potential site of racially targeted experimentation in psychosurgery and an example of iatrogenic science. Yet the eventual spectacular failure of the center belies an ultimate success: its data-driven vision itself predicted the Philip K. Dick–style PreCrime policing now emerging. The UCLA violence center thus offers an alternative genealogy to predictive policing. This essay is part of a special issue entitled Histories of Data and the Database edited by Soraya de Chadarevian and Theodore M. Porter.


Author(s):  
Pengcheng Wang ◽  
Jonathan Rowe ◽  
Wookhee Min ◽  
Bradford Mott ◽  
James Lester

Interactive narrative planning offers significant potential for creating adaptive gameplay experiences. While data-driven techniques have been devised that utilize player interaction data to induce policies for interactive narrative planners, they require enormously large gameplay datasets. A promising approach to addressing this challenge is creating simulated players whose behaviors closely approximate those of human players. In this paper, we propose a novel approach to generating high-fidelity simulated players based on deep recurrent highway networks and deep convolutional networks. Empirical results demonstrate that the proposed models significantly outperform the prior state-of-the-art in generating high-fidelity simulated player models that accurately imitate human players’ narrative interactions. Using the high-fidelity simulated player models, we show the advantage of more exploratory reinforcement learning methods for deriving generalizable narrative adaptation policies.
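
As a rough sketch of what a simulated player model of this kind looks like, the PyTorch code below predicts a player's next narrative action from their interaction history. It substitutes a plain LSTM for the recurrent highway and convolutional networks used in the paper, and all layer sizes and feature names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimulatedPlayer(nn.Module):
    """Sequence model imitating player behavior: interaction history -> next action."""
    def __init__(self, n_state_features, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(n_state_features, hidden, batch_first=True)  # stand-in for a
        self.policy = nn.Linear(hidden, n_actions)                          # recurrent highway net

    def forward(self, history):                  # history: (batch, time, n_state_features)
        out, _ = self.encoder(history)
        return self.policy(out[:, -1])           # logits over the player's next action

model = SimulatedPlayer(n_state_features=32, n_actions=10)
logits = model(torch.randn(4, 20, 32))           # 4 players, 20 past interaction steps

# Trained with cross-entropy against observed human actions, such a model can then
# generate synthetic interaction data for reinforcement learning of narrative policies.
```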


2021 ◽  
Author(s):  
Cedric Twardzik ◽  
Mathilde Vergnolle ◽  
Anthony Sladen ◽  
Louisa L. H. Tsang

Abstract. It is well established that post-seismic slip results from the combined contribution of seismic slip and aseismic slip. However, the partitioning between these two modes of slip remains unclear due to the difficulty of inferring detailed and robust descriptions of how both evolve in space and time. This is particularly true just after a mainshock, when both processes are expected to be the strongest. Using state-of-the-art sub-daily processing of GNSS data, along with dense catalogs of aftershocks obtained from template-matching techniques, we unravel the spatiotemporal evolution of post-seismic slip and aftershocks over the first 12 hours following the 2015 Mw 8.3 Illapel, Chile, earthquake. We show that the very early post-seismic activity occurs over two regions with distinct behaviors. To the north, post-seismic slip appears to be purely aseismic and precedes the occurrence of late aftershocks. To the south, aftershocks are the primary cause of the post-seismic slip. We suggest that this difference in behavior could be inferred only a few hours after the mainshock, and thus could contribute to more data-driven forecasts of long-term aftershocks.


2016 ◽  
Author(s):  
Geoffrey Fouad ◽  
André Skupin ◽  
Christina L. Tague

Abstract. Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US). However, it remains unclear how to best represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US, and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of those models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and the two larger sets of variables for the data-driven approach were each applied in the regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. The small set of independent variables selected through expert assessment produced similar, if not better, performance than the two larger sets of variables. This parsimonious set of variables consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index. Additional variables in the two larger sets added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins, and were converted into a tool for predicting 13 percentile flows in the contiguous US. The Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
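
The two-step procedure described here (cluster basins on the independent variables, then fit a regression per cluster) can be sketched as follows in Python; the placeholder data, cluster count, and use of ordinary least squares are illustrative assumptions rather than the calibrated models distributed with the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Placeholder training data: one row per gauged basin.
# Columns: mean annual precipitation, potential evapotranspiration, baseflow index.
rng = np.random.default_rng(0)
X = rng.random((734, 3))
q50 = rng.random(734)                # example target: 50th percentile flow per basin

# Step 1: group basins by the independent variables.
scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scaler.transform(X))

# Step 2: fit one regression model per group.
models = {}
for k in range(kmeans.n_clusters):
    mask = kmeans.labels_ == k
    models[k] = LinearRegression().fit(X[mask], q50[mask])

# Prediction for an ungauged basin: assign it to a group, then apply that group's model.
x_new = np.array([[0.6, 0.3, 0.45]])
group = kmeans.predict(scaler.transform(x_new))[0]
print(models[group].predict(x_new))
```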

