LTmatch: A Method to Abstract Pattern from Unstructured Log

2021 ◽  
Vol 11 (11) ◽  
pp. 5302
Author(s):  
Xiaodong Wang ◽  
Yining Zhao ◽  
Haili Xiao ◽  
Xiaoning Wang ◽  
Xuebin Chi

Logs record valuable data from different software and systems. Execution logs are widely available and are helpful in monitoring, examining, and understanding complex applications. However, log files usually contain far too many lines for a human to inspect, so it is important to develop methods for processing logs by computer. Logs are also usually unstructured, which hinders automatic analysis; how to categorize logs and turn them into structured data automatically is therefore of great practical significance. In this paper, the LTmatch algorithm is proposed, which implements a log pattern extraction algorithm based on a weighted word matching rate. Compared with our previous work, this algorithm not only classifies logs according to the longest common subsequence (LCS) but also extracts and updates log templates in real time. In addition, the algorithm's pattern warehouse uses a fixed-depth tree to store log patterns, which improves the matching efficiency of log pattern extraction. To verify the advantages of the algorithm, we applied it to open-source data sets with different kinds of labeled log data, using a variety of state-of-the-art log pattern extraction algorithms for comparison. The results show that our method improves average accuracy by 2.67% over the best result among all the other methods.
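
The core matching step can be pictured with a short sketch. The toy Python code below classifies log lines by an LCS-based word matching rate and folds differing positions into wildcards; LTmatch's weighting scheme and fixed-depth tree warehouse are simplified to a flat list here, so the names, thresholds, and example lines are illustrative assumptions rather than the paper's exact algorithm.

def lcs(a, b):
    """Classic dynamic-programming longest common subsequence of token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack to recover one LCS.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def merge_template(template, tokens, wildcard="<*>"):
    """Replace positions outside the LCS with a wildcard (set-based shortcut,
    which is only safe for this toy because tokens are not repeated)."""
    common = set(lcs(template, tokens))
    return [t if t in common else wildcard for t in template]

templates = []  # the pattern warehouse; a flat list instead of a fixed-depth tree

def match(tokens, threshold=0.5):
    """Assign a log line to the best-matching template and update it in place;
    create a new template when every matching rate falls below the threshold."""
    best, best_rate = None, 0.0
    for idx, tpl in enumerate(templates):
        rate = len(lcs(tpl, tokens)) / max(len(tpl), len(tokens))
        if rate > best_rate:
            best, best_rate = idx, rate
    if best is not None and best_rate >= threshold:
        templates[best] = merge_template(templates[best], tokens)
        return best
    templates.append(list(tokens))
    return len(templates) - 1

for line in ["connection from 10.0.0.1 closed", "connection from 10.0.0.2 closed"]:
    match(line.split())
print(templates)  # [['connection', 'from', '<*>', 'closed']]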

Author(s):  
Misturah Adunni Alaran ◽  
AbdulAkeem Adesina Agboola ◽  
Adio Taofiki Akinwale ◽  
Olusegun Folorunso

The reality of human existence and our interactions with the things that surround us reveals that the world is imprecise, incomplete, vague, and sometimes indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in a single global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day in and day out. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method has been compared with four existing classical string similarity measures using a wordlist as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing ones on information retrieval tasks, with evaluation based on precision, recall, highest false match, lowest true match, and separation.
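
To make the LCS-based construction concrete, here is a minimal Python sketch. The LCS length is the standard dynamic program; the (truth, indeterminacy, falsity) triple built from it is our own assumed reading of a neutrosophic similarity, not the chapter's actual membership functions.

def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (character-level, rolling DP)."""
    dp = [0] * (len(b) + 1)
    for ch in a:
        prev = 0
        for j, bh in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if ch == bh else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def neutrosophic_similarity(a: str, b: str):
    """Return an assumed (T, I, F) triple from the LCS length; by construction
    the three components sum to 1."""
    l, longer, shorter = lcs_len(a, b), max(len(a), len(b)), min(len(a), len(b))
    truth = l / longer                           # shared structure
    indeterminacy = (longer - shorter) / longer  # length mismatch we cannot judge
    falsity = (shorter - l) / longer             # positions that clearly disagree
    return truth, indeterminacy, falsity

print(neutrosophic_similarity("receive", "recieve"))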


2019 ◽  
Vol 9 (17) ◽  
pp. 3558 ◽  
Author(s):  
Jinying Yu ◽  
Yuchen Gao ◽  
Yuxin Wu ◽  
Dian Jiao ◽  
Chang Su ◽  
...  

Non-intrusive load monitoring (NILM) is a core technology for demand response (DR) and energy conservation services. Traditional NILM methods are rarely combined with practical applications, and most studies aim to disaggregate all the loads in a household, which leads to low identification accuracy. In the proposed method, event detection is used to obtain the switching event sets of all loads, and the power consumption curves of independent, unknown electrical appliances over a period are disaggregated using comprehensive features. A linear discriminant classifier group based on multi-feature global similarity is used for load identification. The uniqueness of our algorithm lies in its event detector based on steady-state segmentation and its linear discriminant classifier group based on multi-feature global similarity. The simulation is carried out on an open-source data set. The results demonstrate the effectiveness and high accuracy of the multi-feature integrated classification (MFIC) algorithm, using state-of-the-art NILM methods as benchmarks.
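
As a rough illustration of event detection by steady-state segmentation, the Python sketch below flags a switching event wherever the mean power jumps between two adjacent low-variance windows. The window size, thresholds, and toy trace are invented for illustration; the paper's detector and its multi-feature discriminant group are more elaborate.

import numpy as np

def detect_events(power, win=5, min_step=30.0, max_std=5.0):
    """Return indices where one steady segment ends and another begins.

    A window is 'steady' when its standard deviation is below max_std;
    an event is flagged when the mean jumps by more than min_step watts
    between two adjacent steady windows.
    """
    events = []
    for i in range(win, len(power) - win):
        before = power[i - win:i]
        after = power[i:i + win]
        if before.std() < max_std and after.std() < max_std:
            if abs(after.mean() - before.mean()) > min_step:
                events.append(i)
    return events

# Toy trace: a 60 W appliance switches on at t=50 and off at t=120.
rng = np.random.default_rng(0)
trace = np.full(200, 100.0) + rng.normal(0, 1, 200)
trace[50:120] += 60.0
print(detect_events(trace))  # -> [50, 120]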


2021 ◽  
Vol 40 (1) ◽  
pp. 68-71
Author(s):  
Haibin Di ◽  
Anisha Kaul ◽  
Leigh Truelove ◽  
Weichang Li ◽  
Wenyi Hu ◽  
...  

We present a data challenge as part of the hackathon planned for the August 2021 SEG Research Workshop on Data Analytics and Machine Learning for Exploration and Production. The hackathon aims to provide hands-on machine learning experience for beginners and advanced practitioners, using a relatively well-defined problem and a carefully curated data set. The seismic data are from New Zealand's Taranaki Basin. The labels for a subset of the data have been generated by an experienced geologist. The objective of the challenge is to develop innovative machine learning solutions to identify key horizons.


2019 ◽  
Vol 64 (1) ◽  
pp. 97-117 ◽  
Author(s):  
William A. Donohue ◽  
Qi Hao ◽  
Richard Spreng ◽  
Charles Owen

The purpose of this article is to illustrate innovations in text analysis for understanding conflict-related communication events. Two innovations will be explored: LIWC (Linguistic Inquiry and Word Count), and text modeling using the open-source data analysis software R together with SPSS Modeler. The LIWC analysis revisits the 2009 study by Donohue and Druckman and the 2014 study by Donohue, Liang, and Druckman, which applied text analysis to the Oslo I Accords between the Palestinians and Israelis. The R and SPSS text models use the same data set as the LIWC analysis to provide a different set of pictures of each leader's rhetoric during the period in which the Oslo I Accords were being negotiated. Each innovation provides different insights into the mindset of the two groups of leaders as the secret talks were emerging. The article concludes by discussing the implications of each approach for understanding the communication exchanges.
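
For readers unfamiliar with the LIWC approach, the Python sketch below shows the underlying idea: scoring a text by the share of words that fall into psycholinguistic categories. The two-category dictionary is invented for illustration; the real LIWC lexicon is proprietary and far larger.

import re
from collections import Counter

# Hypothetical mini-lexicon; real LIWC categories are much richer.
CATEGORIES = {
    "affiliation": {"we", "our", "together", "agree"},
    "conflict": {"refuse", "demand", "never", "against"},
}

def category_rates(text: str) -> dict:
    """Return each category's share of total word tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for cat, lexicon in CATEGORIES.items():
            if w in lexicon:
                counts[cat] += 1
    return {cat: counts[cat] / max(len(words), 1) for cat in CATEGORIES}

print(category_rates("We agree that together we can move forward; we refuse nothing."))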


2021 ◽  
Author(s):  
Elisabetta Vallarino ◽  
Sara Sommariva ◽  
Dario Arnaldi ◽  
Francesco Famà ◽  
Michele Piana ◽  
...  

A classic approach to estimating the individual theta-to-alpha transition frequency requires two electroencephalographic (EEG) recordings, one acquired in a resting-state condition and one showing alpha de-synchronisation due, e.g., to task execution. This translates into longer recording sessions that may be cumbersome in studies involving patients. Moreover, incomplete de-synchronisation of the alpha rhythm may compromise the final estimate of the transition frequency. Here we present transfreq, a Python library that computes the transition frequency from resting-state data alone by clustering the spectral profiles of different EEG channels according to their content in the alpha and theta bands. We first provide an overview of the transfreq core algorithm and of the software architecture. We then demonstrate its feasibility and robustness across different experimental setups on a publicly available EEG data set and on in-house recordings. Detailed documentation of transfreq and the code for reproducing the paper's analysis on the open-source data set are available online at https://elisabettavallarino.github.io/transfreq/
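
The clustering idea can be sketched without reproducing the transfreq API (which we do not assume here). In the toy Python code below, channels are split by their theta-to-alpha power ratio, and the transition frequency is read off where the two groups' normalized mean spectra cross; the bands, the median split, and the synthetic spectra are all illustrative assumptions.

import numpy as np

def transition_frequency(psds, freqs, theta=(5.0, 7.0), alpha=(8.0, 13.0)):
    """Estimate the theta-to-alpha transition frequency from resting-state PSDs.

    psds: array of shape (n_channels, n_freqs); freqs: shape (n_freqs,).
    """
    psds = psds / psds.sum(axis=1, keepdims=True)  # normalise each channel

    def band_power(lo, hi):
        mask = (freqs >= lo) & (freqs <= hi)
        return psds[:, mask].mean(axis=1)

    # Split channels into theta-dominant and alpha-dominant groups.
    ratio = band_power(*theta) / band_power(*alpha)
    theta_mean = psds[ratio >= np.median(ratio)].mean(axis=0)
    alpha_mean = psds[ratio < np.median(ratio)].mean(axis=0)

    # Transition frequency: first crossing of the two group spectra
    # between the lower theta and upper alpha bounds.
    mask = (freqs >= theta[0]) & (freqs <= alpha[1])
    diff = theta_mean[mask] - alpha_mean[mask]
    crossings = np.where(np.diff(np.sign(diff)) != 0)[0]
    return freqs[mask][crossings[0]] if crossings.size else None

# Toy demo: four channels with a 1/f background and alpha peaks of varying size.
freqs = np.linspace(2, 30, 200)
gauss = np.exp(-(freqs - 10.0) ** 2 / 2)
psds = np.stack([1.0 / freqs + a * gauss for a in (0.1, 0.2, 0.8, 1.0)])
print(transition_frequency(psds, freqs))  # roughly 7-8 Hz for this toy data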


Eye ◽  
2020 ◽  
Author(s):  
Christoph Kern ◽  
Dun Jack Fu ◽  
Josef Huemer ◽  
Livia Faes ◽  
Siegfried K. Wagner ◽  
...  

2016 ◽  
Vol 855 ◽  
pp. 153-158
Author(s):  
Kritwara Rattanaopas ◽  
Sureerat Kaewkeerat ◽  
Yanapat Chuchuen

Big Data is widely used in many organizations nowadays. Hive is an open-source data warehouse system for managing large data sets. It provides a SQL-like interface to Hadoop over the Map-Reduce framework, and Big Data solutions are starting to adopt HiveQL tools to improve the execution time of relational queries. In this paper, we investigate query execution time by comparing two compression algorithms for the ORC file format: ZLIB and SNAPPY. The results show that ZLIB can compress data by up to 87% compared with uncompressed (NONE) data, better than SNAPPY's space saving of 79%. However, the key to reducing execution time is Map-Reduce: query execution time was lowest when the numbers of mappers and data nodes were equal. For example, all query suites on 6 nodes (ZLIB/SNAPPY) with 250 million table rows showed execution times quite similar to 9 nodes (ZLIB/SNAPPY) with 350 million table rows.
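
The space-saving comparison is easy to reproduce on a toy table outside Hive. The Python sketch below writes the same data as ORC files with different codecs via pyarrow (assuming pyarrow with ORC support is installed; paths and the toy table are illustrative) and prints the resulting file sizes; the paper's measurements were of course taken on a Hive/Map-Reduce cluster.

import os
import pyarrow as pa
import pyarrow.orc as orc

# A million-row toy table; repetitive values compress well, like real log data.
table = pa.table({
    "id": pa.array(range(1_000_000), type=pa.int64()),
    "value": pa.array([i % 97 for i in range(1_000_000)], type=pa.int64()),
})

for codec in ("uncompressed", "zlib", "snappy"):
    path = f"/tmp/demo_{codec}.orc"
    orc.write_table(table, path, compression=codec)
    size = os.path.getsize(path)
    print(f"{codec:12s} {size / 1024:,.0f} KiB")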

