LTmatch: A Method to Abstract Pattern from Unstructured Log

2021 ◽  
Vol 11 (11) ◽  
pp. 5302
Author(s):  
Xiaodong Wang ◽  
Yining Zhao ◽  
Haili Xiao ◽  
Xiaoning Wang ◽  
Xuebin Chi

Logs record valuable data from different software and systems. Execution logs are widely available and are helpful in monitoring, examining, and understanding complex applications. However, log files usually contain far too many lines for a human to inspect, so it is important to develop methods for processing logs by computer. Logs are also usually unstructured, which hinders automatic analysis; how to categorize logs and turn them into structured data automatically is therefore of great practical significance. In this paper, the LTmatch algorithm is proposed, which implements a log pattern extraction algorithm based on a weighted word matching rate. Compared with our previous work, this algorithm not only classifies logs according to the longest common subsequence (LCS) but also extracts and updates log templates in real time. In addition, the algorithm's pattern warehouse uses a fixed-depth tree to store log patterns, which improves the matching efficiency of log pattern extraction. To verify the advantages of the algorithm, we applied it to open-source data sets with different kinds of labeled log data, using a variety of state-of-the-art log pattern extraction algorithms for comparison. The results show that our method improves average accuracy by 2.67% over the best result among all the other methods.
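
The core matching step can be pictured with a short sketch. The toy Python code below classifies log lines by an LCS-based word matching rate and folds differing positions into wildcards; LTmatch's weighting scheme and fixed-depth tree warehouse are simplified to a flat list here, so the names, thresholds, and example lines are illustrative assumptions rather than the paper's exact algorithm.

def lcs(a, b):
    """Classic dynamic-programming longest common subsequence of token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack to recover one LCS.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def merge_template(template, tokens, wildcard="<*>"):
    """Replace positions outside the LCS with a wildcard (set-based shortcut,
    which is only safe for this toy because tokens are not repeated)."""
    common = set(lcs(template, tokens))
    return [t if t in common else wildcard for t in template]

templates = []  # the pattern warehouse; a flat list instead of a fixed-depth tree

def match(tokens, threshold=0.5):
    """Assign a log line to the best-matching template and update it in place;
    create a new template when every matching rate falls below the threshold."""
    best, best_rate = None, 0.0
    for idx, tpl in enumerate(templates):
        rate = len(lcs(tpl, tokens)) / max(len(tpl), len(tokens))
        if rate > best_rate:
            best, best_rate = idx, rate
    if best is not None and best_rate >= threshold:
        templates[best] = merge_template(templates[best], tokens)
        return best
    templates.append(list(tokens))
    return len(templates) - 1

for line in ["connection from 10.0.0.1 closed", "connection from 10.0.0.2 closed"]:
    match(line.split())
print(templates)  # [['connection', 'from', '<*>', 'closed']]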

Author(s):  
Misturah Adunni Alaran ◽  
AbdulAkeem Adesina Agboola ◽  
Adio Taofiki Akinwale ◽  
Olusegun Folorunso

The reality of human existence and our interactions with the things that surround us reveals that the world is imprecise, incomplete, vague, and sometimes indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in a single global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day in and day out. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method has been compared with four existing classical string similarity measures using a wordlist as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing ones on information retrieval tasks, with evaluation based on precision, recall, highest false match, lowest true match, and separation.
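
To make the LCS-based construction concrete, here is a minimal Python sketch. The LCS length is the standard dynamic program; the (truth, indeterminacy, falsity) triple built from it is our own assumed reading of a neutrosophic similarity, not the chapter's actual membership functions.

def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (character-level, rolling DP)."""
    dp = [0] * (len(b) + 1)
    for ch in a:
        prev = 0
        for j, bh in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if ch == bh else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def neutrosophic_similarity(a: str, b: str):
    """Return an assumed (T, I, F) triple from the LCS length; by construction
    the three components sum to 1."""
    l, longer, shorter = lcs_len(a, b), max(len(a), len(b)), min(len(a), len(b))
    truth = l / longer                           # shared structure
    indeterminacy = (longer - shorter) / longer  # length mismatch we cannot judge
    falsity = (shorter - l) / longer             # positions that clearly disagree
    return truth, indeterminacy, falsity

print(neutrosophic_similarity("receive", "recieve"))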


2019 ◽  
Vol 9 (17) ◽  
pp. 3558 ◽  
Author(s):  
Jinying Yu ◽  
Yuchen Gao ◽  
Yuxin Wu ◽  
Dian Jiao ◽  
Chang Su ◽  
...  

Non-intrusive load monitoring (NILM) is a core technology for demand response (DR) and energy conservation services. Traditional NILM methods are rarely combined with practical applications, and most studies aim to disaggregate all the loads in a household, which leads to low identification accuracy. In the proposed method, event detection is used to obtain the switching event sets of all loads, and the power consumption curves of independent, unknown electrical appliances over a period are disaggregated using comprehensive features. A linear discriminant classifier group based on multi-feature global similarity is used for load identification. The uniqueness of our algorithm lies in its event detector based on steady-state segmentation and its linear discriminant classifier group based on multi-feature global similarity. The simulation is carried out on an open-source data set. The results demonstrate the effectiveness and high accuracy of the multi-feature integrated classification (MFIC) algorithm, using state-of-the-art NILM methods as benchmarks.
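
As a rough illustration of event detection by steady-state segmentation, the Python sketch below flags a switching event wherever the mean power jumps between two adjacent low-variance windows. The window size, thresholds, and toy trace are invented for illustration; the paper's detector and its multi-feature discriminant group are more elaborate.

import numpy as np

def detect_events(power, win=5, min_step=30.0, max_std=5.0):
    """Return indices where one steady segment ends and another begins.

    A window is 'steady' when its standard deviation is below max_std;
    an event is flagged when the mean jumps by more than min_step watts
    between two adjacent steady windows.
    """
    events = []
    for i in range(win, len(power) - win):
        before = power[i - win:i]
        after = power[i:i + win]
        if before.std() < max_std and after.std() < max_std:
            if abs(after.mean() - before.mean()) > min_step:
                events.append(i)
    return events

# Toy trace: a 60 W appliance switches on at t=50 and off at t=120.
rng = np.random.default_rng(0)
trace = np.full(200, 100.0) + rng.normal(0, 1, 200)
trace[50:120] += 60.0
print(detect_events(trace))  # -> [50, 120]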


2021 ◽  
Vol 40 (1) ◽  
pp. 68-71
Author(s):  
Haibin Di ◽  
Anisha Kaul ◽  
Leigh Truelove ◽  
Weichang Li ◽  
Wenyi Hu ◽  
...  

We present a data challenge as part of the hackathon planned for the August 2021 SEG Research Workshop on Data Analytics and Machine Learning for Exploration and Production. The hackathon aims to provide hands-on machine learning experience for beginners and advanced practitioners, using a relatively well-defined problem and a carefully curated data set. The seismic data are from New Zealand's Taranaki Basin. The labels for a subset of the data have been generated by an experienced geologist. The objective of the challenge is to develop innovative machine learning solutions to identify key horizons.


2019 ◽  
Vol 64 (1) ◽  
pp. 97-117 ◽  
Author(s):  
William A. Donohue ◽  
Qi Hao ◽  
Richard Spreng ◽  
Charles Owen

The purpose of this article is to illustrate innovations in text analysis for understanding conflict-related communication events. Two innovations will be explored: LIWC (Linguistic Inquiry and Word Count), and text modeling using the open-source data analysis software R together with SPSS Modeler. The LIWC analysis revisits the 2009 study by Donohue and Druckman and the 2014 study by Donohue, Liang, and Druckman, which applied text analysis to the Oslo I Accords between the Palestinians and Israelis. The R and SPSS text models use the same data set as the LIWC analysis to provide a different set of pictures of each leader's rhetoric during the period in which the Oslo I Accords were being negotiated. Each innovation provides different insights into the mindset of the two groups of leaders as the secret talks were emerging. The article concludes by discussing the implications of each approach for understanding the communication exchanges.
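
For readers unfamiliar with the LIWC approach, the Python sketch below shows the underlying idea: scoring a text by the share of words that fall into psycholinguistic categories. The two-category dictionary is invented for illustration; the real LIWC lexicon is proprietary and far larger.

import re
from collections import Counter

# Hypothetical mini-lexicon; real LIWC categories are much richer.
CATEGORIES = {
    "affiliation": {"we", "our", "together", "agree"},
    "conflict": {"refuse", "demand", "never", "against"},
}

def category_rates(text: str) -> dict:
    """Return each category's share of total word tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for cat, lexicon in CATEGORIES.items():
            if w in lexicon:
                counts[cat] += 1
    return {cat: counts[cat] / max(len(words), 1) for cat in CATEGORIES}

print(category_rates("We agree that together we can move forward; we refuse nothing."))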


2021 ◽  
Author(s):  
Elisabetta Vallarino ◽  
Sara Sommariva ◽  
Dario Arnaldi ◽  
Francesco Famà ◽  
Michele Piana ◽  
...  

A classic approach to estimating the individual theta-to-alpha transition frequency requires two electroencephalographic (EEG) recordings, one acquired in a resting-state condition and one showing alpha de-synchronisation due, e.g., to task execution. This translates into longer recording sessions that may be cumbersome in studies involving patients. Moreover, incomplete de-synchronisation of the alpha rhythm may compromise the final estimate of the transition frequency. Here we present transfreq, a Python library that computes the transition frequency from resting-state data alone by clustering the spectral profiles of different EEG channels according to their content in the alpha and theta bands. We first provide an overview of the transfreq core algorithm and of the software architecture. We then demonstrate its feasibility and robustness across different experimental setups on a publicly available EEG data set and on in-house recordings. Detailed documentation of transfreq and the code for reproducing the paper's analysis on the open-source data set are available online at https://elisabettavallarino.github.io/transfreq/
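
The clustering idea can be sketched without reproducing the transfreq API (which we do not assume here). In the toy Python code below, channels are split by their theta-to-alpha power ratio, and the transition frequency is read off where the two groups' normalized mean spectra cross; the bands, the median split, and the synthetic spectra are all illustrative assumptions.

import numpy as np

def transition_frequency(psds, freqs, theta=(5.0, 7.0), alpha=(8.0, 13.0)):
    """Estimate the theta-to-alpha transition frequency from resting-state PSDs.

    psds: array of shape (n_channels, n_freqs); freqs: shape (n_freqs,).
    """
    psds = psds / psds.sum(axis=1, keepdims=True)  # normalise each channel

    def band_power(lo, hi):
        mask = (freqs >= lo) & (freqs <= hi)
        return psds[:, mask].mean(axis=1)

    # Split channels into theta-dominant and alpha-dominant groups.
    ratio = band_power(*theta) / band_power(*alpha)
    theta_mean = psds[ratio >= np.median(ratio)].mean(axis=0)
    alpha_mean = psds[ratio < np.median(ratio)].mean(axis=0)

    # Transition frequency: first crossing of the two group spectra
    # between the lower theta and upper alpha bounds.
    mask = (freqs >= theta[0]) & (freqs <= alpha[1])
    diff = theta_mean[mask] - alpha_mean[mask]
    crossings = np.where(np.diff(np.sign(diff)) != 0)[0]
    return freqs[mask][crossings[0]] if crossings.size else None

# Toy demo: four channels with a 1/f background and alpha peaks of varying size.
freqs = np.linspace(2, 30, 200)
gauss = np.exp(-(freqs - 10.0) ** 2 / 2)
psds = np.stack([1.0 / freqs + a * gauss for a in (0.1, 0.2, 0.8, 1.0)])
print(transition_frequency(psds, freqs))  # roughly 7-8 Hz for this toy data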


Eye ◽  
2020 ◽  
Author(s):  
Christoph Kern ◽  
Dun Jack Fu ◽  
Josef Huemer ◽  
Livia Faes ◽  
Siegfried K. Wagner ◽  
...  

2016 ◽  
Vol 855 ◽  
pp. 153-158
Author(s):  
Kritwara Rattanaopas ◽  
Sureerat Kaewkeerat ◽  
Yanapat Chuchuen

Big Data is widely used in many organizations nowadays. Hive is an open-source data warehouse system for managing large data sets. It provides a SQL-like interface to Hadoop over the Map-Reduce framework, and Big Data solutions are starting to adopt HiveQL tools to improve the execution time of relational queries. In this paper, we investigate query execution time by comparing two compression algorithms for the ORC file format: ZLIB and SNAPPY. The results show that ZLIB can compress data by up to 87% compared with uncompressed (NONE) data, better than SNAPPY's space saving of 79%. However, the key to reducing execution time is Map-Reduce: query execution time was lowest when the numbers of mappers and data nodes were equal. For example, all query suites on 6 nodes (ZLIB/SNAPPY) with 250 million table rows showed execution times quite similar to 9 nodes (ZLIB/SNAPPY) with 350 million table rows.
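
The space-saving comparison is easy to reproduce on a toy table outside Hive. The Python sketch below writes the same data as ORC files with different codecs via pyarrow (assuming pyarrow with ORC support is installed; paths and the toy table are illustrative) and prints the resulting file sizes; the paper's measurements were of course taken on a Hive/Map-Reduce cluster.

import os
import pyarrow as pa
import pyarrow.orc as orc

# A million-row toy table; repetitive values compress well, like real log data.
table = pa.table({
    "id": pa.array(range(1_000_000), type=pa.int64()),
    "value": pa.array([i % 97 for i in range(1_000_000)], type=pa.int64()),
})

for codec in ("uncompressed", "zlib", "snappy"):
    path = f"/tmp/demo_{codec}.orc"
    orc.write_table(table, path, compression=codec)
    size = os.path.getsize(path)
    print(f"{codec:12s} {size / 1024:,.0f} KiB")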

