Limitations of information extraction methods and techniques for heterogeneous unstructured big data

Information Extraction from Multifaceted Unstructured Big Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1074.0882s819 ◽

2019 ◽

Vol 8 (2S8) ◽

pp. 1398-1404

Keyword(s):

Big Data ◽

Information Extraction ◽

Data Analytics ◽

Extraction Methods ◽

Vital Role ◽

Unstructured Data ◽

Digital Data ◽

High Rate ◽

Smart Manufacturing ◽

Future Research

In the era of digital globalization, a huge volume and variety of data is being produced at a very high rate. Every day, the world produces around 2.5 quintillion bytes of data. According to IDC, over 40 zettabytes of data will be generated and reproduced by 2020. Digital data has become a deluge that overwhelms every field of information technology (IT), business, science and engineering. These fields are shifting to smart and advanced technologies such as smart manufacturing, data-aware medical sciences, and other smart applications. These applications support industries in data-driven decision making, big data storage, and complex analysis of large data sets. They also contribute to the big data deluge, in which the variety of data requires industries to use advanced IT approaches. Unstructured data makes up 95% of the digital universe. It is rich data, containing information that can play a vital role in improving big data analytics. The heterogeneity, complexity, lack of structured information, poor quality and scalability of unstructured data make it difficult to adapt traditional information extraction techniques. Information extraction can play a vital role in transforming unstructured data into useful information. A multistep pipeline with data preprocessing steps, extraction methods and representation is essential to improve unstructured data analytics. In this regard, this paper presents a short review of the information extraction process with respect to input data type, extraction methods with their corresponding techniques, and representation of extracted information. The issues with unstructured data, the challenges of information extraction from multifaceted unstructured big data, and future research directions are also discussed.
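The multistep pipeline named in this abstract (preprocessing, extraction, representation) can be illustrated with a minimal Python sketch. It is not the authors' method: the function names, the `Extraction` record and the two regular-expression patterns are assumptions made only for illustration, and a real system would replace the regex step with whatever extraction technique suits the data type.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    """One extracted item: a label, the matched text, and its character offset."""
    label: str
    text: str
    start: int


def preprocess(raw: str) -> str:
    """Step 1 (preprocessing): collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


# Hypothetical patterns, chosen only to keep the example small.
PATTERNS = {
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
}


def extract(text: str) -> list:
    """Step 2 (extraction): apply each pattern and collect matches in text order."""
    found = [
        Extraction(label, match.group(), match.start())
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(found, key=lambda item: item.start)


def represent(items: list) -> list:
    """Step 3 (representation): turn extractions into plain structured records."""
    return [{"label": i.label, "text": i.text, "start": i.start} for i in items]


def run_pipeline(raw: str) -> list:
    """Chain the three steps: preprocess, then extract, then represent."""
    return represent(extract(preprocess(raw)))


if __name__ == "__main__":
    for record in run_pipeline("By  2020,\n 95% of data is unstructured."):
        print(record)
```

Keeping the three steps as separate functions means any one of them can be swapped (for example, a different extractor for images or audio) without touching the others.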

An analytical study of information extraction from unstructured and multidimensional big data

Journal Of Big Data ◽

10.1186/s40537-019-0254-8 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 7

Author(s):

Kiran Adnan ◽

Rehan Akbar

Keyword(s):

Big Data ◽

Information Extraction ◽

Data Analytics ◽

Data Extraction ◽

Research Work ◽

Big Data Analytics ◽

Unstructured Data ◽

Future Research ◽

Data Types ◽

Future Research Directions

The process of information extraction (IE) is used to extract useful information from unstructured or semi-structured data. Big data poses new challenges for IE techniques with the rapid growth of multifaceted, also called multidimensional, unstructured data. Traditional IE systems are inefficient in dealing with this huge deluge of unstructured big data. The volume and variety of big data demand improved computational capabilities from these IE systems. It is necessary to understand the competency and limitations of existing IE techniques related to data pre-processing, data extraction and transformation, and representation for huge volumes of multidimensional unstructured data. Numerous studies have been conducted on IE, addressing the challenges and issues for different data types such as text, image, audio and video. Very little consolidated research has investigated the task-dependent and task-independent limitations of IE covering all data types in a single study. This research work addresses this limitation and presents a systematic literature review of state-of-the-art techniques for a variety of big data, consolidating all data types. Recent challenges of IE are also identified and summarized. Potential solutions are proposed, giving future research directions in big data IE. The research is significant in terms of recent trends and challenges related to big data analytics. The outcome of the research and its recommendations will help to improve big data analytics by making it more productive.

An experimental investigation of the impact of using big data analytics on customers’ performance measurement

Accounting Research Journal ◽

10.1108/arj-04-2020-0080 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Marwa Rabe Mohamed Elkmash ◽

Magdy Gamal Abdel-Kader ◽

Bassant Badr El Din

Keyword(s):

Big Data ◽

Performance Measurement ◽

Data Analytics ◽

Positive Impact ◽

Big Data Analytics ◽

Unstructured Data ◽

Future Research ◽

Web Based ◽

Content Type ◽

The Impact

Purpose: This study aims to investigate and explore the impact of big data analytics (BDA) as a mechanism that could develop the ability to measure customers’ performance. To accomplish the research aim, the theoretical discussion was developed by combining the diffusion of innovation theory with the technology acceptance model (TAM), which is less developed in the research field of this study.

Design/methodology/approach: Empirical data were obtained using Web-based quasi-experiments with 104 Egyptian accounting professionals. The Wilcoxon signed-rank test and the chi-square goodness-of-fit test were used to analyze the data.

Findings: The empirical results indicate that measuring customers’ performance based on BDA increases the organizations’ ability to analyze the customers’ unstructured data, decreases the cost of analyzing that data, increases the ability to handle the customers’ problems quickly, minimizes the time spent analyzing the customers’ data and obtaining the customers’ performance reports, and controls managers’ bias when they measure customer satisfaction. The study findings supported the accounting professionals’ acceptance of BDA through the TAM elements: the intention to use (R), perceived usefulness (U) and perceived ease of use (E).

Research limitations/implications: This study has several limitations that could be addressed in future research. First, this study focuses on customers’ performance measurement (CPM) only and ignores other performance measurements such as employees’ performance measurement and financial performance measurement. Future research can examine these areas. Second, this study conducts a Web-based experiment with Master of Business Administration students as participants; researchers could conduct a laboratory experiment and report whether there are differences. Third, owing to the novelty of the topic, there was a lack of theoretical evidence for developing the study’s hypotheses.

Practical implications: This study provides much-needed empirical evidence for the positive impact of BDA in improving CPM efficiency through the proposed framework (i.e. the CPM and BDA framework). Furthermore, this study contributes to the improvement of the performance measurement process, and thus of the decision-making process, with meaningful and proper insights through the capability of collecting and analyzing the customers’ unstructured data. On a practical level, a company could eventually use this study’s results and the new insights to make better decisions and develop its policies.

Originality/value: This study holds significance as it provides much-needed empirical evidence for the positive impact of BDA in improving CPM efficiency. The study findings will contribute to the enhancement of the performance measurement process through the ability to gather and analyze the customers’ unstructured data.
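For readers unfamiliar with the chi-square goodness-of-fit test named in this abstract, the statistic is the sum over categories of (observed − expected)² / expected, with degrees of freedom equal to the number of categories minus one. The sketch below computes only that statistic. The function name and the counts are made up for illustration and are not taken from the study; obtaining a p-value would additionally need a chi-square table or a statistics library.

```python
def chi_square_gof(observed, expected):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit test.

    observed and expected are equal-length sequences of category counts.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    # Sum of squared deviations, each scaled by its expected count.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical counts: 60 responses over three answer categories,
    # tested against a uniform expectation of 20 per category.
    stat, dof = chi_square_gof([18, 22, 20], [20, 20, 20])
    print(f"chi-square = {stat:.3f}, dof = {dof}")
```

A small statistic relative to the critical value for the given degrees of freedom means the observed counts are consistent with the expected distribution.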

Semantic Technologies and Big Data Analytics for Cyber Defence

Web Services ◽

10.4018/978-1-5225-7501-6.ch074 ◽

2019 ◽

pp. 1430-1443

Author(s):

Louise Leenen ◽

Thomas Meyer

Keyword(s):

Decision Making ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Future Research ◽

Data Sets ◽

Semantic Technologies ◽

Data Types ◽

Intelligent Decision Making ◽

Big Data Technologies

Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to lead to intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets and to understand the significance of detected patterns is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for the processing and understanding of the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends and other useful information. Semantic technologies are a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules. These rules are generally not sophisticated enough to deal with the complexity of the decisions that have to be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.

Artificial Intelligence and Big Data Analytics in Support of Cyber Defense

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch076 ◽

2021 ◽

pp. 1738-1753

Author(s):

Louise Leenen ◽

Thomas Meyer

Keyword(s):

Artificial Intelligence ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Future Research ◽

Data Sets ◽

Data Types ◽

Cyber Defense ◽

Intelligent Decision Making ◽

Big Data Technologies

Cybersecurity analysts rely on vast volumes of security event data to predict, identify, characterize, and deal with security threats. These analysts must understand and make sense of these huge datasets in order to discover patterns which lead to intelligent decision making and advance warnings of possible threats, and this ability requires automation. Big data analytics and artificial intelligence can improve cyber defense. Big data analytics methods are applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends, and other useful information. Artificial intelligence provides algorithms that can reason or learn and improve their behavior, and includes semantic technologies. A large number of automated systems are currently based on syntactic rules which are generally not sophisticated enough to deal with the level of complexity in this domain. An overview of artificial intelligence and big data technologies in cyber defense is provided, and important areas for future research are identified and discussed.

Semantic Technologies and Big Data Analytics for Cyber Defence

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch061 ◽

2018 ◽

pp. 1375-1388

Author(s):

Louise Leenen ◽

Thomas Meyer

Keyword(s):

Decision Making ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Future Research ◽

Data Sets ◽

Semantic Technologies ◽

Data Types ◽

Intelligent Decision Making ◽

Big Data Technologies

Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to lead to intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets and to understand the significance of detected patterns is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for the processing and understanding of the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends and other useful information. Semantic technologies are a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules. These rules are generally not sophisticated enough to deal with the complexity of the decisions that have to be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.

Semantic Technologies and Big Data Analytics for Cyber Defence

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2016070105 ◽

2016 ◽

Vol 6 (3) ◽

pp. 53-64 ◽

Cited By ~ 2

Author(s):

Louise Leenen ◽

Thomas Meyer

Keyword(s):

Decision Making ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Large Data ◽

Future Research ◽

Data Sets ◽

Semantic Technologies ◽

Data Types ◽

Big Data Technologies

Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to lead to intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets and to understand the significance of detected patterns is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for the processing and understanding of the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends and other useful information. Semantic technologies are a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules. These rules are generally not sophisticated enough to deal with the complexity of the decisions that have to be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.

Providing Engineering Services With Smart Objects

International Journal of Systems and Service-Oriented Engineering ◽

10.4018/ijssoe.2018100103 ◽

2018 ◽

Vol 8 (4) ◽

pp. 43-68

Author(s):

Stephen H. Kiasler ◽

William H. Money ◽

Stephen J. Cohen

Keyword(s):

Big Data ◽

Evolutionary Process ◽

Service Systems ◽

Unstructured Data ◽

Data Sets ◽

Data Types ◽

Smart Objects ◽

New Class ◽

The World ◽

Smart Data

The world of data has been evolving due to the expansion of operations and the complexity of the data processed by systems. Big Data no longer consists only of numbers and characters but now includes unstructured data types collected by a variety of devices. Recent work has postulated that the Big Data evolutionary process is making a conceptual leap to incorporate intelligence. This confronts system engineers with new issues as they envision and create service systems to process and incorporate these new data sets and structures. This article proposes that Big Data has not yet made a complete evolutionary leap, but rather that a new class of data (a higher level of abstraction) is needed to integrate this “intelligence” concept. This article examines previous definitions of Smart Data, offers a new conceptualization for smart objects (SO), examines the smart data concept, and identifies issues and challenges of understanding smart objects as a new data-managed software paradigm. It concludes that smart objects incorporate new features and have different properties from passive and inert Big Data.

Artificial Intelligence and Big Data Analytics in Support of Cyber Defense

Developments in Information Security and Cybernetic Wars - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-8304-2.ch002 ◽

2019 ◽

pp. 42-63 ◽

Cited By ~ 4

Author(s):

Louise Leenen ◽

Thomas Meyer

Keyword(s):

Artificial Intelligence ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Future Research ◽

Data Sets ◽

Data Types ◽

Cyber Defense ◽

Intelligent Decision Making ◽

Big Data Technologies

Cybersecurity analysts rely on vast volumes of security event data to predict, identify, characterize, and deal with security threats. These analysts must understand and make sense of these huge datasets in order to discover patterns which lead to intelligent decision making and advance warnings of possible threats, and this ability requires automation. Big data analytics and artificial intelligence can improve cyber defense. Big data analytics methods are applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends, and other useful information. Artificial intelligence provides algorithms that can reason or learn and improve their behavior, and includes semantic technologies. A large number of automated systems are currently based on syntactic rules which are generally not sophisticated enough to deal with the level of complexity in this domain. An overview of artificial intelligence and big data technologies in cyber defense is provided, and important areas for future research are identified and discussed.

Big Data Analytics for Search Engine Optimization

Big Data and Cognitive Computing ◽

10.3390/bdcc4020005 ◽

2020 ◽

Vol 4 (2) ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Ioannis C. Drivas ◽

Damianos P. Sakas ◽

Georgios A. Giannakopoulos ◽

Daphne Kyriaki-Manessi

Keyword(s):

Big Data ◽

Cultural Heritage ◽

Search Engine ◽

Data Analytics ◽

User Behavior ◽

Big Data Analytics ◽

Data Types ◽

Search Engine Optimization ◽

The Impact ◽

The Web

In the Big Data era, search engine optimization deals with the encapsulation of datasets that are related to website performance in terms of architecture, content curation, and user behavior, with the purpose of converting them into actionable insights and improving visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks composed of valid, reliable, and consistent analytics that are practically useful for developing well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented in order to increase organic search engine visits based on the impact of multiple SEO factors. To this end, the authors examined 171 cultural heritage websites and the analytics data retrieved about their performance and the user experience on them. Cultural heritage organizations present massive amounts of Web-based collections through their websites. Users then interact with these collections, producing behavioral analytics in a variety of data types that come from multiple devices, with high velocity, in large volumes. Nevertheless, prior research efforts indicate that these massive cultural collections are difficult to browse and exhibit low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics and improves the visibility of cultural heritage websites. Going one step further, the statistical results of the study are integrated into a predictive model that is composed of two stages. First, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model. Second, a micro-level data-driven agent-based model follows. The purpose of the model is to predict the most effective combinations of factors that achieve enhanced visibility and organic traffic on cultural heritage organizations’ websites. The study thereby contributes to the knowledge of researchers and practitioners in the big cultural analytics sector, with the purpose of implementing potential strategies for greater visibility and findability of cultural collections on the Web.
