Framework to extract context vectors from unstructured data using big data analytics

Author(s):  
Tanvir Ahmad ◽  
Rafeeq Ahmad ◽  
Sarah Masud ◽  
Farheen Nilofer
2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Ashwin Belle ◽  
Raghuram Thiagarajan ◽  
S. M. Reza Soroushmehr ◽  
Fatemeh Navidi ◽  
Daniel A. Beard ◽  
...  

The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has recently been applied to aid the process of care delivery and disease exploration. However, the adoption rate and research development in this space are still hindered by some fundamental problems inherent in the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research that targets the use of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field that could have a meaningful impact on healthcare delivery are also examined.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Marwa Rabe Mohamed Elkmash ◽  
Magdy Gamal Abdel-Kader ◽  
Bassant Badr El Din

Purpose This study investigates the impact of big data analytics (BDA) as a mechanism for improving the ability to measure customers’ performance. To that end, the theoretical discussion combines the diffusion of innovation theory with the technology acceptance model (TAM), which is less developed in the research field of this study.
Design/methodology/approach Empirical data were obtained using Web-based quasi-experiments with 104 Egyptian accounting professionals. The Wilcoxon signed-rank test and the chi-square goodness-of-fit test were used to analyze the data.
Findings The empirical results indicate that measuring customers’ performance based on BDA increases the organizations’ ability to analyze customers’ unstructured data, decreases the cost of analyzing that data, increases the ability to handle customers’ problems quickly, minimizes the time spent analyzing customers’ data and obtaining customers’ performance reports, and controls managers’ bias when they measure customer satisfaction. The findings supported the accounting professionals’ acceptance of BDA through the TAM elements: the intention to use (R), perceived usefulness (U) and perceived ease of use (E).
Research limitations/implications This study has several limitations that could be addressed in future research. First, it focuses on customers’ performance measurement (CPM) only and ignores other performance measurements, such as employees’ performance measurement and financial performance measurement. Future research can examine these areas. Second, this study conducts a Web-based experiment with Master of Business Administration students as participants; researchers could conduct a laboratory experiment and report whether the results differ. Third, owing to the novelty of the topic, there was a lack of theoretical evidence for developing the study’s hypotheses.
Practical implications This study provides much-needed empirical evidence of BDA’s positive impact on CPM efficiency through the proposed framework (i.e. the CPM and BDA framework). Furthermore, it contributes to improving the performance measurement process, and thus the decision-making process, with meaningful and proper insights gained by collecting and analyzing customers’ unstructured data. On a practical level, a company could eventually use this study’s results and the new insights to make better decisions and develop its policies.
Originality/value This study is significant because it provides much-needed empirical evidence of BDA’s positive impact on CPM efficiency. The findings will contribute to enhancing the performance measurement process through the ability to gather and analyze customers’ unstructured data.


2021 ◽  
pp. 67-74
Author(s):  
Liudmyla Zubyk ◽  
Yaroslav Zubyk

Big data is one of the modern tools that has had a major impact on industry worldwide. It also plays an important role in determining how businesses and organizations formulate their strategies and policies. However, very limited academic research has been conducted into forecasting based on big data, owing to the difficulties in capturing, collecting, handling, and modeling unstructured data, which is normally characterized by its confidentiality. We define big data in the context of an ecosystem for future forecasting in business decision-making. It can be difficult for a single organization to possess all of the capabilities necessary to derive strategic business value from its findings. That is why different organizations will build and operate their own analytics ecosystems or tap into existing ones. An analytics ecosystem comprising a symbiosis of data, applications, platforms, talent, partnerships, and third-party service providers lets organizations be more agile and adapt to changing demands. Organizations participating in analytics ecosystems can examine, learn from, and influence not only their own business processes but also those of their partners. Architectures of popular platforms for forecasting based on big data are presented in this issue.


2017 ◽  
Vol 2 (6) ◽  
pp. 570
Author(s):  
Cungki Kusdarjito

The advancement of big data analytics is paving the way for knowledge creation based on very large, unstructured data. Currently, information is scattered and growing tremendously; it contains a great deal of information but is difficult to interpret. Consequently, traditional approaches are no longer suitable for data that are unstructured but very rich in information. This situation differs from the earlier role of information technology, in which information was based on structured data stored in local storage and, in more advanced forms, could be retrieved through the internet. Meanwhile, in Indonesia, data are collected by many institutions with different measurement standards. Data collection is top-down and carried out by surveys, which are expensive yet unreliable, and the data are stored exclusively by the respective institutions. SIDeKa (Sistem Informasi Desa dan Kawasan/Village and Regional Information System), which is connected nationally, is proposed as a system of data collection at the village level, prepared by local people. Using SIDeKa, data reliability and readiness can be improved at the local level. The goal of SIDeKa is not only for local people to have information at hand, such as the poverty level, production, commodity prices, the area of cultivated land, and outbreaks of disease in their village, but also for them to have information from neighboring villages or even at the national level. For government, data reliability will improve policy effectiveness. This paper discusses the implementation and role of SIDeKa, which was initiated in 2015, for knowledge creation at the village level, especially for agricultural activities.
Keywords: big data analytics; SIDeKa; unstructured data.


2021 ◽  
Vol 14 (8) ◽  
pp. 1262-1275
Author(s):  
Jia Zou ◽  
Amitabh Das ◽  
Pratik Barhate ◽  
Arun Iyengar ◽  
Binhang Yuan ◽  
...  

Partitioning is effective in avoiding expensive shuffling operations. However, it remains a significant challenge to automate this process for Big Data analytics workloads that extensively use user-defined functions (UDFs), where sub-computations are harder to reuse for partitioning than in relational applications. In addition, functional dependency, which is widely used for partitioning selection, is often unavailable in the unstructured data that is ubiquitous in UDF-centric analytics. We propose the Lachesis system, which represents UDF-centric workloads as workflows of analyzable and reusable sub-computations. Lachesis further adopts a deep reinforcement learning model to infer which sub-computations should be used to partition the underlying data. This analysis is then applied to automatically optimize the storage of the data across applications to improve performance and users' productivity.
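
The opening claim, that partitioning avoids shuffling, can be illustrated with a small sketch. This is not the Lachesis system and involves no reinforcement learning: the names `bucket`, `partition_by`, `round_robin`, `records_to_shuffle`, and the UDF `extract_user` are hypothetical, and the log lines are made-up sample data. It only shows that when data is stored pre-partitioned by the same key-extracting UDF that a later computation groups on, no record needs to move.

```python
import zlib
from typing import Any, Callable, List

KeyFn = Callable[[Any], Any]

def bucket(key: Any, num_partitions: int) -> int:
    """Map a key to a partition index with a hash that is stable across runs."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions

def partition_by(records: List[Any], key_fn: KeyFn, num_partitions: int) -> List[List[Any]]:
    """Store each record in the partition selected by hashing key_fn(record)."""
    parts: List[List[Any]] = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[bucket(key_fn(rec), num_partitions)].append(rec)
    return parts

def round_robin(records: List[Any], num_partitions: int) -> List[List[Any]]:
    """Store records without regard to any key (a key-oblivious baseline)."""
    parts: List[List[Any]] = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts

def records_to_shuffle(parts: List[List[Any]], key_fn: KeyFn) -> int:
    """Count records that would have to move so that equal keys are co-located."""
    n = len(parts)
    return sum(
        1
        for idx, part in enumerate(parts)
        for rec in part
        if bucket(key_fn(rec), n) != idx
    )

def extract_user(line: str) -> str:
    """Hypothetical UDF: pull the user name out of an unstructured log line."""
    return line.split()[0].split("=", 1)[1]

# Made-up sample data for illustration only.
logs = [
    "user=alice action=click",
    "user=bob action=view",
    "user=alice action=buy",
    "user=carol action=view",
    "user=bob action=click",
    "user=dave action=view",
]

pre_partitioned = partition_by(logs, extract_user, 3)
baseline = round_robin(logs, 3)

print("records to shuffle (pre-partitioned by UDF):", records_to_shuffle(pre_partitioned, extract_user))
print("records to shuffle (round-robin storage):  ", records_to_shuffle(baseline, extract_user))
```

The pre-partitioned layout needs no data movement for a group-by on `extract_user`, while the round-robin layout generally does. Choosing which UDF sub-computation to partition on is the hard part that the abstract attributes to Lachesis, and this sketch does not attempt it.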


Author(s):  
Andreas Schmidt ◽  
Martin Atzmueller ◽  
Martin Hollender

This chapter provides an overview of methods for preprocessing structured and unstructured data in the scope of Big Data. Specifically, it summarizes the corresponding methods in the context of a real-world dataset from a petro-chemical production setting. The chapter describes state-of-the-art methods of data preparation for Big Data Analytics. Furthermore, it discusses experiences and first insights from a specific project setting with respect to a real-world case study. Finally, interesting directions for future research are outlined.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Kiran Adnan ◽  
Rehan Akbar

Abstract The process of information extraction (IE) is used to extract useful information from unstructured or semi-structured data. Big data raise new challenges for IE techniques with the rapid growth of multifaceted, also called multidimensional, unstructured data. Traditional IE systems are inefficient at dealing with this deluge of unstructured big data. The volume and variety of big data demand improved computational capabilities from these IE systems. It is necessary to understand the competency and limitations of existing IE techniques related to data pre-processing, data extraction and transformation, and representations for huge volumes of multidimensional unstructured data. Numerous studies have been conducted on IE, addressing the challenges and issues for different data types such as text, image, audio and video. Very limited consolidated research work has been conducted to investigate the task-dependent and task-independent limitations of IE covering all data types in a single study. This research work addresses this limitation and presents a systematic literature review of state-of-the-art techniques for a variety of big data, consolidating all data types. Recent challenges of IE are also identified and summarized. Potential solutions are proposed, giving future research directions in big data IE. The research is significant in terms of recent trends and challenges related to big data analytics. The outcome of the research and its recommendations will help to improve big data analytics by making it more productive.


Author(s):  
Sri Venkat Gunturi Subrahmanya ◽  
Dasharathraj K. Shetty ◽  
Vathsala Patil ◽  
B. M. Zeeshan Hameed ◽  
Rahul Paul ◽  
...  

Abstract Data science is an interdisciplinary field that extracts knowledge and insights from structured and unstructured data, using scientific methods, data mining techniques, machine-learning algorithms, and big data. The healthcare industry generates large datasets of useful information on patient demography, treatment plans, results of medical examinations, insurance, etc. The data collected from Internet of Things (IoT) devices attract the attention of data scientists. Data science helps to process, manage, analyze, and assimilate the large quantities of fragmented, structured, and unstructured data created by healthcare systems. This data requires effective management and analysis to yield factual results. The processes of data cleansing, data mining, data preparation, and data analysis used in healthcare applications are reviewed and discussed in the article. The article provides an insight into the status and prospects of big data analytics in healthcare, highlights the advantages, describes the frameworks and techniques used, outlines the challenges currently faced, and discusses viable solutions. Data science and big data analytics can provide practical insights and aid strategic decision-making concerning the health system. They help build a comprehensive view of patients, consumers, and clinicians. Data-driven decision-making opens up new possibilities to boost healthcare quality.

