Transfer Learning Based Performance Modeling and Effective Storage Management in Big Data Ecosystems

2021 ◽  
Author(s):  
Fatimah Alsayoud

Big data ecosystems contain a mix of sophisticated hardware storage components to support heterogeneous workloads. Storage components and workloads interact and affect each other; therefore, their relationship has to be considered when modeling workloads or managing storage. Efficient workload modeling guides optimal storage management decisions, and the right decisions help guarantee that workload requirements are met. The first part of this thesis focuses on workload modeling efficiency, and the second part focuses on cost-effective storage management.

Workload performance modeling is an essential step in management decisions. The standard modeling approach constructs the model from a historical dataset collected under one set of setups (a scenario) and requires the model to be rebuilt from scratch every time the setup changes. To address this issue, we propose a cross-scenario modeling approach that adopts Transfer Learning (TL) and improves workload performance classification accuracy by up to 78%.

The storage system is the most crucial component of the big data ecosystem: a workload's execution starts by fetching data from storage and ends by writing data back to it, so workload performance is directly affected by storage capability. To provide high I/O performance, Solid State Drives (SSDs) are used as a tier or as a cache in distributed big data ecosystems. SSDs have a short lifespan that is affected by data size and the number of write operations. Balancing performance requirements against SSD lifespan consumption is never easy, and it is even harder with huge amounts of data and heterogeneous I/O patterns. In this thesis, we analyze how the I/O patterns of big data workloads affect SSD lifespan when the SSD is used as a tier or as a cache. We then design a Hidden Markov Model (HMM) based I/O pattern controller that manages workload placement and guarantees cost-effective storage, enhancing workload performance by up to 60% and improving SSD lifespan by up to 40%.

The proposed transfer learning modeling approach and storage management solutions improve workload modeling accuracy and the quality of storage management policies as the testing setup changes.
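The abstract names the techniques without implementation detail. Purely as an illustrative sketch (not the thesis's actual pipeline), the snippet below shows one simple instance-weighted form of transfer learning for workload performance classification: labeled runs from a source scenario are pooled with a few runs from a new target scenario, with the source samples down-weighted so the classifier adapts to the new setup instead of being rebuilt from scratch. The features, labels, and weights are invented assumptions.

```python
# Illustrative sketch only: instance-weighted transfer for cross-scenario
# workload performance classification. Feature values and weights are
# assumptions for demonstration, not the thesis's evaluated method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Source scenario: plenty of labeled runs (e.g., input size, task counts,
# I/O throughput) with a binary performance-class label.
X_source = rng.normal(size=(500, 4))
y_source = (X_source[:, 0] + 0.5 * X_source[:, 2] > 0).astype(int)

# Target scenario: only a handful of labeled runs under the new setup.
X_target = rng.normal(loc=0.3, size=(30, 4))
y_target = (X_target[:, 0] + 0.5 * X_target[:, 2] > 0.3).astype(int)

# Simple instance transfer: pool the data but down-weight source samples so
# the model leans toward the target scenario.
X_pool = np.vstack([X_source, X_target])
y_pool = np.concatenate([y_source, y_target])
weights = np.concatenate([np.full(len(y_source), 0.3),
                          np.full(len(y_target), 1.0)])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_pool, y_pool, sample_weight=weights)

# Evaluate on held-out target-scenario runs.
X_eval = rng.normal(loc=0.3, size=(100, 4))
y_eval = (X_eval[:, 0] + 0.5 * X_eval[:, 2] > 0.3).astype(int)
print("target-scenario accuracy:", accuracy_score(y_eval, clf.predict(X_eval)))
```

Down-weighting source instances is only the simplest form of instance-based transfer; iterative schemes such as TrAdaBoost adjust the weights automatically, and the accuracy gains reported in the thesis come from its own cross-scenario approach rather than this sketch.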


2012 ◽  
Vol 23 (4) ◽  
pp. 786-801 ◽  
Author(s):  
Xiang HUANG ◽  
Wei WANG ◽  
Wen-Bo ZHANG ◽  
Jun WEI ◽  
Tao HUANG

2021 ◽  
Vol 11 (13) ◽  
pp. 6047
Author(s):  
Soheil Rezaee ◽  
Abolghasem Sadeghi-Niaraki ◽  
Maryam Shakeri ◽  
Soo-Mi Choi

A lack of required data resources is one of the challenges in adopting Augmented Reality (AR) to provide the right services to users, whereas the amount of spatial information produced by people is increasing daily. This research aims to design a personalized AR-based tourist system that retrieves big data according to the users’ demographic contexts in order to enrich the AR data source in tourism. The research is conducted in two main steps. First, the type of tourist attraction that interests the user is predicted from the user’s demographic contexts, which include age, gender, and education level, using a machine learning method. Second, the right data for the user are extracted from the big data by considering time, distance, popularity, and the neighborhood of tourist places, using the VIKOR and SWAR decision-making methods. The results show about 6% better performance of the decision tree in predicting the type of tourist attraction compared to the SVM method. In addition, the user study of the system shows overall satisfaction of the participants of about 55% in terms of ease of use and about 56% in terms of the system’s usefulness.
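As a rough illustration of the two-step pipeline described in the abstract (not the authors' implementation), the sketch below first fits a decision tree on demographic features to predict a preferred attraction type and then ranks candidate places with a plain VIKOR computation over time, distance, popularity, and neighborhood criteria. All feature values, criteria weights, and criterion directions are invented assumptions.

```python
# Illustrative sketch: demographic-based attraction-type prediction plus a
# basic VIKOR ranking. All data, weights, and directions are invented for
# demonstration and are not taken from the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Step 1: predict the attraction type a user is likely to prefer from
# demographic context (age, gender code, education-level code).
X_demo = np.array([[25, 0, 2], [34, 1, 3], [52, 0, 1], [19, 1, 2], [41, 0, 3]])
y_type = np.array(["museum", "park", "museum", "park", "historic"])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_demo, y_type)
preferred = tree.predict([[30, 1, 3]])[0]

# Step 2: VIKOR ranking of candidate places of that type.
# Criteria: [travel time (min), distance (km), popularity, neighborhood score]
places = ["A", "B", "C"]
F = np.array([[15.0, 2.0, 4.2, 3.5],
              [30.0, 5.0, 4.8, 4.0],
              [10.0, 1.5, 3.9, 3.0]])
benefit = np.array([False, False, True, True])  # is higher better?
w = np.array([0.3, 0.2, 0.3, 0.2])              # criteria weights (assumed)

f_best = np.where(benefit, F.max(axis=0), F.min(axis=0))
f_worst = np.where(benefit, F.min(axis=0), F.max(axis=0))
norm = (f_best - F) / np.where(f_best == f_worst, 1.0, f_best - f_worst)
S = (w * norm).sum(axis=1)   # group utility per place
R = (w * norm).max(axis=1)   # individual regret per place
v = 0.5                      # weight of the "majority" strategy
Q = v * (S - S.min()) / max(S.max() - S.min(), 1e-12) \
    + (1 - v) * (R - R.min()) / max(R.max() - R.min(), 1e-12)

ranking = [places[i] for i in np.argsort(Q)]  # lower Q is better
print(f"predicted type: {preferred}; ranked places: {ranking}")
```

Lower Q indicates a better compromise alternative in VIKOR; in the paper the criteria weights come from the SWAR step rather than being fixed by hand as they are here.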


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Syed Iftikhar Hussain Shah ◽  
Vassilios Peristeras ◽  
Ioannis Magnisalis

The public sector, private firms, the business community, and civil society are generating data that is high in volume, veracity, and velocity and comes from a diversity of sources. This kind of data is known as big data. Public Administrations (PAs) pursue big data as the “new oil” and implement data-centric policies to transform data into knowledge, to promote good governance, transparency, innovative digital services, and citizens’ engagement in public policy. From the above, the Government Big Data Ecosystem (GBDE) emerges. Managing big data throughout its lifecycle becomes a challenging task for governmental organizations. Despite the vast interest in this ecosystem, appropriate big data management is still a challenge. This study intends to fill the above-mentioned gap by proposing a data lifecycle framework for data-driven governments. Through a Systematic Literature Review, we identified and analysed 76 data lifecycle models to propose a data lifecycle framework for data-driven governments (DaliF). In this way, we contribute to the ongoing discussion around big data management, which attracts researchers’ and practitioners’ interest.


Author(s):  
Marco Angrisani ◽  
Anya Samek ◽  
Arie Kapteyn

The number of data sources available for academic research on retirement economics and policy has increased rapidly in the past two decades. Data quality and comparability across studies have also improved considerably, with survey questionnaires progressively converging towards common ways of eliciting the same measurable concepts. Probability-based Internet panels have become a more accepted and recognized tool to obtain research data, allowing for fast, flexible, and cost-effective data collection compared to more traditional modes such as in-person and phone interviews. In an era of big data, academic research has also increasingly been able to access administrative records (e.g., Kostøl and Mogstad, 2014; Cesarini et al., 2016), private-sector financial records (e.g., Gelman et al., 2014), and administrative data married with surveys (Ameriks et al., 2020), to answer questions that could not be successfully tackled otherwise.


2018 ◽  
Vol 24 (1) ◽  
pp. 266-294 ◽  
Author(s):  
Amgad Badewi ◽  
Essam Shehab ◽  
Jing Zeng ◽  
Mostafa Mohamad

Purpose: The purpose of this paper is to answer two research questions: what are the ERP resources and organizational complementary resources (OCRs) required to achieve each group of benefits? And, on the basis of its resources, when should an organization invest more in ERP resources and/or OCRs so that the potential value of its ERP is realised?

Design/methodology/approach: The authors studied 12 organizations in different countries and validated the results with 8 consultants.

Findings: An ERP benefits realization capability framework is developed; it shows that each group of benefits requires ERP resources (classified into features, attached technologies, and information technology department competences) and OCRs (classified into practices, attitudes, culture, skills, and organizational characteristics), and that leaping ahead to gain innovation benefits before being mature enough in realising a firm’s planning and automation capabilities could be a waste of time and effort.

Research limitations/implications: This is a qualitative study. It needs to be backed by quantitative studies to test the results.

Practical implications: Although the “P” in ERP stands for planning, many academics and practitioners still believe that ERP applies to automation only. This research highlights that investment in ERP can increase the innovation and planning capabilities of the organization only if it is extended and grown at the right time and is supported by OCRs. It is not cost-effective to push an organization to achieve all the benefits at the same time; rather, an organization will not be able to enjoy a higher level of benefits until it achieves a significant number of lower-level benefits. Thus, investing in higher-level benefit assets directly after an ERP implementation, when there are no organizational capabilities available to use these assets, could be inefficient. Moreover, it could be stressful for users to see plenty of new ERP resources without the ability to use them. Although it could be of slight benefit to introduce, for example, business intelligence to employees in the “stabilizing period” (Badewi et al., 2013), from the financial perspective it is a waste of money, since the benefits would not be realised as expected. Therefore, orchestrating ERP assets with the development of organizational capabilities is important for achieving the greatest effectiveness and efficiency of the resources available to the organization. This research can be used as a benchmark for designing the various blueprints required to achieve different groups of benefits from ERP investments.

Originality/value: This research addresses two novel questions. RQ1: what are the ERP resources and OCRs required to achieve the different kinds of ERP benefits? RQ2: when, and on what basis, should an organization deploy more resources to leverage the ERP business value?


2021 ◽  
pp. 1-36
Author(s):  
Vahideh Angardi ◽  
Ali Ettehadi ◽  
Özgün Yücel

Effective separation of water and oil dispersions has long been considered a critical step in determining technical and economic success in the petroleum industry. Moreover, a deeper understanding of the emulsification process and the different parameters that affect it is essential for cost-effective oil production, transportation, and downstream processing. The numerous studies conducted on dispersion characterization indicate the importance of this concept, which deserves the attention of the scientific community; a comprehensive review with critical analysis of the significant concepts will therefore help readers follow them easily. This study is a comprehensive review of dispersion characterization and of recently published studies on it. The main purposes of this review are to 1) highlight flaws, 2) outline gaps and weaknesses, 3) address conflicts, 4) prevent duplication of effort, and 5) list factors affecting dispersion. It was found that the separation efficiency and stability of dispersions are affected by different chemical and physical factors. Factors affecting the stability of emulsions have been studied in detail and help in identifying the right actions to ensure stable emulsions. In addition, methods of ensuring stability, especially against coalescence, are highlighted, and mathematical explanations of coalescence phenomena are presented.


2017 ◽  
Vol 2 (Suppl. 1) ◽  
pp. 1-10
Author(s):  
Denis Horgan

In the fast-moving arena of modern healthcare, with its cutting-edge science, it is already vital, and will become more so, that stakeholders collaborate openly and effectively. Transparency, especially on drug pricing, is of paramount importance. There is also a need to ensure that regulations and legislation covering, for example, the new, smaller clinical trials required to make personalised medicine work effectively, and the huge practical and ethical issues surrounding Big Data and data protection, are common, understood, and enforced across the EU. With more integration, collaboration, dialogue, and increased trust among each and every one in the field, stakeholders can help mould the right frameworks, in the right place, at the right time. Once achieved, this will allow us all to work more quickly and more effectively towards creating a healthier, and thus wealthier, European Union.

