Using pre-existing datasets to combine published information with new metrics would help researchers construct a broader picture of chromatin in disease

2021 ◽  
Author(s):  
Moataz Dowaidar

Using pre-existing datasets to combine published information with new metrics would help researchers construct a broader picture of chromatin in disease. A goal of computational biology is the near-real-time integration of epigenomic data sets, irrespective of the laboratory in which they were generated, much like a blood pressure, ECG, or troponin test. In addition, epigenome modeling must become dynamic, accounting for cell-to-cell variability and for changes over time due to normal physiological or pathological stressors. Probabilistic modeling and machine learning can support the creation of such models, while finding (and quantifying) previously identified, developing chromatin properties that match changes in heart health. A 3D genome representation, for example, may reveal a structural or accessibility attribute connected to health or disease that no single epigenomic test alone can discover. Such strategies can expand basic knowledge of biology and illness. Training schemes that incorporate both wet- and dry-lab components will foster the formation of more diverse technical repertoires. Data mining and fresh data collection will revolutionize how chromatin challenges are handled in the coming years. Knowing how computers solve problems (as opposed to how people do) and how to phrase questions computationally creates a shared vocabulary for getting tasks done. Not every team member needs the full set of big data skills, but a collaborative attitude is important for effective large-scale epigenomic research. UCLA's QCBio Collaboratory is a great platform for teaching non-programmers and facilitating cooperation to resolve biological issues. It also encourages the use of open-source technology by making genomics datasets available to non-experts. There are already many bioinformatics tools, and others will be developed to introduce new understanding, but basic knowledge of how computers work and how to answer big-data questions will continue to empower scientists to test the most meaningful hypotheses with appropriate tools and reveal new insights about cardiac biology.
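As a purely illustrative sketch (not from the article), the idea that integrating multiple epigenomic assays can reveal signal that no single assay captures could be prototyped with a simple probabilistic classifier. All feature matrices, labels, and sizes below are hypothetical placeholders.

```python
# Illustrative sketch only: integrating two hypothetical epigenomic feature sets
# (e.g., accessibility and 3D-contact features) to ask whether the combination
# predicts disease status better than either assay alone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical per-sample features from two assays and a disease label.
accessibility = rng.normal(size=(n_samples, 50))   # e.g., ATAC-seq peak signals
contacts = rng.normal(size=(n_samples, 30))        # e.g., Hi-C contact features
labels = rng.integers(0, 2, size=n_samples)        # 0 = healthy, 1 = disease

def cv_auc(features, labels):
    """Cross-validated AUC of a regularized logistic-regression classifier."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, features, labels, cv=5, scoring="roc_auc").mean()

print("accessibility only:", cv_auc(accessibility, labels))
print("3D contacts only:  ", cv_auc(contacts, labels))
print("integrated:        ", cv_auc(np.hstack([accessibility, contacts]), labels))
```

With real assay data, comparing the integrated model against each single-assay model is one simple way to quantify whether the combination carries information that neither assay provides alone.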

2022 ◽  
pp. 41-67
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Machine learning (ML), neural networks (NNs), evolutionary algorithms (EAs), fuzzy systems (FSs), and computer science more broadly have been prominent and significant for many years. They have been applied to many different areas and have contributed much to the development of large corporations, massive organizations, and similar institutions. These organizations generate vast amounts of information and massive data sets (MDSs), and such big data sets (BDSs) pose challenges for many commercial applications and research efforts. Consequently, many algorithms from ML, NNs, EAs, FSs, and computer science have been developed to handle these massive data sets successfully. To support this process, the authors survey in this chapter the NN algorithms applicable to large-scale data sets (LSDSs). Finally, they present a novel NN model for BDSs in both a sequential environment (SE) and a distributed network environment (DNE).


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of data. Big data analytics is the process of analyzing large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is given; for each type, the paper describes process steps and tools and gives a banking application. Some of the research challenges of big data analytics, and possible solutions to them, are also discussed.


2017 ◽  
Vol 8 (2) ◽  
pp. 30-43
Author(s):  
Mrutyunjaya Panda

Big Data, due to its complicated and diverse nature, poses many challenges for extracting meaningful observations. This calls for smart and efficient algorithms that can cope with the computational complexity and memory constraints arising from their iterative behavior. The issue can be addressed with parallel computing techniques, in which a single machine or multiple machines work simultaneously by dividing the problem into sub-problems and assigning private memory to each sub-problem. Clustering analysis has proven useful for handling such huge data in the recent past. Although much research on Big Data analysis is under way, this work uses Canopy and K-Means++ clustering to process large-scale data in a shorter amount of time without memory constraints. To assess the suitability of the approach, several data sets are considered, ranging from small to very large and covering diverse fields of application. The experimental results indicate that the proposed approach is fast and accurate.
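As an illustrative sketch only (not the authors' implementation), a lightweight Canopy-style pre-clustering pass followed by K-Means++ refinement can be expressed as follows. The data set, thresholds, and helper names are hypothetical, and the Canopy step is deliberately simplified.

```python
# Illustrative sketch: a simplified Canopy pass to estimate the number of
# clusters, followed by K-Means++ refinement (data and thresholds are hypothetical).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def canopy_centers(X, t2=4.0, seed=0):
    """Greedy, simplified Canopy pass using only the tight removal radius T2.

    The full algorithm also keeps a looser radius T1 for overlapping membership;
    here the canopy centers are only used to estimate a cluster count.
    """
    rng = np.random.default_rng(seed)
    remaining = list(rng.permutation(len(X)))
    centers = []
    while remaining:
        center = X[remaining.pop(0)]
        centers.append(center)
        dists = np.linalg.norm(X[remaining] - center, axis=1)
        remaining = [i for i, d in zip(remaining, dists) if d > t2]
    return np.array(centers)

# Synthetic large-ish data set; sizes and parameters are illustrative only.
X, _ = make_blobs(n_samples=50_000, centers=8, cluster_std=1.0, random_state=42)
canopies = canopy_centers(X, t2=4.0)

# Use the canopy count as k and let K-Means++ pick well-spread initial seeds.
km = KMeans(n_clusters=len(canopies), init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X)
print(f"{len(canopies)} canopies -> {len(np.unique(labels))} K-Means++ clusters")
```

The design choice here is the usual one: the cheap Canopy pass narrows the search (here, just the number of clusters), while K-Means++ seeding keeps the subsequent Lloyd iterations fast and stable on the full data set.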


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the development of increasingly advanced information and electronic technology, particularly physical information systems, cloud computing systems, and social services, big data is becoming ubiquitous, creating benefits for people while also posing huge challenges. With the advent of the big data era, data sets keep growing in scale, and traditional data analysis methods can no longer cope with such large-scale data sets. Mining the hidden information behind big data, especially in the field of e-commerce, has therefore become a key factor in competition among enterprises. This paper uses a support vector machine method based on parallel computing to analyze such data. First, the training samples are divided into several working subsets using the SOM self-organizing neural network classification method; the training results of the individual working subsets are then merged, so that massive-data prediction and analysis can be handled quickly. The paper also argues that big data offers scalability and a quality assessment system, so it is meaningful to use big data to counter the two-sidedness of quality assessment. Finally, given the excellent performance of parallel support vector machines in data mining and analysis, the method is applied to big data analysis in e-commerce. The research results show that parallel support vector machines can handle large-scale data sets and, when dirty-data problems arise, raise the effective rate by at least 70%.
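A hedged, minimal sketch of the partition-then-train idea described above (not the authors' code): the training set is split into working subsets, one SVM is trained per subset in parallel, and each test point is routed to the SVM of its nearest subset. KMeans stands in for the SOM partitioner, and all data and parameters are hypothetical.

```python
# Minimal sketch of a partitioned, parallel SVM (KMeans stands in for the SOM
# partitioner described in the abstract; the data set is synthetic).
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 1) Partition the training samples into working subsets.
n_subsets = 4
partitioner = KMeans(n_clusters=n_subsets, n_init=10, random_state=0).fit(X_tr)
subset_ids = partitioner.labels_

# 2) Train one SVM per working subset, in parallel.
def train_subset(k):
    mask = subset_ids == k
    return SVC(kernel="rbf", gamma="scale").fit(X_tr[mask], y_tr[mask])

models = Parallel(n_jobs=-1)(delayed(train_subset)(k) for k in range(n_subsets))

# 3) Route each test point to the SVM of its nearest subset and combine predictions.
test_ids = partitioner.predict(X_te)
y_pred = np.empty(len(X_te), dtype=int)
for k in range(n_subsets):
    mask = test_ids == k
    if mask.any():
        y_pred[mask] = models[k].predict(X_te[mask])

print("accuracy:", (y_pred == y_te).mean())
```

Training cost for kernel SVMs grows steeply with sample count, which is why splitting the training set into subsets and training them in parallel can make massive data sets tractable at some cost in global optimality.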


2022 ◽  
pp. 59-79
Author(s):  
Dragorad A. Milovanovic ◽  
Vladan Pantovic

Multimedia-related things are a new class of connected objects that can be searched, discovered, and composited on the internet of media things (IoMT). A huge share of data sets comes from audio-visual sources or is multimedia in nature; however, multimedia data is currently not incorporated in big data (BD) frameworks. This chapter outlines the research projects, standardization initiatives, and industrial activities aimed at such integration. MPEG IoMT interoperability and the network-based media processing (NBMP) framework are explored as instances of the big media (BM) reference model, and a conceptual model for integrating IoT and big data for analytics is proposed. Big data analytics is rapidly evolving, both in functionality and in the underlying model. The authors point out that IoMT analytics is closely related to big data analytics, which facilitates the integration of multimedia objects into big media applications in large-scale systems. The two technologies are mutually dependent and should be researched and developed jointly.


Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Clustering is one of the most important applications of data mining and has attracted the attention of researchers in statistics and machine learning. It is used in many applications such as information retrieval, image processing, and social network analytics, and it helps users understand the similarity and dissimilarity between objects. Cluster analysis makes complex and large data sets easier to understand. Different types of clustering algorithms have been analyzed by various researchers. K-means is the most popular partitioning-based algorithm, as it provides good results through accurate calculations on numerical data; however, K-means works well for numerical data only, whereas big data is a combination of numerical and categorical data. The K-prototype algorithm handles numerical as well as categorical data by combining the distances calculated from numeric and categorical attributes. With the growth of data from social networking websites, business transactions, scientific calculations, and so on, there are vast collections of structured, semi-structured, and unstructured data, so K-prototype needs to be optimized to analyze these varieties of data efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype on MapReduce yields better performance on multiple nodes than on a single node, with CPU execution time and speedup used as evaluation metrics. An intelligent splitter is also proposed, which splits mixed big data into numerical and categorical parts. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
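A minimal illustrative sketch of the K-prototypes dissimilarity idea (not the paper's MapReduce implementation): numeric distance and categorical mismatches are combined with a weight gamma. All records, column splits, prototypes, and the weight below are hypothetical.

```python
# Sketch of the K-prototypes mixed dissimilarity and a single assignment step.
# Numeric columns use squared Euclidean distance; categorical columns count
# mismatches, weighted by gamma (all values here are hypothetical).
import numpy as np

def kprototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=0.5):
    numeric_part = np.sum((x_num - proto_num) ** 2)
    categorical_part = np.sum(x_cat != proto_cat)
    return numeric_part + gamma * categorical_part

# Hypothetical mixed records: (age, income_k) numeric + (city, plan) categorical.
numeric = np.array([[25, 40.0], [47, 95.0], [31, 52.0]])
categorical = np.array([["NY", "basic"], ["SF", "premium"], ["NY", "basic"]])

# Two hypothetical cluster prototypes (numeric means + categorical modes).
protos_num = np.array([[28, 45.0], [50, 100.0]])
protos_cat = np.array([["NY", "basic"], ["SF", "premium"]])

# Assign each record to the prototype with the smallest mixed dissimilarity.
for x_num, x_cat in zip(numeric, categorical):
    dists = [kprototypes_dissimilarity(x_num, x_cat, pn, pc)
             for pn, pc in zip(protos_num, protos_cat)]
    print(x_num, x_cat, "-> cluster", int(np.argmin(dists)))
```

In a MapReduce setting, the map phase would compute these assignments per data split and the reduce phase would update the numeric means and categorical modes, which is what makes the algorithm amenable to multi-node execution.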


2021 ◽  
pp. 1-21
Author(s):  
Marie Sandberg ◽  
Luca Rossi

Digital technologies present new methodological and ethical challenges for migration studies: from ensuring data access in ethically viable ways to privacy protection, and ensuring the autonomy and security of research participants. This introductory chapter argues that the growing field of digital migration research requires new modes of caring for (big) data. Beyond methodological and ethical reflexivity, such care work implies establishing analytically sustainable and viable environments for the respective data sets, from large-scale data sets ("big data") to ethnographic materials. Further, it is argued that approaching migrants' digital data "with care" means pursuing a critical approach to the use of big data in migration research, where the data is not an unquestionable proxy for social activity but rather a complex construct whose underlying social practices (and vulnerabilities) need to be fully understood. Finally, the chapter presents how the contributions of this book offer an in-depth analysis of the most crucial methodological and ethical challenges in digital migration studies and reflect on ways to move this field forward.


2013 ◽  
Vol 9 (4) ◽  
pp. 19-43 ◽  
Author(s):  
Bo Hu ◽  
Nuno Carvalho ◽  
Takahide Matsutsuka

In light of the challenges of effectively managing Big Data, the authors are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data and hence automate some data analysis tasks that are otherwise not amenable to computers. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and the large-scale storage of Big Data do not get along easily. Meanwhile, the sheer data size envisioned by Big Data rules out certain computationally expensive semantic technologies, rendering them much less efficient than they are on relatively small data sets. In this paper, the authors propose a mechanism that allows LOD to take advantage of existing large-scale data stores while sustaining its "semantic" nature. They demonstrate how RDF-based semantic models can be distributed across multiple storage servers and examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. Future work will focus on stress tests of the platform at the scale of tens of billions of triples, as well as comparative studies of usability and performance against similar offerings.
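As a hedged illustration of the distribution idea (not the paper's actual mechanism), RDF triples can be sharded across storage servers by hashing the subject, so that all triples about one resource stay on the same server. The server names and triples below are hypothetical.

```python
# Illustrative sketch: hash-partitioning RDF triples by subject so that all
# triples describing a resource land on the same (hypothetical) storage server.
import hashlib
from collections import defaultdict

SERVERS = ["store-0", "store-1", "store-2"]  # hypothetical storage servers

def server_for(subject: str) -> str:
    digest = hashlib.sha1(subject.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

triples = [  # hypothetical (subject, predicate, object) triples
    ("http://example.org/alice", "foaf:knows", "http://example.org/bob"),
    ("http://example.org/alice", "foaf:name", '"Alice"'),
    ("http://example.org/bob", "foaf:name", '"Bob"'),
]

shards = defaultdict(list)
for s, p, o in triples:
    shards[server_for(s)].append((s, p, o))

for server, stored in shards.items():
    print(server, "->", stored)

# A simple subject lookup only has to contact one shard; joins across subjects
# (a fundamental semantic operation) require coordination across servers.
```

This is the basic trade-off the paper points at: co-locating triples by subject keeps simple lookups local, while more expensive semantic operations must be tuned for distributed and parallel execution.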


2016 ◽  
Vol 4 (3) ◽  
pp. 1-21 ◽  
Author(s):  
Sungchul Lee ◽  
Eunmin Hwang ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Due to the advancement of Information Technology (IT), the hospitality industry sees great value in gathering a large amount and a wide variety of customer data. However, many hotels face a challenge in analyzing customer data and using it as an effective tool to understand hospitality customers better and, ultimately, to increase revenue. The authors' research attempts to resolve the current challenges of analyzing customer data in hospitality by utilizing big data analysis tools, especially Hadoop and R. Hadoop is a framework for processing large-scale data. With this approach, their study demonstrates ways of aggregating and analyzing hospitality customer data to find meaningful customer information. Multiple decision trees are constructed from the customer data sets with the intention of classifying customers' needs and customer clusters. By analyzing the customer data, the study suggests three strategies to increase the total expenditure of customers within the limited time of their stay.
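As an illustrative sketch only (the study itself used Hadoop and R), a decision tree classifying customers into spending segments from hypothetical features such as length of stay and spa usage might look like this; all column names, labels, and data are invented.

```python
# Sketch: a decision tree classifying customers into spending segments from
# hypothetical hospitality features (all column names and data are invented).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 1_000
customers = pd.DataFrame({
    "nights": rng.integers(1, 10, n),
    "is_business": rng.integers(0, 2, n),
    "booked_online": rng.integers(0, 2, n),
    "spa_visits": rng.integers(0, 4, n),
})
# Hypothetical label: high spenders tend to stay longer and use the spa more.
high_spender = ((customers["nights"] + 2 * customers["spa_visits"]
                 + rng.normal(0, 1.5, n)) > 7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(customers, high_spender, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

print("test accuracy:", tree.score(X_te, y_te))
print(export_text(tree, feature_names=list(customers.columns)))
```

The printed rules are the useful part for a hotel analyst: each path from root to leaf is a human-readable description of a customer segment and its predicted spending class.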


2020 ◽  
Vol 2 (4) ◽  
pp. 436-452
Author(s):  
Yoshinobu Tamura ◽  
Shigeru Yamada

Various big data sets are recorded on the server side of computer systems. Big data is commonly characterized by the volume, variety, and velocity (3V) model, first proposed in a press release by Gartner, Inc.; well-balanced big data exhibits all three Vs. Big data falls into various categories, e.g., sensor data, log data, customer data, financial data, weather data, picture data, movie data, and so on. In particular, fault big data is well known as a characteristic kind of log data in software engineering. In this paper, we analyze fault big data while considering the unique features that arise under the operation of open source software. In addition, we analyze actual data to show numerical examples of reliability assessment based on the results of multiple regression analysis, well known as the quantification method of the first type.
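A minimal sketch in the spirit of quantification method of the first type, i.e., multiple regression on dummy-coded categorical factors (this is not the authors' data or model; the factors, levels, and fault counts below are hypothetical).

```python
# Sketch: quantification-method-type-I-style regression, i.e., multiple linear
# regression on dummy-coded categorical factors (all data here are hypothetical).
import pandas as pd
from sklearn.linear_model import LinearRegression

faults = pd.DataFrame({
    "component": ["kernel", "kernel", "driver", "driver", "ui", "ui", "kernel", "ui"],
    "severity":  ["major",  "minor",  "major",  "minor",  "major", "minor", "major", "major"],
    "fault_count": [42, 11, 30, 7, 18, 5, 39, 21],
})

# Dummy-code the categorical explanatory variables.
X = pd.get_dummies(faults[["component", "severity"]], drop_first=True)
y = faults["fault_count"]

model = LinearRegression().fit(X, y)
print("intercept:", round(model.intercept_, 2))
print(dict(zip(X.columns, model.coef_.round(2))))

# Predict the expected fault count for a hypothetical new (component, severity) pair.
new = pd.get_dummies(pd.DataFrame({"component": ["driver"], "severity": ["major"]}),
                     drop_first=True).reindex(columns=X.columns, fill_value=0)
print("predicted faults:", model.predict(new)[0].round(1))
```

The fitted coefficients play the role of category scores in quantification theory: each one estimates how much a factor level shifts the expected fault count relative to the baseline level.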

