Using pre-existing datasets to combine published information with new metrics would help researchers construct a broader picture of chromatin in disease

2021 ◽  
Author(s):  
Moataz Dowaidar

Using pre-existing datasets to combine published information with new metrics would help researchers construct a broader picture of chromatin in disease. A goal of computational biology is the near-real-time integration of epigenomic data sets, irrespective of the laboratory in which they were generated, much like a blood pressure, ECG, or troponin test. In addition, epigenome modeling must become dynamic, accounting for cell-to-cell variability and for changes over time due to normal physiological or pathological stressors. Probabilistic modeling and machine learning can support the creation of such models, while finding (and quantifying) previously identified, developing chromatin properties that match changes in heart health. A 3D genome representation, for example, may reveal a structural or accessibility attribute connected to health or disease that no single epigenomic test alone can discover. Such strategies can expand basic knowledge of biology and illness. Training schemes that incorporate both wet- and dry-lab components will foster the formation of more diverse technical repertoires. Data mining and fresh data collection will revolutionize how chromatin challenges are handled in the coming years. Knowing how computers solve problems (as opposed to how people do) and how to phrase questions computationally creates a shared vocabulary for getting tasks done. Not every team member needs the full set of big data skills, but a collaborative attitude is important for effective large-scale epigenomic research. UCLA's QCBio Collaboratory is a great platform for teaching non-programmers and facilitating cooperation to resolve biological issues. It also encourages the use of open-source technology by making genomics datasets available to non-experts. There are already many bioinformatics tools, and others will be developed to introduce new understanding, but basic knowledge of how computers work and how to answer big-data questions will continue to empower scientists to test the most meaningful hypotheses with appropriate tools and reveal new insights about cardiac biology.
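As a purely illustrative sketch (not from the article), the idea that integrating multiple epigenomic assays can reveal signal that no single assay captures could be prototyped with a simple probabilistic classifier. All feature matrices, labels, and sizes below are hypothetical placeholders.

```python
# Illustrative sketch only: integrating two hypothetical epigenomic feature sets
# (e.g., accessibility and 3D-contact features) to ask whether the combination
# predicts disease status better than either assay alone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical per-sample features from two assays and a disease label.
accessibility = rng.normal(size=(n_samples, 50))   # e.g., ATAC-seq peak signals
contacts = rng.normal(size=(n_samples, 30))        # e.g., Hi-C contact features
labels = rng.integers(0, 2, size=n_samples)        # 0 = healthy, 1 = disease

def cv_auc(features, labels):
    """Cross-validated AUC of a regularized logistic-regression classifier."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, features, labels, cv=5, scoring="roc_auc").mean()

print("accessibility only:", cv_auc(accessibility, labels))
print("3D contacts only:  ", cv_auc(contacts, labels))
print("integrated:        ", cv_auc(np.hstack([accessibility, contacts]), labels))
```

With real assay data, comparing the integrated model against each single-assay model is one simple way to quantify whether the combination carries information that neither assay provides alone.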

2022 ◽  
pp. 41-67
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Machine learning (ML), neural networks (NNs), evolutionary algorithms (EAs), fuzzy systems (FSs), and computer science more broadly have been prominent and significant for many years. They have been applied to many different areas and have contributed much to the development of large corporations, massive organizations, and similar institutions. These organizations generate vast amounts of information and massive data sets (MDSs), and such big data sets (BDSs) pose challenges for many commercial applications and research efforts. Consequently, many algorithms from ML, NNs, EAs, FSs, and computer science have been developed to handle these massive data sets successfully. To support this process, the authors survey in this chapter the NN algorithms applicable to large-scale data sets (LSDSs). Finally, they present a novel NN model for BDSs in both a sequential environment (SE) and a distributed network environment (DNE).


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of data. Big data analytics is the process of analyzing large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is given; for each type, the paper describes process steps and tools and gives a banking application. Some of the research challenges of big data analytics, and possible solutions to them, are also discussed.


2017 ◽  
Vol 8 (2) ◽  
pp. 30-43
Author(s):  
Mrutyunjaya Panda

Big Data, due to its complicated and diverse nature, poses many challenges for extracting meaningful observations. This calls for smart and efficient algorithms that can cope with the computational complexity and memory constraints arising from their iterative behavior. The issue can be addressed with parallel computing techniques, in which a single machine or multiple machines work simultaneously by dividing the problem into sub-problems and assigning private memory to each sub-problem. Clustering analysis has proven useful for handling such huge data in the recent past. Although much research on Big Data analysis is under way, this work uses Canopy and K-Means++ clustering to process large-scale data in a shorter amount of time without memory constraints. To assess the suitability of the approach, several data sets are considered, ranging from small to very large and covering diverse fields of application. The experimental results indicate that the proposed approach is fast and accurate.
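As an illustrative sketch only (not the authors' implementation), a lightweight Canopy-style pre-clustering pass followed by K-Means++ refinement can be expressed as follows. The data set, thresholds, and helper names are hypothetical, and the Canopy step is deliberately simplified.

```python
# Illustrative sketch: a simplified Canopy pass to estimate the number of
# clusters, followed by K-Means++ refinement (data and thresholds are hypothetical).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def canopy_centers(X, t2=4.0, seed=0):
    """Greedy, simplified Canopy pass using only the tight removal radius T2.

    The full algorithm also keeps a looser radius T1 for overlapping membership;
    here the canopy centers are only used to estimate a cluster count.
    """
    rng = np.random.default_rng(seed)
    remaining = list(rng.permutation(len(X)))
    centers = []
    while remaining:
        center = X[remaining.pop(0)]
        centers.append(center)
        dists = np.linalg.norm(X[remaining] - center, axis=1)
        remaining = [i for i, d in zip(remaining, dists) if d > t2]
    return np.array(centers)

# Synthetic large-ish data set; sizes and parameters are illustrative only.
X, _ = make_blobs(n_samples=50_000, centers=8, cluster_std=1.0, random_state=42)
canopies = canopy_centers(X, t2=4.0)

# Use the canopy count as k and let K-Means++ pick well-spread initial seeds.
km = KMeans(n_clusters=len(canopies), init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X)
print(f"{len(canopies)} canopies -> {len(np.unique(labels))} K-Means++ clusters")
```

The design choice here is the usual one: the cheap Canopy pass narrows the search (here, just the number of clusters), while K-Means++ seeding keeps the subsequent Lloyd iterations fast and stable on the full data set.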


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the development of increasingly advanced information and electronic technology, particularly physical information systems, cloud computing systems, and social services, big data is becoming ubiquitous, creating benefits for people while also posing huge challenges. With the advent of the big data era, data sets keep growing in scale, and traditional data analysis methods can no longer cope with such large-scale data sets. Mining the hidden information behind big data, especially in the field of e-commerce, has therefore become a key factor in competition among enterprises. This paper uses a support vector machine method based on parallel computing to analyze such data. First, the training samples are divided into several working subsets using the SOM self-organizing neural network classification method; the training results of the individual working subsets are then merged, so that massive-data prediction and analysis can be handled quickly. The paper also argues that big data offers scalability and a quality assessment system, so it is meaningful to use big data to counter the two-sidedness of quality assessment. Finally, given the excellent performance of parallel support vector machines in data mining and analysis, the method is applied to big data analysis in e-commerce. The research results show that parallel support vector machines can handle large-scale data sets and, when dirty-data problems arise, raise the effective rate by at least 70%.
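A hedged, minimal sketch of the partition-then-train idea described above (not the authors' code): the training set is split into working subsets, one SVM is trained per subset in parallel, and each test point is routed to the SVM of its nearest subset. KMeans stands in for the SOM partitioner, and all data and parameters are hypothetical.

```python
# Minimal sketch of a partitioned, parallel SVM (KMeans stands in for the SOM
# partitioner described in the abstract; the data set is synthetic).
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 1) Partition the training samples into working subsets.
n_subsets = 4
partitioner = KMeans(n_clusters=n_subsets, n_init=10, random_state=0).fit(X_tr)
subset_ids = partitioner.labels_

# 2) Train one SVM per working subset, in parallel.
def train_subset(k):
    mask = subset_ids == k
    return SVC(kernel="rbf", gamma="scale").fit(X_tr[mask], y_tr[mask])

models = Parallel(n_jobs=-1)(delayed(train_subset)(k) for k in range(n_subsets))

# 3) Route each test point to the SVM of its nearest subset and combine predictions.
test_ids = partitioner.predict(X_te)
y_pred = np.empty(len(X_te), dtype=int)
for k in range(n_subsets):
    mask = test_ids == k
    if mask.any():
        y_pred[mask] = models[k].predict(X_te[mask])

print("accuracy:", (y_pred == y_te).mean())
```

Training cost for kernel SVMs grows steeply with sample count, which is why splitting the training set into subsets and training them in parallel can make massive data sets tractable at some cost in global optimality.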


2022 ◽  
pp. 59-79
Author(s):  
Dragorad A. Milovanovic ◽  
Vladan Pantovic

Multimedia-related things are a new class of connected objects that can be searched, discovered, and composited on the internet of media things (IoMT). A huge share of data sets comes from audio-visual sources or is multimedia in nature; however, multimedia data is currently not incorporated in big data (BD) frameworks. This chapter outlines the research projects, standardization initiatives, and industrial activities aimed at such integration. MPEG IoMT interoperability and the network-based media processing (NBMP) framework are explored as instances of the big media (BM) reference model, and a conceptual model for integrating IoT and big data for analytics is proposed. Big data analytics is rapidly evolving, both in functionality and in the underlying model. The authors point out that IoMT analytics is closely related to big data analytics, which facilitates the integration of multimedia objects into big media applications in large-scale systems. The two technologies are mutually dependent and should be researched and developed jointly.


Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Clustering is one of the most important applications of data mining and has attracted the attention of researchers in statistics and machine learning. It is used in many applications such as information retrieval, image processing, and social network analytics, and it helps users understand the similarity and dissimilarity between objects. Cluster analysis makes complex and large data sets easier to understand. Different types of clustering algorithms have been analyzed by various researchers. K-means is the most popular partitioning-based algorithm, as it provides good results through accurate calculations on numerical data; however, K-means works well for numerical data only, whereas big data is a combination of numerical and categorical data. The K-prototype algorithm handles numerical as well as categorical data by combining the distances calculated from numeric and categorical attributes. With the growth of data from social networking websites, business transactions, scientific calculations, and so on, there are vast collections of structured, semi-structured, and unstructured data, so K-prototype needs to be optimized to analyze these varieties of data efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype on MapReduce yields better performance on multiple nodes than on a single node, with CPU execution time and speedup used as evaluation metrics. An intelligent splitter is also proposed, which splits mixed big data into numerical and categorical parts. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
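A minimal illustrative sketch of the K-prototypes dissimilarity idea (not the paper's MapReduce implementation): numeric distance and categorical mismatches are combined with a weight gamma. All records, column splits, prototypes, and the weight below are hypothetical.

```python
# Sketch of the K-prototypes mixed dissimilarity and a single assignment step.
# Numeric columns use squared Euclidean distance; categorical columns count
# mismatches, weighted by gamma (all values here are hypothetical).
import numpy as np

def kprototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=0.5):
    numeric_part = np.sum((x_num - proto_num) ** 2)
    categorical_part = np.sum(x_cat != proto_cat)
    return numeric_part + gamma * categorical_part

# Hypothetical mixed records: (age, income_k) numeric + (city, plan) categorical.
numeric = np.array([[25, 40.0], [47, 95.0], [31, 52.0]])
categorical = np.array([["NY", "basic"], ["SF", "premium"], ["NY", "basic"]])

# Two hypothetical cluster prototypes (numeric means + categorical modes).
protos_num = np.array([[28, 45.0], [50, 100.0]])
protos_cat = np.array([["NY", "basic"], ["SF", "premium"]])

# Assign each record to the prototype with the smallest mixed dissimilarity.
for x_num, x_cat in zip(numeric, categorical):
    dists = [kprototypes_dissimilarity(x_num, x_cat, pn, pc)
             for pn, pc in zip(protos_num, protos_cat)]
    print(x_num, x_cat, "-> cluster", int(np.argmin(dists)))
```

In a MapReduce setting, the map phase would compute these assignments per data split and the reduce phase would update the numeric means and categorical modes, which is what makes the algorithm amenable to multi-node execution.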


2021 ◽  
pp. 1-21
Author(s):  
Marie Sandberg ◽  
Luca Rossi

Digital technologies present new methodological and ethical challenges for migration studies: from ensuring data access in ethically viable ways to privacy protection, and ensuring the autonomy and security of research participants. This introductory chapter argues that the growing field of digital migration research requires new modes of caring for (big) data. Beyond methodological and ethical reflexivity, such care work implies establishing analytically sustainable and viable environments for the respective data sets, from large-scale data sets ("big data") to ethnographic materials. Further, it is argued that approaching migrants' digital data "with care" means pursuing a critical approach to the use of big data in migration research, where the data is not an unquestionable proxy for social activity but rather a complex construct whose underlying social practices (and vulnerabilities) need to be fully understood. Finally, the chapter presents how the contributions of this book offer an in-depth analysis of the most crucial methodological and ethical challenges in digital migration studies and reflect on ways to move this field forward.


2013 ◽  
Vol 9 (4) ◽  
pp. 19-43 ◽  
Author(s):  
Bo Hu ◽  
Nuno Carvalho ◽  
Takahide Matsutsuka

In light of the challenges of effectively managing Big Data, the authors are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data and hence automate some data analysis tasks that are otherwise not amenable to computers. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and the large-scale storage of Big Data do not get along easily. Meanwhile, the sheer data size envisioned by Big Data rules out certain computationally expensive semantic technologies, rendering them much less efficient than they are on relatively small data sets. In this paper, the authors propose a mechanism that allows LOD to take advantage of existing large-scale data stores while sustaining its "semantic" nature. They demonstrate how RDF-based semantic models can be distributed across multiple storage servers and examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. Future work will focus on stress tests of the platform at the scale of tens of billions of triples, as well as comparative studies of usability and performance against similar offerings.
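As a hedged illustration of the distribution idea (not the paper's actual mechanism), RDF triples can be sharded across storage servers by hashing the subject, so that all triples about one resource stay on the same server. The server names and triples below are hypothetical.

```python
# Illustrative sketch: hash-partitioning RDF triples by subject so that all
# triples describing a resource land on the same (hypothetical) storage server.
import hashlib
from collections import defaultdict

SERVERS = ["store-0", "store-1", "store-2"]  # hypothetical storage servers

def server_for(subject: str) -> str:
    digest = hashlib.sha1(subject.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

triples = [  # hypothetical (subject, predicate, object) triples
    ("http://example.org/alice", "foaf:knows", "http://example.org/bob"),
    ("http://example.org/alice", "foaf:name", '"Alice"'),
    ("http://example.org/bob", "foaf:name", '"Bob"'),
]

shards = defaultdict(list)
for s, p, o in triples:
    shards[server_for(s)].append((s, p, o))

for server, stored in shards.items():
    print(server, "->", stored)

# A simple subject lookup only has to contact one shard; joins across subjects
# (a fundamental semantic operation) require coordination across servers.
```

This is the basic trade-off the paper points at: co-locating triples by subject keeps simple lookups local, while more expensive semantic operations must be tuned for distributed and parallel execution.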


2016 ◽  
Vol 4 (3) ◽  
pp. 1-21 ◽  
Author(s):  
Sungchul Lee ◽  
Eunmin Hwang ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Due to the advancement of Information Technology (IT), the hospitality industry sees great value in gathering a large amount and a wide variety of customer data. However, many hotels face a challenge in analyzing customer data and using it as an effective tool to understand hospitality customers better and, ultimately, to increase revenue. The authors' research attempts to resolve the current challenges of analyzing customer data in hospitality by utilizing big data analysis tools, especially Hadoop and R. Hadoop is a framework for processing large-scale data. With this approach, their study demonstrates ways of aggregating and analyzing hospitality customer data to find meaningful customer information. Multiple decision trees are constructed from the customer data sets with the intention of classifying customers' needs and customer clusters. By analyzing the customer data, the study suggests three strategies to increase the total expenditure of customers within the limited time of their stay.
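As an illustrative sketch only (the study itself used Hadoop and R), a decision tree classifying customers into spending segments from hypothetical features such as length of stay and spa usage might look like this; all column names, labels, and data are invented.

```python
# Sketch: a decision tree classifying customers into spending segments from
# hypothetical hospitality features (all column names and data are invented).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 1_000
customers = pd.DataFrame({
    "nights": rng.integers(1, 10, n),
    "is_business": rng.integers(0, 2, n),
    "booked_online": rng.integers(0, 2, n),
    "spa_visits": rng.integers(0, 4, n),
})
# Hypothetical label: high spenders tend to stay longer and use the spa more.
high_spender = ((customers["nights"] + 2 * customers["spa_visits"]
                 + rng.normal(0, 1.5, n)) > 7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(customers, high_spender, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

print("test accuracy:", tree.score(X_te, y_te))
print(export_text(tree, feature_names=list(customers.columns)))
```

The printed rules are the useful part for a hotel analyst: each path from root to leaf is a human-readable description of a customer segment and its predicted spending class.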


2020 ◽  
Vol 2 (4) ◽  
pp. 436-452
Author(s):  
Yoshinobu Tamura ◽  
Shigeru Yamada

Various big data sets are recorded on the server side of computer systems. Big data is commonly characterized by the volume, variety, and velocity (3V) model, first proposed in a press release by Gartner, Inc.; well-balanced big data exhibits all three Vs. Big data falls into various categories, e.g., sensor data, log data, customer data, financial data, weather data, picture data, movie data, and so on. In particular, fault big data is well known as a characteristic kind of log data in software engineering. In this paper, we analyze fault big data while considering the unique features that arise under the operation of open source software. In addition, we analyze actual data to show numerical examples of reliability assessment based on the results of multiple regression analysis, well known as the quantification method of the first type.
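A minimal sketch in the spirit of quantification method of the first type, i.e., multiple regression on dummy-coded categorical factors (this is not the authors' data or model; the factors, levels, and fault counts below are hypothetical).

```python
# Sketch: quantification-method-type-I-style regression, i.e., multiple linear
# regression on dummy-coded categorical factors (all data here are hypothetical).
import pandas as pd
from sklearn.linear_model import LinearRegression

faults = pd.DataFrame({
    "component": ["kernel", "kernel", "driver", "driver", "ui", "ui", "kernel", "ui"],
    "severity":  ["major",  "minor",  "major",  "minor",  "major", "minor", "major", "major"],
    "fault_count": [42, 11, 30, 7, 18, 5, 39, 21],
})

# Dummy-code the categorical explanatory variables.
X = pd.get_dummies(faults[["component", "severity"]], drop_first=True)
y = faults["fault_count"]

model = LinearRegression().fit(X, y)
print("intercept:", round(model.intercept_, 2))
print(dict(zip(X.columns, model.coef_.round(2))))

# Predict the expected fault count for a hypothetical new (component, severity) pair.
new = pd.get_dummies(pd.DataFrame({"component": ["driver"], "severity": ["major"]}),
                     drop_first=True).reindex(columns=X.columns, fill_value=0)
print("predicted faults:", model.predict(new)[0].round(1))
```

The fitted coefficients play the role of category scores in quantification theory: each one estimates how much a factor level shifts the expected fault count relative to the baseline level.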

