Application of Differential Network Enrichment Analysis for Deciphering Metabolic Alterations

Modern analytical methods allow for the simultaneous detection of hundreds of metabolites, generating increasingly large and complex data sets. The analysis of metabolomics data is a multi-step process that involves data processing and normalization, followed by statistical analysis. One of the biggest challenges in metabolomics is linking alterations in metabolite levels to specific biological processes that are disrupted, contributing to the development of disease or reflecting the disease state. A common approach to accomplishing this goal involves pathway mapping and enrichment analysis, which assesses the relative importance of predefined metabolic pathways or other biological categories. However, traditional knowledge-based enrichment analysis has limitations when it comes to the analysis of metabolomics and lipidomics data. We present a Java-based, user-friendly bioinformatics tool named Filigree that provides a primarily data-driven alternative to the existing knowledge-based enrichment analysis methods. Filigree is based on our previously published differential network enrichment analysis (DNEA) methodology. To demonstrate the utility of the tool, we applied it to previously published studies analyzing the metabolome in the context of metabolic disorders (type 1 and 2 diabetes) and the maternal and infant lipidome during pregnancy.

Developing sustainable software solutions for bioinformatics by the “Butterfly” paradigm

F1000Research ◽

10.12688/f1000research.3681.1 ◽

2014 ◽

Vol 3 ◽

pp. 71 ◽

Cited By ~ 8

Author(s):

Zeeshan Ahmed ◽

Saman Zeeshan ◽

Thomas Dandekar

Keyword(s):

Software Engineering ◽

Data Representation ◽

Data Sets ◽

Complex Data ◽

Scientific Software ◽

Rapid Changes ◽

Complex Data Sets ◽

Key Steps ◽

User Friendly

Software design and sustainable software engineering are essential for the long-term development of bioinformatics software. Typical challenges in an academic environment are short-term contracts, island solutions, pragmatic approaches and loose documentation. Emerging challenges are big data, complex data sets, software compatibility and rapid changes in data representation. Our approach to coping with these challenges consists of iterative, intertwined cycles of development (the “Butterfly” paradigm) for key steps in scientific software engineering. User feedback is valued, as is planning software in a sustainable and interoperable way. Tool usage should be easy and intuitive. A middleware layer supports a user-friendly Graphical User Interface (GUI) and allows database and tool development to proceed independently. We validated the approach in our own software development and compared the different design paradigms in various software solutions.

Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets

Journal of Grid Computing ◽

10.1007/s10723-020-09518-y ◽

2020 ◽

Vol 18 (3) ◽

pp. 507-527

Author(s):

M. Ghorbani ◽

S. Swift ◽

S. J. E. Taylor ◽

A. M. Payne

Keyword(s):

Machine Learning ◽

Sequence Data ◽

Machine Learning Techniques ◽

Data Sets ◽

Distributed Data ◽

Complex Data ◽

Network Computing ◽

Data Repositories ◽

Complex Data Sets ◽

User Friendly

The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object, which has to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus, for non-computing experts, the generation of such matrices proves a barrier to employing machine learning techniques. Further, since datasets are becoming larger, this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing possible in most institutions. The system combines the Grid and Cloud User Support Environment (gUSE) with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical and text data.
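To make the idea of a feature matrix concrete, the following is a minimal sketch that counts k-mers in DNA sequences, giving one row per sequence and one column per k-mer. It is an illustrative assumption, not the feature extraction used by the WS-PGRADE/gUSE BOINC system; the function name `kmer_feature_matrix` and the choice of k-mer counts as features are hypothetical.

```python
from itertools import product


def kmer_feature_matrix(sequences, k=2, alphabet="ACGT"):
    """Return (column_names, rows): one row of k-mer counts per sequence.

    Hypothetical helper for illustration; k-mer counting is only one of
    many possible ways to turn sequences into features.
    """
    # Enumerate every possible k-mer so that all rows share the same columns.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    rows = []
    for seq in sequences:
        counts = [0] * len(kmers)
        # Slide a window of length k across the sequence.
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            if window in index:  # skip windows with symbols outside the alphabet
                counts[index[window]] += 1
        rows.append(counts)
    return kmers, rows


columns, matrix = kmer_feature_matrix(["ACGT", "AAAA"], k=2)
```

Each sequence is processed independently of the others, which is the property that lets this kind of work be split across many machines.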

Using Hadoop Technology to Overcome Big Data Problems by Choosing Proposed Cost-efficient Scheduler Algorithm for Heterogeneous Hadoop System (BD3)

Journal of Scientific Research and Reports ◽

10.9734/jsrr/2020/v26i930310 ◽

2020 ◽

pp. 58-84

Author(s):

Abou_el_ela Abdou Hussein

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Database Management System ◽

Data Sets ◽

Complex Data ◽

Daily Data ◽

Complex Data Sets ◽

Cost Efficient ◽

Hadoop System

Advances in web technologies have led to tremendous growth in the volume of data generated every day. This mountain of huge, widely distributed data sets gives rise to the phenomenon called big data: a collection of massive, heterogeneous, unstructured and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating and visualizing big data. Traditional techniques such as the Relational Database Management System (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Many technologies handle big data; one of them is Hadoop. Hadoop can be understood as an open-source distributed data processing framework and is one of the prominent, well-known solutions to the problem of handling big data. Apache Hadoop was based on the Google File System and the MapReduce programming paradigm. In this paper we survey the characteristics of big data, starting from the first three V's, which researchers have extended over time to more than fifty-six V's, and compare researchers' accounts to reach the best representation and a precise clarification of all the big data V's. We highlight the challenges that face big data processing and how to overcome them using Hadoop, and describe its use in processing big data sets as a solution to various problems in a distributed cloud-based environment. This paper mainly focuses on different components of Hadoop such as Hive, Pig and HBase. We also give a thorough description of Hadoop's pros and cons, and of improvements that address Hadoop's problems by choosing the proposed Cost-efficient Scheduler Algorithm for heterogeneous Hadoop systems.

Alternative Clustering

Advances in Business Information Systems and Analytics - Applying Predictive Analytics Within the Service Sector ◽

10.4018/978-1-5225-2148-8.ch001 ◽

2017 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Avinash Navlani ◽

V. B. Gupta

Keyword(s):

Research Problem ◽

Research Community ◽

Data Sets ◽

Complex Data ◽

High Quality ◽

Data Set ◽

Alternative Clustering ◽

Complex Data Sets ◽

Data Objects ◽

Community Clustering

In the last couple of decades, clustering has become a crucial research problem in the data mining research community. Clustering refers to the partitioning of data objects, such as records and documents, into groups or clusters with similar characteristics. Clustering is unsupervised learning, and because of this unsupervised nature there is no unique solution for all problems. Complex data sets often need to be explained by multiple clusterings, yet traditional clustering approaches generate a single clustering. A data set can contain more than one pattern, and each pattern can be interesting from a different perspective. Alternative clustering aims to find all the distinct groupings of the data set such that each grouping is of high quality and distinct from the others. This chapter gives an overall view of alternative clustering: its various approaches, related work, comparison with easily confused related terms such as subspace, multi-view, and ensemble clustering, applications, issues, and challenges.

Science Communication with Dinosaurs

Handbook of Research on Computational Science and Engineering - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-61350-116-0.ch024 ◽

2012 ◽

pp. 587-611

Author(s):

Phillip L. Manning ◽

Peter L. Falkingham

Keyword(s):

High Performance Computing ◽

Public Engagement ◽

Science Communication ◽

High Performance ◽

Computational Science ◽

Data Sets ◽

Complex Data ◽

Multidisciplinary Research ◽

Complex Data Sets ◽

Performance Computing

Dinosaurs successfully conjure images of lost worlds and forgotten lives. Our understanding of these iconic, extinct animals now comes from many disciplines, not just the science of palaeontology. In recent years palaeontology has benefited from the application of new and existing techniques from physics, biology, chemistry, engineering, but especially computational science. The application of computers in palaeontology is highlighted in this chapter as a key area of development in studying fossils. The advances in high performance computing (HPC) have greatly aided and abetted multiple disciplines and technologies that are now feeding paleontological research, especially when dealing with large and complex data sets. We also give examples of how such multidisciplinary research can be used to communicate not only specific discoveries in palaeontology, but also the methods and ideas, from interrelated disciplines to wider audiences. Dinosaurs represent a useful vehicle that can help enable wider public engagement, communicating complex science in digestible chunks.

Anomaly Detection for Inferring Social Structure

Social Computing ◽

10.4018/978-1-60566-984-7.ch118 ◽

2010 ◽

pp. 1797-1803

Author(s):

Lisa Friedland

Keyword(s):

Data Analysis ◽

Anomaly Detection ◽

Social Structure ◽

Small Groups ◽

Analysis Data ◽

Data Sets ◽

Complex Data ◽

Detection Approach ◽

Complex Data Sets ◽

Data Points

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and views constructed of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).
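The "model first, then outliers" pattern can be sketched in its simplest form as follows: summarize the data with a mean and standard deviation (question 1), then flag points that lie far from that model (question 2). This is a generic z-score illustration under assumed data, not the tribe-detection framework of the article; the function name `find_outliers` and the threshold of 2.5 are hypothetical choices.

```python
import statistics


def find_outliers(values, threshold=2.5):
    """Flag values whose z-score exceeds the threshold.

    Hypothetical helper: the "model" is just a mean and a sample
    standard deviation, the simplest stand-in for a fitted distribution.
    """
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []  # all values identical, so nothing can be anomalous
    return [v for v in values if abs(v - mean) / spread > threshold]


# Made-up observations for illustration; one value is clearly unusual.
observations = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
outliers = find_outliers(observations)
```

In relational or social-network data the model would describe expected co-occurrence between entities instead of a single numeric distribution, but the two-step structure is the same.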

Introduction to Big Data and Business Analytics

10.4018/978-1-6684-3662-2.ch004 ◽

2022 ◽

pp. 67-76

Author(s):

Dineshkumar Bhagwandas Vaghela

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Data Sets ◽

Complex Data ◽

Business Analytics ◽

Database Applications ◽

Complex Data Sets ◽

History Of ◽

Rapid Generation

The term big data arose from the rapid generation of data in various organizations. In big data, “big” is the buzzword. Here the data are so large and complex that traditional database applications are not able to process them (i.e., they are inadequate to deal with such a volume of data). Big data are usually described by the 5Vs (volume, velocity, variety, variability, veracity). Big data can be structured, semi-structured, or unstructured. Big data analytics is the process of uncovering hidden patterns and unknown correlations, and of predicting future values, from large and complex data sets. This chapter covers the following topics in more detail: the history of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.

Bayesian Modelling for Machine Learning

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch044 ◽

2005 ◽

pp. 236-242

Author(s):

Paul Rippon ◽

Kerrie Mengersen

Keyword(s):

Machine Learning ◽

Process Model ◽

Bayesian Learning ◽

Data Sets ◽

Complex Data ◽

Sources Of Information ◽

Multiple Sources ◽

Complex Data Sets ◽

Complex Phenomena ◽

Learning Data

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
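One way to picture "updating parameters of a process model based on data" is the conjugate Beta-Binomial update, sketched below. The example is an assumption chosen for its simplicity and is not taken from the article; the function names and the observed counts are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior over an unknown success probability.
prior = (1.0, 1.0)
# Hypothetical data: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)
posterior_mean = beta_mean(*posterior)
```

With these numbers the posterior is Beta(8, 4), so the estimated success probability moves from the prior mean of 0.5 to 8/12, about 0.67. Further data can be folded in by calling `update_beta` again on the posterior.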

A MODIFIED KOHONEN SELF-ORGANIZING MAP (KSOM) CLUSTERING FOR FOUR CATEGORICAL DATA

Jurnal Teknologi ◽

10.11113/jt.v78.9275 ◽

2016 ◽

Vol 78 (6-13) ◽

Author(s):

Azlin Ahmad ◽

Rubiyah Yusof

Keyword(s):

Breast Cancer ◽

Categorical Data ◽

Data Sets ◽

Complex Data ◽

Self Organizing Map ◽

Distance Calculation ◽

The Neural Network ◽

Complex Data Sets ◽

Separable Problems ◽

Self Organizing

The Kohonen Self-Organizing Map (KSOM) is one of the unsupervised neural network learning algorithms. It is used to solve problems in various areas, especially in clustering complex data sets. Despite its advantages, the KSOM algorithm has a few drawbacks, such as overlapping clusters and non-linearly separable problems. Therefore, this paper proposes a modified KSOM inspired by the pheromone approach in Ant Colony Optimization. The modification focuses on the distance calculation among objects. The proposed algorithm has been tested on four real categorical data sets obtained from the UCI Machine Learning Repository: Iris, Seeds, Glass and Wisconsin Breast Cancer Database. The results show that the modified KSOM produces accurate clustering results and that all clusters can be clearly identified.
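For orientation, a single training step of a standard self-organizing map can be sketched as below: find the best-matching unit by Euclidean distance, then move its weight vector toward the sample. This is a simplified baseline, not the pheromone-based modification proposed in the paper. As an assumption made to keep the sketch short, only the winning unit is updated (a neighborhood of radius zero), and the function names and learning rate are hypothetical.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    distances = [math.dist(w, sample) for w in weights]
    return distances.index(min(distances))


def train_step(weights, sample, learning_rate=0.5):
    """Move the winning unit toward the sample, in place; return its index.

    Simplification: a full SOM would also update the winner's map
    neighbors, scaled by a neighborhood function that shrinks over time.
    """
    bmu = best_matching_unit(weights, sample)
    weights[bmu] = [
        w + learning_rate * (x - w) for w, x in zip(weights[bmu], sample)
    ]
    return bmu


# Two units in a two-dimensional input space, with made-up values.
weights = [[0.0, 0.0], [1.0, 1.0]]
winner = train_step(weights, [0.9, 0.8])
```

The distance calculation inside `best_matching_unit` is the step that the paper's modification targets.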

Application of Multivariate-Rank-Based Techniques in Clustering of Big Data

Vikalpa The Journal for Decision Makers ◽

10.1177/0256090918804385 ◽

2018 ◽

Vol 43 (4) ◽

pp. 179-190

Author(s):

Pritha Guha

Keyword(s):

Big Data ◽

Dna Analysis ◽

Retail Banking ◽

Data Sets ◽

Complex Data ◽

Data Handling ◽

Credit Risk Management ◽

Statistical Tools ◽

Executive Summary ◽

Complex Data Sets

Executive Summary: Very large or complex data sets, which are difficult to process or analyse using traditional data handling techniques, are usually referred to as big data. The idea of big data is characterized by the three ‘v’s, which are volume, velocity, and variety (Liu, McGree, Ge, & Xie, 2015), referring respectively to the volume of data, the velocity at which the data are processed and the wide varieties in which big data are available. Every single day, different sectors such as credit risk management, healthcare, media, retail, retail banking, climate prediction, DNA analysis and sports generate petabytes of data (1 petabyte = 2^50 bytes). Even basic handling of big data, therefore, poses significant challenges, one of them being organizing the data in such a way that it can give better insights for analysis and decision-making. With the explosion of data in our lives, it has become very important to use statistical tools to analyse them.
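The size of a petabyte in the binary convention, 2^50 bytes, can be checked with a few lines of arithmetic. The comparison with the decimal convention of 10^15 bytes is added here for context and is not part of the summary above; the constant names are hypothetical.

```python
# Binary convention: 1024^5 bytes (strictly speaking, a pebibyte).
BINARY_PETABYTE = 2 ** 50
# Decimal (SI) convention: 10^15 bytes.
DECIMAL_PETABYTE = 10 ** 15

# How much larger the binary unit is than the decimal one.
ratio = BINARY_PETABYTE / DECIMAL_PETABYTE
```

The binary unit is about 12.6% larger than the decimal one, a difference that matters when storage is sized at this scale.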
