scholarly journals BIG: a large-scale data integration tool for renal physiology

2016 ◽  
Vol 311 (4) ◽  
pp. F787-F792 ◽  
Author(s):  
Yue Zhao ◽  
Chin-Rang Yang ◽  
Viswanathan Raghuram ◽  
Jaya Parulekar ◽  
Mark A. Knepper

Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: “How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?” This is the type of problem that has motivated the “Big-Data” revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/ .

Web Services ◽  
2019 ◽  
pp. 953-978
Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st Century afford massive data generation in sectors of our economy to include the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data, using modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education are varied such as voluminous text, images, and graphs. Applying Big data science techniques (e.g., functional algorithms) for extracting intelligence data affords decision markers quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter serves to employ data science for potential solutions to Big Data applications in the sectors of agriculture, manufacturing and education to a lesser extent, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st Century afford massive data generation in sectors of our economy to include the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data, using modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education are varied such as voluminous text, images, and graphs. Applying Big data science techniques (e.g., functional algorithms) for extracting intelligence data affords decision markers quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter serves to employ data science for potential solutions to Big Data applications in the sectors of agriculture, manufacturing and education to a lesser extent, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


2022 ◽  
pp. 41-67
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Machine learning (ML), neural network (NN), evolutionary algorithm (EA), fuzzy systems (FSs), as well as computer science have been very famous and very significant for many years. They have been applied to many different areas. They have contributed much to developments of many large-scale corporations, massive organizations, etc. Lots of information and massive data sets (MDSs) have been generated from these big corporations, organizations, etc. These big data sets (BDSs) have been the challenges of many commercial applications, researches, etc. Therefore, there have been many algorithms of the ML, the NN, the EA, the FSs, as well as computer science which have been developed to handle these massive data sets successfully. To support for this process, the authors have displayed all the possible algorithms of the NN for the large-scale data sets (LSDSs) successfully in this chapter. Finally, they have presented a novel model of the NN for the BDS in a sequential environment (SE) and a distributed network environment (DNE).


2017 ◽  
Vol 8 (2) ◽  
pp. 30-43
Author(s):  
Mrutyunjaya Panda

The Big Data, due to its complicated and diverse nature, poses a lot of challenges for extracting meaningful observations. This sought smart and efficient algorithms that can deal with computational complexity along with memory constraints out of their iterative behavior. This issue may be solved by using parallel computing techniques, where a single machine or a multiple machine can perform the work simultaneously, dividing the problem into sub problems and assigning some private memory to each sub problems. Clustering analysis are found to be useful in handling such a huge data in the recent past. Even though, there are many investigations in Big data analysis are on, still, to solve this issue, Canopy and K-Means++ clustering are used for processing the large-scale data in shorter amount of time with no memory constraints. In order to find the suitability of the approach, several data sets are considered ranging from small to very large ones having diverse filed of applications. The experimental results opine that the proposed approach is fast and accurate.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the development of increasingly advanced information technology and electronic technology, especially with regard to physical information systems, cloud computing systems, and social services, big data will be widely visible, creating benefits for people and at the same time facing huge challenges. In addition, with the advent of the era of big data, the scale of data sets is getting larger and larger. Traditional data analysis methods can no longer solve the problem of large-scale data sets, and the hidden information behind big data is digging out, especially in the field of e-commerce. We have become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets through the SOM self-organizing neural network classification method. Compared with the ever-increasing progress of information technology and electronic equipment, especially the related physical information system finally merges the training results of each working set, so as to quickly deal with the problem of massive data prediction and analysis. This paper proposes that big data has the flexibility of expansion and quality assessment system, so it is meaningful to replace the double-sidedness of quality assessment with big data. Finally, considering the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to the big data analysis of e-commerce. The research results show that parallel support vector machines can solve the problem of processing large-scale data sets. The emergence of data dirty problems has increased the effective rate by at least 70%.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Li Guo ◽  
Kunlin Zhu ◽  
Ruijun Duan

In order to explore the economic development trend in the postepidemic era, this paper improves the traditional clustering algorithm and constructs a postepidemic economic development trend analysis model based on intelligent algorithms. In order to solve the clustering problem of large-scale nonuniform density data sets, this paper proposes an adaptive nonuniform density clustering algorithm based on balanced iterative reduction and uses the algorithm to further cluster the compressed data sets. For large-scale data sets, the clustering results can accurately reflect the class characteristics of the data set as a whole. Moreover, the algorithm greatly improves the time efficiency of clustering. From the research results, we can see that the improved clustering algorithm has a certain effect on the analysis of economic development trends in the postepidemic era and can continue to play a role in subsequent economic analysis.


2016 ◽  
Vol 4 (3) ◽  
pp. 1-21 ◽  
Author(s):  
Sungchul Lee ◽  
Eunmin Hwang ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Due to the advancement of Information Technology (IT), the hospitality industry is seeing a great value in gathering various kinds of and a large amount of customers' data. However, many hotels are facing a challenge in analyzing customer data and using it as an effective tool to understand the hospitality customers better and, ultimately, to increase the revenue. The authors' research attempts to resolve the current challenges of analyzing customer data in hospitality by utilizing the big data analysis tools, especially Hadoop and R. Hadoop is a framework for processing large-scale data. With the integration of new approach, their study demonstrates the ways of aggregating and analyzing the hospitality customer data to find meaningful customer information. Multiple decision trees are constructed from the customer data sets with the intention of classifying customers' needs and customers' clusters. By analyzing the customer data, the study suggests three strategies to increase the total expenditure of the customers within a limited amount of time during their stay.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 98 ◽  
Author(s):  
Kenji Mizuseki ◽  
Kamran Diba ◽  
Eva Pastalkova ◽  
Jeff Teeters ◽  
Anton Sirota ◽  
...  

Using silicon-based recording electrodes, we recorded neuronal activity of the dorsal hippocampus and dorsomedial entorhinal cortex from behaving rats. The entorhinal neurons were classified as principal neurons and interneurons based on monosynaptic interactions and wave-shapes. The hippocampal neurons were classified as principal neurons and interneurons based on monosynaptic interactions, wave-shapes and burstiness. The data set contains recordings from 7,736 neurons (6,100 classified as principal neurons, 1,132 as interneurons, and 504 cells that did not clearly fit into either category) obtained during 442 recording sessions from 11 rats (a total of 204.5 hours) while they were engaged in one of eight different behaviours/tasks. Both original and processed data (time stamp of spikes, spike waveforms, result of spike sorting and local field potential) are included, along with metadata of behavioural markers. Community-driven data sharing may offer cross-validation of findings, refinement of interpretations and facilitate discoveries.


2020 ◽  
Vol 12 (11) ◽  
pp. 1794
Author(s):  
Naisen Yang ◽  
Hong Tang

Modern convolutional neural networks (CNNs) are often trained on pre-set data sets with a fixed size. As for the large-scale applications of satellite images, for example, global or regional mappings, these images are collected incrementally by multiple stages in general. In other words, the sizes of training datasets might be increased for the tasks of mapping rather than be fixed beforehand. In this paper, we present a novel algorithm, called GeoBoost, for the incremental-learning tasks of semantic segmentation via convolutional neural networks. Specifically, the GeoBoost algorithm is trained in an end-to-end manner on the newly available data, and it does not decrease the performance of previously trained models. The effectiveness of the GeoBoost algorithm is verified on the large-scale data set of DREAM-B. This method avoids the need for training on the enlarged data set from scratch and would become more effective along with more available data.


Author(s):  
Sadaf Afrashteh ◽  
Ida Someh ◽  
Michael Davern

Big data analytics uses algorithms for decision-making and targeting of customers. These algorithms process large-scale data sets and create efficiencies in the decision-making process for organizations but are often incomprehensible to customers and inherently opaque in nature. Recent European Union regulations require that organizations communicate meaningful information to customers on the use of algorithms and the reasons behind decisions made about them. In this paper, we explore the use of explanations in big data analytics services. We rely on discourse ethics to argue that explanations can facilitate a balanced communication between organizations and customers, leading to transparency and trust for customers as well as customer engagement and reduced reputation risks for organizations. We conclude the paper by proposing future empirical research directions.


Sign in / Sign up

Export Citation Format

Share Document