Cloud-Based Biological Data Processing

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the use of the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules to save the relevant information directly from beamline data-processing pipelines in sasCIF format are also developed. This update of sasCIF and the relevant tools are an important step in the standardization of the way SAS data are presented and exchanged, to make the results easily accessible to users and to promote further the application of SAS in the structural biology community.

Download Full-text

Neural Skyline Filtering for Imbalance Features Classification

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026817500195 ◽

2017 ◽

Vol 16 (03) ◽

pp. 1750019 ◽

Cited By ~ 8

Author(s):

Sonia Farhana Nimmy ◽

Md. Sarwar Kamal ◽

Muhammad Iqbal Hossain ◽

Nilanjan Dey ◽

Amira S. Ashour ◽

...

Keyword(s):

Data Processing ◽

Objective Function ◽

Large Volume ◽

Randomized Algorithm ◽

Large Data ◽

Vital Role ◽

Large Datasets ◽

Biological Data ◽

Experimental Result ◽

Filtering Techniques

In the current digitalized era, large datasets play a vital role in features extractions, information processing, knowledge mining and management. Sometimes, existing mining approaches are not sufficient to handle large volume of datasets. Biological data processing also suffers for the same issue. In the present work, a classification process is carried out on large volume of exons and introns from a set of raw data. The proposed work is designed into two parts as pre-processing and mapping-based classification. For pre-processing, three filtering techniques have been used. However, these traditional filtering techniques face difficulties for large datasets due to the long required time during large data processing as well as the large required memory size. In this regard, a mapping-based neural skyline filtering approach is designed. Randomized algorithm performed the mapping for large volume of datasets based on objective function. The objective function determines the randomized size of the datasets according to the homogeneity. Around 200 million DNA base pairs have been used for experimental analysis. Experimental result shows that mapping centric filtering outperforms other filtering techniques during large data processing.

Download Full-text

Data Processing Handbook for Complex Biological Data Sources

10.1016/c2018-0-00193-0 ◽

2019 ◽

Keyword(s):

Data Processing ◽

Biological Data ◽

Data Sources

Download Full-text

A case study of high-throughput biological data processing on parallel platforms

Bioinformatics ◽

10.1093/bioinformatics/bth184 ◽

2004 ◽

Vol 20 (12) ◽

pp. 1940-1947 ◽

Cited By ~ 6

Author(s):

D. Pekurovsky ◽

I. N. Shindyalov ◽

P. E. Bourne

Keyword(s):

Data Processing ◽

High Throughput ◽

Biological Data

Download Full-text

A Conceptual Matrix Model for Biological Data Processing

2007 Inaugural IEEE-IES Digital EcoSystems and Technologies Conference ◽

10.1109/dest.2007.372038 ◽

2007 ◽

Author(s):

Jingyu Hou ◽

Wanlei Zhou

Keyword(s):

Data Processing ◽

Matrix Model ◽

Biological Data

Download Full-text

A Dynamic Data-Driven Framework for Biological Data Using 2D Barcodes

Computational and Mathematical Methods in Medicine ◽

10.1155/2012/892098 ◽

2012 ◽

Vol 2012 ◽

pp. 1-5

Author(s):

Hui Li ◽

Chunmei Liu

Keyword(s):

Data Processing ◽

Laboratory Data ◽

High Capacity ◽

Physical Object ◽

Biological Data ◽

Data Driven ◽

Digital Information ◽

Laboratory Equipment ◽

Work Related ◽

2D Barcode

Biology data is increasing exponentially from biological laboratories. It is a complicated problem for further processing the data. Processing computational data and data from biological laboratories manually may lead to potential errors in further analysis. In this paper, we proposed an efficient data-driven framework to inspect laboratory equipment and reduce impending failures. Our method takes advantage of the 2D barcode technology which can be installed on the specimen as a trigger for the data-driven system. For this end, we proposed a series of algorithms to speed up the data processing. The results show that the proposed system increases the system's scalability and flexibility. Also, it demonstrates the ability of linking a physical object with digital information to reduce the manual work related to experimental specimen. The characteristics such as high capacity of storage and data management of the 2D barcode technology provide a solution to collect experimental laboratory data in a quick and accurate fashion.

Download Full-text

RESEARCH ON ADVANCED COMPUTER METHODS FOR BIOLOGICAL DATA PROCESSING

10.21236/ad0637452 ◽

1966 ◽

Cited By ~ 5

Author(s):

D. N. Streeter ◽

J. Raviv

Keyword(s):

Data Processing ◽

Biological Data ◽

Computer Methods ◽

Advanced Computer

Download Full-text

Biological Data Processing using Model Driven Engineering

IEEE Latin America Transactions ◽

10.1109/tla.2008.4815285 ◽

2008 ◽

Vol 6 (4) ◽

pp. 324-331 ◽

Cited By ~ 2

Author(s):

A. Gomez ◽

A. Boronat ◽

J.A. Carsi ◽

I. Ramos ◽

C. Taubner ◽

...

Keyword(s):

Data Processing ◽

Biological Data ◽

Model Driven Engineering ◽

Model Driven

Download Full-text

THE DESIGN OF A DATA PROCESSING CENTER FOR BIOLOGICAL DATA

Annals of the New York Academy of Sciences ◽

10.1111/j.1749-6632.1964.tb50646.x ◽

1964 ◽

Vol 115 (2) ◽

pp. 547-552 ◽

Cited By ~ 6

Author(s):

Henry J. Juenemann

Keyword(s):

Data Processing ◽

Biological Data ◽

Data Processing Center

Download Full-text

Reproducible Statistical Analysis in Microarray Profiling Studies

Methods of Information in Medicine ◽

10.1055/s-0038-1634057 ◽

2006 ◽

Vol 45 (02) ◽

pp. 139-145

Author(s):

M. Ruschhaupt ◽

W. Huber ◽

U. Mansmann

Keyword(s):

Data Processing ◽

Complex Analysis ◽

Biological Data ◽

Primary Data ◽

Data Sets ◽

Computational Framework ◽

Microarray Profiling ◽

Number Of Publications ◽

Algorithmic Approaches ◽

Derived Data

Summary Objectives: Microarrays are a recent biotechnology that offers the hope of improved cancer classification. A number of publications presented clinically promising results by combining this new kind of biological data with specifically designed algorithmic approaches. But, reproducing published results in this domain is harder than it may seem. Methods: This paper presents examples, discusses the problems hidden in the published analyses and demonstrates a strategy to improve the situation which is based on the vignette technology available from the R and Bioconductor projects. Results: The tool of a compendium is discussed to achieve reproducible calculations and to offer an extensible computational framework. A compendium is a document that bundles primary data, processing methods (computational code), derived data, and statistical output with textual documentation and conclusions. It is interactive in the sense that it allows for the modification of the processing options, plugging in new data, or inserting further algorithms and visualizations. Conclusions: Due to the complexity of the algorithms, the size of the data sets, and the limitations of the medium printed paper it is usually not possible to report all the minutiae of the data processing and statistical computations. The technique of a compendium allows a complete critical assessment of a complex analysis.

Download Full-text