Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects

GigaScience ◽  
2021 ◽  
Vol 10 (12) ◽  
Author(s):  
Nathan C Sheffield ◽  
Michał Stolarczyk ◽  
Vincent P Reuter ◽  
André F Rendeiro

Abstract

Background: Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, metadata formats from a data provider are often incompatible with the requirements of a processing tool. There is no broadly accepted standard for organizing metadata across biological projects and bioinformatics tools, which restricts the portability and reusability of both annotated datasets and analysis software.

Results: To address this, we present the Portable Encapsulated Project (PEP) specification, a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many biological samples. In addition to standardization, the PEP specification provides descriptors and modifiers for project-level and sample-level metadata, which improve portability across both computing environments and data processing tools. PEPs include a schema validator framework, allowing formal definition of the metadata attributes required for any type of data analysis. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata.

Conclusions: The PEP specification is an important step toward unifying data annotation and processing tools in data-intensive biological research projects. Links to tools and documentation are available at http://pep.databio.org/.
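The "descriptors and modifiers" mentioned in the abstract can be illustrated with a minimal PEP: a YAML project config paired with a CSV sample table. The layout below follows the public PEP documentation, but the file names, attribute names, and paths are illustrative, not taken from the paper:

```yaml
# project_config.yaml -- minimal illustrative PEP (version 2-style layout)
pep_version: "2.0.0"
sample_table: sample_table.csv      # CSV with one row per biological sample
sample_modifiers:
  append:
    genome: hg38                    # add a constant attribute to every sample
  derive:
    attributes: [read1]
    sources:
      local: "/data/{sample_name}.fastq.gz"   # build file paths from sample attributes
```

Such a project could then be loaded language-agnostically, e.g. with `peppy.Project("project_config.yaml")` in Python or `pepr::Project("project_config.yaml")` in R (constructor names per those packages' documentation).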

2020 ◽  
Author(s):  
Nathan C. Sheffield ◽  
Michał Stolarczyk ◽  
Vincent P. Reuter ◽  
André F. Rendeiro

Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, incompatibility is common between the metadata format of a data source and the format required by a processing tool. There is no broadly accepted standard to organize metadata across biological projects and bioinformatics tools, restricting the portability and reusability of both annotated datasets and analysis software. To address this, we present Portable Encapsulated Projects (PEP), a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many samples, whether from individual experiments, organisms, or single cells. In addition to standardization, the PEP specification provides descriptors and modifiers for different organizational layers of a project, which improve portability among computing environments and facilitate use of different processing tools. PEP includes a schema validator framework, allowing formal definition of required metadata attributes for any type of biomedical data analysis. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata. PEP therefore presents an important step toward unifying data annotation and processing tools in data-intensive biological research projects.


1997 ◽  
Vol 3 (S2) ◽  
pp. 1081-1082
Author(s):  
I. Angert ◽  
W. Jahn ◽  
K.C. Holmes ◽  
R.R. Schröder

Understanding the contrast formation mechanism in the electron microscope (EM) is one of the prerequisites for artefact-free reconstruction of biological structures from images. We found that the correction of contrast formation normally applied to zero-energy-loss filtered images corrupted spatial resolution. Therefore, the contribution of contrast formed by inelastic electrons was reconsidered, including partial coherence of inelastically scattered electrons and lens aberrations of the microscope. Based on this, a complete description of the zero-loss contrast transfer function (CTF) is now possible.

We used tobacco mosaic virus (TMV), a biological sample known at atomic resolution, to define optimum CTF parameters for reconstructing defocus series from an EFTEM LEO 912. CTF theory as known so far describes image contrast in the weak-phase approximation as a linear sum of amplitude and phase contrast. The contribution of amplitude contrast (the ratio of amplitude to phase contrast, A/P) was determined to be between 5% and 7% for unfiltered images and 12–14% for zero-loss filtered images. This is consistent with expectation: since a filter microscope removes electrons from the image, a higher amplitude contrast than in non-filtered images is expected.
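The "linear sum of amplitude and phase contrast" has the standard weak-phase-object form below; this textbook expression is supplied for orientation and is not quoted from the abstract:

```latex
% Weak-phase-object CTF with amplitude-contrast fraction A (the A/P ratio above):
\mathrm{CTF}(k) \;=\; \sqrt{1 - A^{2}}\,\sin\chi(k) \;+\; A\,\cos\chi(k),
\qquad
\chi(k) \;=\; \pi\lambda\,\Delta f\,k^{2} \;-\; \frac{\pi}{2}\,C_{s}\,\lambda^{3}k^{4}
```

where \(\lambda\) is the electron wavelength, \(\Delta f\) the defocus, and \(C_s\) the spherical-aberration coefficient.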


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1471
Author(s):  
Jun-Yeong Lee ◽  
Moon-Hyun Kim ◽  
Syed Asif Raza Shah ◽  
Sang-Un Ahn ◽  
Heejun Yoon ◽  
...  

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires storage systems that play pivotal roles in data management and analysis for scientific discovery. Redundant Array of Independent Disks (RAID), a well-known storage technology that combines multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate this problem, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities where tremendous amounts of data have to be handled. In this study, we investigated and benchmarked several distributed file systems, namely Ceph, GlusterFS, Lustre, and EOS, for data-intensive environments. In our experiments, we configured the distributed file systems in a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
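The kind of sequential read/write measurement such benchmarks rest on can be sketched with `dd`; the `TARGET` path is a placeholder, with `/tmp` standing in here for a FUSE mountpoint of the file system under test:

```shell
#!/bin/sh
# Minimal sequential write/read throughput probe.
# TARGET is a placeholder: in a real run it would be a FUSE mountpoint of the
# distributed file system under test (Ceph, GlusterFS, Lustre, or EOS).
TARGET=${TARGET:-/tmp/fsbench}
mkdir -p "$TARGET"

# Sequential write: 64 MiB in 1 MiB blocks, flushed to stable storage,
# so the reported rate includes the cost of persisting the data.
dd if=/dev/zero of="$TARGET/testfile" bs=1M count=64 conv=fsync 2>&1 | tail -n 1

# Sequential read back of the same file.
dd if="$TARGET/testfile" of=/dev/null bs=1M 2>&1 | tail -n 1

rm -rf "$TARGET"
```

Real benchmarks would add direct I/O, varying block sizes, and many concurrent clients, which is where the per-file-system differences the study reports show up.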


1999 ◽  
Vol 17 (2) ◽  
pp. 131-133 ◽  
Author(s):  
Dina Ralt

There have been a variety of Western explanations for the Qi of traditional Chinese medicine, but all have essentially had to compromise between expression of energy, matter and flow. The author suggests that a non-linear, fractal approach, similar to that of Chaos theory, offers a tool to understand Qi; the yin-yang and five phases theories of Chinese philosophy can be regarded as fractals. Qi, as the “net of life”, can also be looked on as an information network with close parallels to the computer-based web of the internet. This article therefore suggests a new Western definition of Qi, proposing that: “The Qi of Chinese medicine is inter-cellular information communicated within the body: information which enables all bodily functions and is a key component in regulation”. Referring to Qi as information offers the chance to integrate Chinese medical philosophy with current biological research on cellular communication.


Author(s):  
Andreas Lorenz

The use of mobile and hand-held devices is a desirable option for implementing user interaction with remote services from a distance, whereby the user should be able to select the input device depending on personal preferences, capabilities, and the availability of interaction devices. Because of the heterogeneity of available devices and interaction styles, interoperability needs particular attention from the developer. This paper describes the design of a general solution that enables mobile devices to control services on remote hosts. The approach extends the idea of separating the user interface from the application logic, leading to the definition of virtual or logical input devices that are physically separated from the controlled services.
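The separation described above can be sketched as a logical input device that translates device-specific input into abstract events for a remote service. The names here (`LogicalInput`, `VolumeService`) are illustrative, not taken from the paper:

```python
# Sketch: a virtual/logical input device decoupled from the controlled service.
# Any physical device (phone keypad, touchscreen) maps its native input to the
# same abstract events, so the service never depends on device specifics.
from dataclasses import dataclass


@dataclass
class Event:
    name: str          # abstract action, e.g. "increase" or "decrease"
    value: float = 1.0


class VolumeService:
    """Remote service exposing application logic only -- no UI knowledge."""
    def __init__(self):
        self.level = 5

    def handle(self, event: Event):
        if event.name == "increase":
            self.level += int(event.value)
        elif event.name == "decrease":
            self.level -= int(event.value)


class LogicalInput:
    """Logical input device: maps device-specific keys to abstract events."""
    def __init__(self, service: VolumeService):
        self.service = service

    def key_press(self, key: str):
        mapping = {"+": Event("increase"), "-": Event("decrease")}
        if key in mapping:
            self.service.handle(mapping[key])


svc = VolumeService()
remote = LogicalInput(svc)   # could run on a phone, talking over the network
remote.key_press("+")
print(svc.level)  # 6
```

Swapping the keypad for a touchscreen only changes the `mapping` inside the logical device; the service is untouched, which is the point of the separation.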


Author(s):  
Orazio Tomarchio ◽  
Giuseppe Di Modica ◽  
Marco Cavallo ◽  
Carmelo Polito

Advances in communication technologies, along with the birth of new communication paradigms leveraging the power of social networks, have fostered the production of huge amounts of data. Old-fashioned computing paradigms are unfit to handle the dimensions of the data produced daily by countless, worldwide distributed sources of information. So far, MapReduce has been able to keep the promise of speeding up computation over Big Data within a cluster. This article focuses on scenarios of worldwide distributed Big Data. Pointing out the poor performance of the Hadoop framework when deployed in such scenarios, it proposes a Hierarchical Hadoop Framework (H2F) to cope with the issues arising when Big Data are scattered over geographically distant data centers. The article highlights the novelty that H2F introduces with respect to other hierarchical approaches. Tests run on a software prototype are also reported to show the performance gains that H2F achieves over a plain Hadoop approach in geographical scenarios.
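The hierarchical idea can be illustrated with a toy two-level word count: each "site" runs map and reduce locally, and a top-level step merges only the small partial results. This is a pure illustration of the pattern, not the H2F implementation:

```python
# Toy two-level (hierarchical) map-reduce word count.
# Each site runs map+reduce on its local data; a top-level reduce merges the
# per-site partial counts, mimicking computation placed near the data.
from collections import Counter


def site_mapreduce(lines):
    """Local map (tokenize) + local reduce (count) at one data center."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


site_a = ["big data big clusters", "data centres"]
site_b = ["big data moves badly", "keep compute near data"]

# Only compact partial results cross the wide-area network, not raw data.
partials = [site_mapreduce(site_a), site_mapreduce(site_b)]

# Top-level reduce: merge the per-site partial counts.
total = sum(partials, Counter())
print(total["data"], total["big"])  # 4 3
```

The benefit H2F targets follows from this shape: the expensive first level runs where the data live, and only the second, cheap level spans geographically distant sites.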


2013 ◽  
Vol 756-759 ◽  
pp. 3318-3323
Author(s):  
Qi Zhi Deng ◽  
Long Bo Zhang ◽  
Xin Qian ◽  
Ya Li Chen ◽  
Feng Ying Wang

To solve the problem of improving the scalability of data processing and the data availability encountered by data mining techniques in data-intensive computing, a new tree learning method is presented in this paper. By introducing MapReduce, the tree learning method based on SPRINT obtains good scalability when addressing large datasets. Moreover, we define the search for split points as a series of distributed computations, each implemented with the MapReduce model. A new data structure called the class distribution table is introduced to assist the calculation of histograms. Experiments and analysis of the results show that the algorithm has strong data mining capabilities for data-intensive computing environments.
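A class distribution table of the kind described can be sketched as a map step emitting (attribute value, class) pairs and a reduce step summing their counts, after which candidate split points are scored. This is an illustration of the idea (using the Gini index, as SPRINT-style learners do), not the paper's implementation:

```python
# Sketch: class distribution table built with a map/reduce pattern, then used
# to evaluate candidate split points for a decision-tree node.
from collections import defaultdict

records = [  # (numeric attribute, class label) -- toy data
    (23, "no"), (31, "yes"), (35, "yes"), (45, "no"), (52, "yes"),
]

# Map: emit ((attribute_value, label), 1); Reduce: sum counts per key.
table = defaultdict(int)
for value, label in records:
    table[(value, label)] += 1


def gini(counts):
    """Gini impurity of a {label: count} histogram."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def split_gini(split, table):
    """Weighted Gini of the partition induced by attribute <= split."""
    left, right = defaultdict(int), defaultdict(int)
    for (value, label), c in table.items():
        (left if value <= split else right)[label] += c
    n = sum(left.values()) + sum(right.values())
    return (sum(left.values()) * gini(left)
            + sum(right.values()) * gini(right)) / n


# Pick the candidate split with the lowest weighted impurity.
best = min({v for v, _ in table}, key=lambda s: split_gini(s, table))
print(best)
```

Because the table holds only counts per (value, class) pair, it is far smaller than the raw data, which is what makes distributing the split-point search practical.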


1957 ◽  
Vol 61 (563) ◽  
pp. 727-755 ◽  
Author(s):  
E. W. Still

Summary

The general requirements for the complete air conditioning of aircraft are discussed in the light of the complete-system concept. The author takes into consideration safety, differential pressure, weight saving, power and air supply, passenger comfort, cooling, and humidity. Particular systems are then described, and there is a section on the test equipment required for laboratory testing of air conditioning equipment. Cooling systems are taken first and divided into the air cycle system, embodying bootstrap, turbine fan, and regenerative applications of cold air units; the vapour cycle system employing a boiling tank; and that using proprietary refrigerants. Properties of liquid refrigerants are discussed. Regulation of cabin temperature, air flow, humidity, pressure, and oxygen is effected by control systems, and the equipment used is described. Four appendices give (1) suggested detailed requirements for air conditioning equipment and user requirements, (2) sample data and calculations for air conditioning a 100-seat civil transport, (3) notes on the definition of refrigeration terms, and (4) data on pressure losses in aircraft ducts.

