RITAN: rapid integration of term annotation and network resources

Background Identifying the biologic functions of groups of genes identified in high-throughput studies currently requires considerable time and/or bioinformatics experience. This is due in part to each resource being housed within a separate database, requiring users to know about them and to integrate across them. Integrating across resources and merging them with the data under study is time-consuming, is often repeated for each study, and is an increasingly common bioinformatics task. Methods We developed an open-source R software package for assisting researchers in annotating their gene sets with functions, pathways, and their interconnectivity across a diversity of network resources. Results We present rapid integration of term annotation and network resources (RITAN) for the rapid and comprehensive annotation of a list of genes using functional term and pathway resources, and of their relationships with one another using multiple network biology resources. Currently, and to comply with data redistribution policies, RITAN allows rapid access to 16 term annotations spanning gene ontology, biologic pathways, and immunologic modules, and nine network biology resources, with support for user-supplied resources; we provide recommendations for additional resources and scripts to facilitate their addition to RITAN. Having the resources together in the same system allows users to derive novel combinations. RITAN has a growing set of tools to explore the relationships within the resources themselves. These tools allow users to merge resources such that the merged annotations have minimal overlap with one another. Because we index both function annotation and network interactions, the combination allows users to expand small groups of genes using links from biologic networks (either by adding all neighboring genes or by identifying genes that efficiently connect the input genes), followed by term enrichment to identify functions. That is, users can start from a core set of genes, identify interacting genes from biologic networks, and then identify the functions to which the expanded list of genes contributes. Conclusion We believe RITAN fills the important niche of bridging the results of high-throughput experiments with the ever-growing corpus of functional annotations and network biology resources. Availability Rapid integration of term annotation and network resources is available as an R package at github.com/MTZimmer/RITAN and BioConductor.org.
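
The workflow described in this abstract (expand a core gene set through network neighbors, then test the expanded set for term enrichment) can be illustrated with a minimal Python sketch. This is not RITAN's code or API: the function names, the toy edge list, the toy term, and the universe size of 20 genes are all assumptions made for illustration, and the enrichment test shown is a plain hypergeometric upper tail.

```python
from math import comb


def expand_with_neighbors(genes, edges):
    """Return the input genes plus every gene directly linked to one of them."""
    seed = set(genes)
    expanded = set(seed)
    for a, b in edges:
        if a in seed:
            expanded.add(b)
        if b in seed:
            expanded.add(a)
    return expanded


def enrichment_p_value(query, term_genes, universe_size):
    """Hypergeometric upper tail: P(overlap >= observed overlap).

    Assumes both gene sets are drawn from a universe of `universe_size` genes.
    """
    query, term_genes = set(query), set(term_genes)
    n, K = len(query), len(term_genes)
    k = len(query & term_genes)
    tail = sum(
        comb(K, i) * comb(universe_size - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return tail / comb(universe_size, n)


# Hypothetical toy network and annotation term (illustrative only).
edges = [("TP53", "MDM2"), ("TP53", "ATM"), ("BRCA1", "ATM"), ("EGFR", "GRB2")]
dna_damage_term = {"TP53", "ATM", "BRCA1", "CHEK2"}

expanded = expand_with_neighbors({"TP53"}, edges)
p_value = enrichment_p_value(expanded, dna_damage_term, universe_size=20)
print(sorted(expanded), round(p_value, 4))
```

A real analysis would also correct the resulting p-values for multiple testing across all terms considered.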

microbialPhenotypes: An R package that analyzes high-throughput microbial phenotype data

10.1101/2020.06.29.177659 ◽

2020 ◽

Author(s):

Peter I-Fan Wu ◽

James C. Hu ◽

Deborah A. Siegele

Keyword(s):

High Throughput ◽

R Package ◽

Computational Tools ◽

Systematic Analysis ◽

Functional Annotations ◽

Large Numbers ◽

Phenotype Data ◽

High Throughput Phenotyping

Abstract Various microbial high-throughput phenotyping techniques have been widely used to infer the functions of genes, generating large numbers of valuable datasets whose potential for characterizing genes hasn’t been fully exploited. Computational tools that allow unbiased, systematic analysis of these data have therefore become vital. Here we describe a package that evaluates high-throughput microbial phenotype data when one or several sets of associated functional annotations are provided. In addition, some helper functions are provided to help clean high-throughput microbial phenotype data.

kataegis: an R package for identification and visualization of the genomic localized hypermutation regions using high-throughput sequencing

BMC Genomics ◽

10.1186/s12864-021-07696-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xue Lin ◽

Yingying Hua ◽

Shuanglin Gu ◽

Li Lv ◽

Xingyu Li ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Somatic Mutations ◽

R Package ◽

Frequency Of Occurrence ◽

Link Type ◽

Genomic Landscape ◽

One Step ◽

Flanking Regions

Abstract Background Genomic localized hypermutation regions have been found in cancers and are reported to be related to cancer prognosis. This genomic localized hypermutation differs markedly from ordinary somatic mutations in its frequency of occurrence and genomic density. It resembles a “violent storm” of mutations, which is just what the Greek word “kataegis” means. Results There is a need for a lightweight and simple-to-use toolkit to identify and visualize localized hypermutation regions in the genome. We therefore developed the R package “kataegis” to meet this need. The package uses only three steps to identify genomic hypermutation regions: i) read in the variation files in standard formats; ii) calculate the inter-mutational distances; iii) identify the hypermutation regions with appropriate parameters. One further step visualizes the nucleotide contents and spectra of both the foci and flanking regions, and the genomic landscape of these regions. Conclusions The kataegis package is available on Bioconductor/GitHub (https://github.com/flosalbizziae/kataegis) and provides a lightweight and simple-to-use toolkit for quickly identifying and visualizing genomic hypermutation regions.
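
Steps ii) and iii) above can be sketched in a few lines of Python. This is an illustration only, not the kataegis package's implementation: the function names are hypothetical, and the thresholds (gaps of at most 1000 bp, at least 6 mutations per region) are assumed values chosen for the example, applied as a simple per-gap rule on a single chromosome.

```python
def inter_mutational_distances(positions):
    """Distances between consecutive mutation positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def hypermutation_regions(positions, max_distance=1000, min_mutations=6):
    """Group mutations whose consecutive gaps are <= max_distance.

    Returns (start, end, mutation_count) for every run that contains at
    least min_mutations mutations. Both thresholds are assumptions.
    """
    ordered = sorted(positions)
    regions = []
    run = ordered[:1]
    for pos in ordered[1:]:
        if pos - run[-1] <= max_distance:
            run.append(pos)
        else:
            if len(run) >= min_mutations:
                regions.append((run[0], run[-1], len(run)))
            run = [pos]
    # Close the final run once all positions have been visited.
    if len(run) >= min_mutations:
        regions.append((run[0], run[-1], len(run)))
    return regions


# Hypothetical mutation positions (bp) on one chromosome.
positions = [100, 5000, 5200, 5350, 5900, 6100, 6400, 90000]
print(inter_mutational_distances(positions))
print(hypermutation_regions(positions))
```

In practice the positions would come from a variant file read in step i), processed separately for each chromosome.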

Discovering collectively informative descriptors from high-throughput experiments

BMC Bioinformatics ◽

10.1186/1471-2105-10-431 ◽

2009 ◽

Vol 10 (1) ◽

Cited By ~ 3

Author(s):

Clark D Jeffries ◽

William O Ward ◽

Diana O Perkins ◽

Fred A Wright

Keyword(s):

High Throughput ◽

High Throughput Experiments

SpiderSeqR: an R package for crawling the web of high-throughput multi-omic data repositories for data-sets and annotation

10.1101/2020.04.13.039420 ◽

2020 ◽

Author(s):

Anna M. Sozanska ◽

Charles Fletcher ◽

Dóra Bihary ◽

Shamith A. Samarajiwa

Keyword(s):

High Throughput ◽

R Package ◽

Data Reuse ◽

Massively Parallel ◽

Data Sets ◽

Similar Data ◽

Data Generation ◽

Data Repositories ◽

Public Data ◽

Omic Data

Abstract More than three decades ago, the microarray revolution brought high-throughput data generation capability to biology and medicine. Subsequently, the emergence of massively parallel sequencing technologies led to many big-data initiatives such as the human genome project and the encyclopedia of DNA elements (ENCODE) project. These, in combination with cheaper, faster massively parallel DNA sequencing capabilities, have democratised multi-omic (genomic, transcriptomic, translatomic and epigenomic) data generation, leading to a data deluge in bio-medicine. While some of these data-sets are trapped in inaccessible silos, the vast majority are stored in public data resources and controlled access data repositories, enabling their wider use (or misuse). Currently, most peer reviewed publications require the deposition of the data-set associated with a study under consideration in one of these public data repositories. However, clunky and difficult to use interfaces and subpar or incomplete annotation prevent discovering, searching and filtering of these multi-omic data and hinder their re-purposing in other use cases. In addition, the proliferation of a multitude of different data repositories, with partially redundant storage of similar data, is yet another obstacle to their continued usefulness. Similarly, interfaces where annotation is spread across multiple web pages, the use of accession identifiers with ambiguous and multiple interpretations, and a lack of good curation make these data-sets difficult to use. We have produced SpiderSeqR, an R package whose main features include integration between the NCBI GEO and SRA databases, enabling a unified search of SRA and GEO data-sets and associated annotations, conversion between database accessions, convenient filtering of results, and saving of past queries for future use. All of the above features aim to promote data reuse, facilitating new discoveries and maximising the potential of existing data-sets. Availability: https://github.com/ss-lab-cancerunit/SpiderSeqR

RCy3: Network biology using Cytoscape from within R

F1000Research ◽

10.12688/f1000research.20887.3 ◽

2019 ◽

Vol 8 ◽

pp. 1774 ◽

Cited By ~ 1

Author(s):

Julia A. Gustavsen ◽

Shraddha Pai ◽

Ruth Isserlin ◽

Barry Demchak ◽

Alexander R. Pico

Keyword(s):

Shortest Path ◽

Future Development ◽

Enrichment Analysis ◽

Network Biology ◽

R Package ◽

Programming Environment ◽

R Packages ◽

R Programming ◽

Shortest Path Algorithms ◽

Rest Api

RCy3 is an R package in Bioconductor that communicates with Cytoscape via its REST API, providing access to the full feature set of Cytoscape from within the R programming environment. RCy3 has been redesigned to streamline its usage and future development as part of a broader Cytoscape Automation effort. Over 100 new functions have been added, including dozens of helper functions specifically for intuitive data overlay operations. Over 40 Cytoscape apps have implemented automation support so far, making hundreds of additional operations accessible via RCy3. Two-way conversion with networks from igraph and graph ensures interoperability with existing network biology workflows and dozens of other Bioconductor packages. These capabilities are demonstrated in a series of use cases involving public databases, enrichment analysis pipelines, shortest path algorithms and more. With RCy3, bioinformaticians will be able to quickly deliver reproducible network biology workflows as integrations of Cytoscape functions, complex custom analyses and other R packages.

SDN/NFV VNF Service Chaining

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v8i1.75 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Dashmeet Anand, Hariharakumar Narasimhakumar, Et al.

Keyword(s):

Large Scale ◽

Virtual Network ◽

Network Services ◽

Entire System ◽

Performance Parameters ◽

Cloud Environment ◽

Network Resources ◽

Service Chain ◽

Network Functions ◽

Multiple Network

Service Function Chaining (SFC) is a capability that links multiple network functions to deploy end-to-end network services. By virtualizing these network functions, which are then known as Virtual Network Functions (VNFs), the dependency on traditional hardware can be removed, making it easier to deploy dynamic service chains in a cloud environment. Before implementing service chains at large scale, it is necessary to understand the performance overhead created by each VNF owing to their varied characteristics. This research paper attempts to gain insight into the server and networking overhead encountered when a service chain is deployed on a cloud orchestration tool such as OpenStack. Specifically, this research measures the CPU utilization, RAM usage and system load of the server hosting OpenStack. Each VNF is monitored for its varying performance parameters when subjected to different kinds of traffic. Our focus lies on acquiring performance parameters of the entire system for different service chains and on comparing the throughput, latency, and VNF statistics of the virtual network. Insights obtained from this research can be used in industry to achieve optimum performance of hardware and network resources while deploying service chains.

INFERRING PROTEIN-PROTEIN INTERACTIONS FROM MESSENGER RNA EXPRESSION PROFILES WITH SVM

Journal of Biological Systems ◽

10.1142/s0218339005001525 ◽

2005 ◽

Vol 13 (03) ◽

pp. 287-298 ◽

Cited By ~ 1

Author(s):

JUN CAI ◽

YING HUANG ◽

LIANG JI ◽

YANDA LI

Keyword(s):

High Throughput ◽

Protein Interactions ◽

Messenger Rna ◽

Expression Profiles ◽

Support Vector ◽

Svm Classifier ◽

Good Prediction ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

High Throughput Experiments

In post-genomic biology, researchers in the field of proteomics focus their attention on the networks of protein interactions that control the lives of cells and organisms. Protein-protein interactions play a useful role in dynamic cellular machinery. In this paper, we developed a method to infer protein-protein interactions based on the theory of support vector machines (SVM). For a given pair of proteins, a new strategy of calculating the cross-correlation function of mRNA expression profiles was used to encode SVM vectors. We compared the performance with other methods of inferring protein-protein interactions. Results from five-fold cross validation suggested that our SVM model achieved a good prediction. This enables us to show that expression profiles at the transcription level, as well as sequence contents, can be used to distinguish physical or functional interactions of proteins. Lastly, we applied our SVM classifier to evaluate the data quality of interaction data sets from four high-throughput experiments. The results show that high-throughput experiments sacrifice some accuracy in the determination of interactions because of the limitations of experimental technologies.
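
The abstract does not spell out how the cross-correlation function is turned into an SVM input vector, so the following Python sketch shows only one plausible encoding, under stated assumptions: each protein pair is represented by the normalized cross-correlation of its two mRNA expression profiles at lags from -max_lag to +max_lag. The function name, the choice of max_lag, and the toy profiles are hypothetical.

```python
from statistics import mean, pstdev


def cross_correlation_features(x, y, max_lag=2):
    """Normalized cross-correlation of two equal-length expression profiles.

    Returns one value per lag in -max_lag..max_lag, giving a fixed-length
    feature vector for the protein pair. Assumes neither profile is constant
    (a zero standard deviation would cause division by zero).
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    features = []
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t + lag  # index into y shifted by the current lag
            if 0 <= u < n:
                total += (x[t] - mx) * (y[u] - my)
        features.append(total / (n * sx * sy))
    return features


# Hypothetical expression profiles for two genes across six conditions.
profile_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
profile_b = [2.0, 1.5, 3.5, 4.5, 4.0, 6.5]
print(cross_correlation_features(profile_a, profile_b))
```

Vectors of this kind, labeled as interacting or non-interacting pairs, could then be passed to any SVM implementation for training and cross validation.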

bcSeq: an R package for fast sequence mapping in high-throughput shRNA and CRISPR screens

Bioinformatics ◽

10.1093/bioinformatics/bty402 ◽

2018 ◽

Vol 34 (20) ◽

pp. 3581-3583

Author(s):

Jiaxing Lin ◽

Jeremy Gresham ◽

Tongrong Wang ◽

So Young Kim ◽

James Alvarez ◽

...

Keyword(s):

High Throughput ◽

R Package ◽

Sequence Mapping

Constrained non-negative matrix factorization enabling real-time insights of in situ and high-throughput experiments

Applied Physics Reviews ◽

10.1063/5.0052859 ◽

2021 ◽

Vol 8 (4) ◽

pp. 041410

Author(s):

Phillip M. Maffettone ◽

Aidan C. Daly ◽

Daniel Olds

Keyword(s):

Real Time ◽

High Throughput ◽

Matrix Factorization ◽

High Throughput Experiments ◽

Non Negative Matrix Factorization

From Online Social Presence to Network Social Presence

Inventive Approaches for Technology Integration and Information Resources Management - Advances in Information Quality and Management ◽

10.4018/978-1-4666-6256-8.ch005 ◽

2014 ◽

pp. 97-112

Author(s):

Chih-Hsiung Tu ◽

Cherng-Jyh Yen ◽

Michael Blocher ◽

Junn-Yih Chan

Keyword(s):

Social Presence ◽

Network Resources ◽

Online Learners ◽

Four Dimensions ◽

Personal Learning ◽

Open Network ◽

Network Connection ◽

Linkage Design ◽

Multiple Network ◽

Predictive Relationship

Open Network Learning Environment (ONLE) empowers network learners to create, edit, and share their knowledge via social network connection. This chapter examines the predictive relationship between social presence and ONLE interaction and scrutinizes the relationships between social presence and four dimensions of ONLE's interaction. The chapter concludes that online social presence could not serve as a predictor for all four open network learning's interactions. The results suggest both online and ONLE social presences have distinguishing dynamics in social interaction. ONLE focuses on “social” and “networking” linkages to transform online learners into “network learners” to project their preferred “network social presence” rather than online social presence. This chapter proposes the Open network linkage design model, which is a “Linkage Architecture” that links multiple network resources, network learners, and Web 2.0 tools to allow learners, instructors, and other stakeholders to construct and to share their Personal Learning Environments within the human network.
