Practical R for biologists: an introduction

Base Functions ◽

Almost All ◽

Selection Of ◽

R Functions

R is an open-source statistical environment modelled after the previously widely used commercial programs S and S-Plus. In addition to powerful statistical analysis tools, it provides powerful graphics output, and it is also a programming language suitable for medium-sized projects. This book presents a set of studies that collectively represent almost all the R operations that beginners need when analysing their own data, up to perhaps the early years of a PhD. Although the chapters are organized around topics such as graphing, classical statistical tests, statistical modelling, mapping and text parsing, the examples are drawn largely from real scientific studies at the appropriate level, and each nearly always covers more R functions than are strictly necessary to get a p-value or a graph. R comes with around a thousand base functions, which are installed automatically with R. This book covers the use of those most relevant to biological data analysis, modelling and graphics. In each chapter, the functions introduced and used are summarized in Tool Boxes. The book also shows users how to adapt and write their own code and functions. A selection of base functions relevant to graphics that are not necessarily covered in the main text is described in Appendix 1, and additional housekeeping functions in Appendix 2.

BioInstaller: a comprehensive R package to construct interactive and reproducible biological data analysis applications based on the R platform

PeerJ ◽

10.7717/peerj.5853 ◽

2018 ◽

Vol 6 ◽

pp. e5853 ◽

Cited By ~ 1

Author(s):

Jianfeng Li ◽

Bowen Cui ◽

Yuting Dai ◽

Ling Bai ◽

Jinyan Huang

Keyword(s):

Data Analysis ◽

R Package ◽

Biological Data ◽

Representational State Transfer ◽

State Transfer ◽

Application Programming ◽

Programming Interfaces ◽

Shiny Application ◽

R Functions

The increase in bioinformatics resources such as tools/scripts and databases poses a great challenge for users seeking to construct interactive and reproducible biological data analysis applications. Here, we propose an open-source, comprehensive, flexible R package named BioInstaller that consists of the R functions, Shiny application, the HTTP representational state transfer application programming interfaces, and a docker image. BioInstaller can be used to collect, manage and share various types of bioinformatics resources and perform interactive and reproducible data analyses based on the extendible Shiny application with Tom’s Obvious, Minimal Language and SQLite format databases. The source code of BioInstaller is freely available at our lab website, http://bioinfo.rjh.com.cn/labs/jhuang/tools/bioinstaller, the popular package host GitHub, https://github.com/JhuangLab/BioInstaller, and the Comprehensive R Archive Network, https://CRAN.R-project.org/package=BioInstaller. In addition, a docker image can be downloaded from DockerHub (https://hub.docker.com/r/bioinstaller/bioinstaller).

Preventive maintenance scheduling using analysis of variance-based ant lion optimizer

World Journal of Engineering ◽

10.1108/wje-06-2017-0145 ◽

2018 ◽

Vol 15 (2) ◽

pp. 254-272 ◽

Cited By ~ 2

Author(s):

Umamaheswari Elango ◽

Ganesan Sivarajan ◽

Abirami Manoharan ◽

Subramanian Srikrishna

Keyword(s):

Analysis Of Variance ◽

Heuristic Algorithm ◽

Heuristic Algorithms ◽

Statistical Tests ◽

Population Based ◽

Maintenance Scheduling ◽

Content Type ◽

Ant Lion Optimizer ◽

Ant Lion ◽

Purpose: Generator maintenance scheduling (GMS) is an essential task for electric power utilities, as periodic maintenance extends the lifetime of generating units and ensures their reliable and continuous operation. Though numerous meta-heuristic algorithms have been reported for the GMS solution, enhancing the existing techniques or developing new optimization procedures is still an interesting research task. The meta-heuristic algorithms are population based, and the selection of their algorithmic parameters influences the quality of the solution. This paper aims to propose a meta-heuristic algorithm guided by statistical tests for solving GMS problems. Design/methodology/approach: The intricacy of the GMS problem in power systems necessitates an efficient and robust optimization tool. Though several meta-heuristic algorithms have been applied to this power system operational problem, tuning their control parameters is a protracted process. To overcome this drawback, a modern meta-heuristic algorithm, the ant lion optimizer (ALO), is chosen as the optimization tool for solving the GMS problem. Findings: The meta-heuristic algorithms are population based and require proper selection of algorithmic parameters. In this work, analysis of variance (ANOVA) is proposed for selecting the most feasible decisive parameters of the algorithm, and validation of solution quality based on statistical tests is described. Parametric and non-parametric statistical tests are also performed to validate the selection of ALO against various competing algorithms. The numerical and statistical results confirm that ALO is a promising tool for solving GMS problems. Originality/value: As a first attempt, ALO is applied to solve the GMS problem. Moreover, ANOVA-based parameter selection is proposed, and statistical tests such as the Wilcoxon signed-rank test and one-way ANOVA are conducted to validate the applicability of the intended optimization tool. The contribution of the paper is twofold: the ANOVA-based ALO for GMS applications and the statistical-test-based performance evaluation of the intended algorithm.

Practical biomedical statistics: A guide to the selection of statistical tests

Urology ◽

10.1016/s0090-4295(99)80374-4 ◽

1996 ◽

Vol 47 (1) ◽

pp. 2-13 ◽

Cited By ~ 5

Author(s):

J. Stuart Wolf ◽

Deborah S. Smith

Keyword(s):

Statistical Tests ◽

Biomedical Statistics ◽

Clinical Characterisation and Management of the Main Treatment-Induced Toxicities in Patients with Hepatocellular Carcinoma and Cirrhosis

Cancers ◽

10.3390/cancers13030584 ◽

2021 ◽

Vol 13 (3) ◽

pp. 584

Author(s):

Fausto Meriggi ◽

Massimo Graffeo

Keyword(s):

Hepatocellular Carcinoma ◽

Hepatic Impairment ◽

Hepatic Cirrhosis ◽

Disease Stage ◽

The Past ◽

Systemic Treatments ◽

Key Points ◽

Locoregional Therapies ◽

Almost All ◽

The incidence of hepatocellular carcinoma (HCC) continues to increase worldwide, particularly in Western countries. In almost all cases, HCC develops in subjects with hepatic cirrhosis, often as the result of hepatitis B or C virus infection, alcohol abuse or metabolic forms secondary to non-alcoholic steatohepatitis. Patients with HCC and hepatic symptoms can therefore present symptoms that are attributable to both conditions. These patients require multidisciplinary management, calling for close interaction between the hepatologist and the oncologist. Indeed, the treatment of HCC requires, depending on the disease stage and the degree of hepatic impairment, locoregional therapies that can in turn be broken down into surgical and nonsurgical treatments and systemic treatments used in the event of progression after the administration of locoregional treatments. The past decade has seen the publication of countless papers of great interest that have radically changed the scenario of treatment for HCC. Novel therapies with biological agents and immunotherapy have come to be standard options in the approach to treatment of this cancer, obtaining very promising results where in the past chemotherapy was almost never able to have an impact on the course of the disease. However, in addition to being costly, these drugs are not devoid of adverse effects and their management cannot forgo the consideration of the underlying hepatic impairment. Patients with HCC and cirrhosis therefore require special attention, starting from the initial characterisation needed for an appropriate selection of those to be referred for treatment, as these patients are almost never fit. In this chapter, we will attempt to investigate and clarify the key points of the management of the main toxicities induced by locoregional and systemic treatments for HCC secondary to cirrhosis.

Advances in Intelligent Systems and Computing - Information Technology, Systems Research, and Computational Physics ◽

Graph Cutting in Image Processing Handling with Biological Data Analysis

10.1007/978-3-030-18058-4_16 ◽

2019 ◽

pp. 203-216

Author(s):

Mária Ždímalová ◽

Tomáš Bohumel ◽

Katarína Plachá-Gregorovská ◽

Peter Weismann ◽

Hisham El Falougy

Keyword(s):

Image Processing ◽

Data Analysis ◽

Biological Data ◽

Biological Data Analysis

P nucleotides in V(D)J recombination: a fine-structure analysis.

Molecular and Cellular Biology ◽

10.1128/mcb.13.2.1078 ◽

1993 ◽

Vol 13 (2) ◽

pp. 1078-1092 ◽

Cited By ~ 51

Author(s):

J T Meier ◽

S M Lewis

Keyword(s):

Fine Structure ◽

Germ Line ◽

Statistical Tests ◽

Random Sequence ◽

Genetic Locus ◽

Gene Segment ◽

Lymphoid Cells ◽

Base Pairs ◽

Nucleotide Data ◽

Antigen receptor genes acquire junctional inserts upon assembly from their component, germ line-encoded V, D, and J segments. Inserts are generally of random sequence, but a small number of V-D, D-J, or V-J junctions are exceptional. In such junctions, one or two added base pairs inversely repeat the sequence of the abutting germ line DNA. (For example, a gene segment ending AG might acquire an insert beginning with the residues CT upon joining.) It has been proposed that the nonrandom residues, termed "P nucleotides," are a consequence of an obligatory end-modification step in V(D)J recombination. P insertion in normal, unselected V(D)J joining products, however, has not been rigorously established. Here, we use an experimentally manipulable system, isolated from immune selection of any kind, to examine the fine structure of V(D)J junctions formed in wild-type lymphoid cells. Our results, according to statistical tests, show the following: (i) the frequency of P insertion is influenced by the DNA sequence of the joined ends; (ii) P inserts may be longer than two residues; (iii) P inserts are associated with coding ends only. Additionally, a systematic survey of published P nucleotide data shows no evidence for variation in P insertion as a function of genetic locus and ontogeny. Together, these analyses establish the generality of the P nucleotide pattern within inserts but do not fully support previous conjectures as to their origin and centrality in the joining reaction.
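
The "inverse repeat" in the example above (a segment ending AG followed by an insert beginning CT) corresponds to the reverse complement of the last bases of the coding end. The following is only a minimal sketch of that relationship, not the authors' analysis method; the function names `reverse_complement` and `expected_p_insert` are hypothetical and chosen for illustration.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def expected_p_insert(coding_end: str, length: int = 2) -> str:
    """Return the P insert of the given length expected next to a coding end.

    Assumption for illustration: the P insert is the reverse complement of
    the last `length` bases of the germ line segment.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return reverse_complement(coding_end[-length:])


if __name__ == "__main__":
    # The abstract's example: a gene segment ending in AG.
    print(expected_p_insert("GGTTAG"))
```

For a segment ending in AG, the two-base result is CT, matching the example in the abstract; passing `length=1` gives only the first added base, C.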

Dementia with Parkinson's disease: Clinical diagnosis, neuropsychological aspects and treatment

Dementia & Neuropsychologia ◽

10.1590/s1980-57642009dn20400005 ◽

2008 ◽

Vol 2 (4) ◽

pp. 261-266

Author(s):

Jorge Lorenzo Otero

Keyword(s):

Parkinson's Disease ◽

Lewy Body ◽

Task Force ◽

Lewy Body Dementia ◽

Data Bases ◽

Main Text ◽

Clinical Diagnoses ◽

Movement Disorder Society ◽

Dementia with Parkinson's disease represents a controversial issue in the complex group of alpha-synucleinopathies. The author acknowledges the concept of a "continuum" between Parkinson's disease (PD), Lewy body dementia (LBD), and dementia in Parkinson's disease (PDD). However, the practicing neurologist needs to identify the phenotypic signs of each dementia, because treatment and prognosis differ in spite of the overlaps between them. The main aim of this review was to characterize the clinical diagnosis of dementia associated with Parkinson's disease (PDD). Secondarily, the review discussed some epidemiological and neuropsychological issues. The selection of articles was not systematic and reflects the author's opinion; the main text selected was the recommendations of the Movement Disorder Society Task Force for PDD diagnosis. The PubMed, OVID, and ProQuest databases were used for the search.

Computational Intelligence Methods for Bioinformatics and Biostatistics - Lecture Notes in Computer Science ◽

Data-Intensive Computing Infrastructure Systems for Unmodified Biological Data Analysis Pipelines

10.1007/978-3-319-24462-4_22 ◽

2015 ◽

pp. 259-272 ◽

Cited By ~ 1

Author(s):

Lars Ailo Bongo ◽

Edvard Pedersen ◽

Martin Ernstsen

Keyword(s):

Data Analysis ◽

Biological Data ◽

Data Intensive Computing ◽

Infrastructure Systems ◽

Data Intensive ◽

Computing Infrastructure

Data Size Requirement for Forecasting Daily Crude Oil Price with Neural Networks

Scientific Annals of Economics and Business ◽

10.47743/saeb-2019-0027 ◽

2019 ◽

Vol 66 (3) ◽

pp. 363-388

Author(s):

Serkan Aras ◽

Manel Hamdi

Keyword(s):

Neural Networks ◽

Crude Oil ◽

Significant Interaction ◽

Statistical Tests ◽

Oil Price ◽

Training Data ◽

Forecasting Model ◽

Crude Oil Price ◽

Crude Oil Prices ◽

When the literature regarding applications of neural networks is investigated, it appears that a substantial issue is what size the training data should be when modelling a time series through neural networks. The aim of this paper is to determine the size of training data to be used to construct a forecasting model via a multiple-breakpoint test and compare its performance with two general methods, namely, using all available data and using just two years of data. Furthermore, the importance of the selection of the final neural network model is investigated in detail. The results obtained from daily crude oil prices indicate that the data from the last structural change lead to simpler architectures of neural networks and have an advantage in reaching more accurate forecasts in terms of MAE value. In addition, the statistical tests show that there is a statistically significant interaction between data size and stopping rule.

Molecular Inverse Comorbidity between Alzheimer’s disease and Lung Cancer: new insights from Matrix Factorization

10.1101/643890 ◽

2019 ◽

Author(s):

Alessandro Greco ◽

Jon Sanchez Valle ◽

Vera Pancaldi ◽

Anaïs Baudot ◽

Emmanuel Barillot ◽

...

Keyword(s):

Lung Cancer ◽

Alzheimer's Disease ◽

Matrix Factorization ◽

Large Scale ◽

Molecular Mechanisms ◽

Biological Data ◽

Specific Factors ◽

Molecular Bases

Matrix Factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology. We here challenge MF in depicting the molecular bases of epidemiologically described Disease-Disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer’s disease (AD) and lung cancer (LC), described as a lower than expected probability of developing LC in AD patients. To date, the molecular mechanisms underlying DD relationships remain poorly explained, and their better characterization might offer unprecedented clinical opportunities. To this end, we extend our previously designed MF-based framework for the molecular characterization of DD relationships. Considering AD-LC inverse comorbidity as a case study, we highlight multiple molecular mechanisms, among them the previously identified immune system and mitochondrial metabolism. We then discriminate mechanisms specific to LC from those shared with other cancers through a pancancer analysis. Additionally, new candidate molecular players, such as Estrogen Receptor (ER), CDH1 and HDAC, are pinpointed as factors that might underlie the inverse relationship, opening the way to new investigations. Finally, some lung cancer subtype-specific factors are also detected, suggesting that heterogeneity across patients also exists in the context of inverse comorbidity.