Log transformation: recently published documents

Total documents: 98 (five years: 25)
H-index: 21 (five years: 2)

2021
Author(s): Lauren L Hsu, Aedin C Culhane

Effective dimension reduction is an essential step in the analysis of single cell RNA-seq (scRNAseq) count data, which are high-dimensional, sparse, and noisy. Principal component analysis (PCA) is widely used in analytical pipelines, and since PCA requires continuous data, it is often coupled with log-transformation in scRNAseq applications. However, log-transformation of scRNAseq counts distorts the data and can obscure meaningful variation. We describe correspondence analysis (CA) for dimension reduction of scRNAseq data, a performant alternative to PCA. Designed for use with counts, CA is based on the decomposition of a chi-squared residual matrix and does not require log-transformation of scRNAseq counts. We extend beyond standard CA (decomposition of Pearson residuals computed on the contingency table) and propose variations of CA, including an alternative chi-squared statistic, that address overdispersion and high sparsity in scRNAseq data. The performance of five variations of CA and of standard CA is benchmarked on 10 datasets and compared to glmPCA. The CA variations are fast and scalable, and they outperform standard CA and glmPCA, computing embeddings with comparable or better clustering accuracy in 8 out of 9 datasets. Of the variations we considered, CA using the Freeman-Tukey chi-squared residual was the most performant overall on scRNAseq data. Our analyses also showed that variance-stabilizing transformations applied in conjunction with standard CA (using Pearson residuals), as well as power deflation smoothing, both improve performance in downstream clustering tasks compared to standard CA alone. CA has further advantages, including visual illustration of associations between genes and cell populations in a 'CA biplot' and easy extension to multi-table analysis, enabling integrative dimension reduction. We introduce corralm, a CA-based method for multi-table batch integration of scRNAseq data in a shared latent space, and we propose a new approach for assessing batch integration. We implement CA for scRNAseq in the corral R/Bioconductor package (https://www.bioconductor.org/packages/corral), which interfaces directly with widely used single cell classes in Bioconductor, allowing for easy integration into scRNAseq pipelines.
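For readers who want to see the mechanics, here is a minimal NumPy sketch of standard CA on a genes-by-cells count matrix: Pearson residuals of the contingency table, followed by an SVD. The authors' actual implementation is the corral R/Bioconductor package; the function below and its parameters are illustrative only.

```python
import numpy as np

def correspondence_analysis(X, n_components=30):
    """Standard CA of a genes-x-cells count matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    X = X[X.sum(axis=1) > 0, :]            # drop all-zero genes ...
    X = X[:, X.sum(axis=0) > 0]            # ... and all-zero cells
    P = X / X.sum()                        # correspondence matrix
    r = P.sum(axis=1, keepdims=True)       # row (gene) masses
    c = P.sum(axis=0, keepdims=True)       # column (cell) masses
    E = r @ c                              # expected proportions under independence
    S = (P - E) / np.sqrt(E)               # Pearson residuals (chi-squared components)
    # the paper's best-performing variant swaps in Freeman-Tukey residuals; one
    # common form on counts is sqrt(O) + sqrt(O+1) - sqrt(4E+1) (an assumption here)
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    # principal coordinates of the cells in the leading dimensions
    return (Vt[:n_components, :].T * d[:n_components]) / np.sqrt(c.T)

# embedding = correspondence_analysis(counts)  # then cluster, e.g. with k-means
```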


Author(s): Robert M West

The log transformation is often used to reduce the skewness of a measurement variable. If, after transformation, the distribution is symmetric, then the Welch t-test might be used to compare groups. If, in addition, the distribution is close to normal, then a reference interval might be determined.
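As a concrete sketch of this workflow (the data are hypothetical; SciPy's Welch option is used for the t-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# hypothetical right-skewed measurements (e.g. a biomarker) in two groups
a = rng.lognormal(mean=1.0, sigma=0.5, size=80)
b = rng.lognormal(mean=1.2, sigma=0.5, size=75)

log_a, log_b = np.log(a), np.log(b)

# Welch t-test on the (roughly symmetric) log scale
t, p = stats.ttest_ind(log_a, log_b, equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.4f}")

# 95% reference interval, computed on the log scale and back-transformed;
# valid only if the log-transformed values are close to normal
m, s = log_a.mean(), log_a.std(ddof=1)
lo, hi = np.exp(m - 1.96 * s), np.exp(m + 1.96 * s)
print(f"reference interval: {lo:.2f} to {hi:.2f}")
```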


2021
Author(s): Johannes Bracher, Jonas M. Littek

The moving epidemic method (MEM) and the WHO method are widely used to determine intensity levels for seasonal influenza. The two approaches are conceptually similar, but differ in two aspects. Firstly, the MEM involves a log transformation of incidence data, while the WHO method operates on the original scale. Secondly, the MEM uses more than one observation from each past season to compute intensity thresholds, fixing the total number of observations to include, whereas the WHO method uses only the highest value from each season. To assess the impact of these choices on thresholds, we perform simulation studies based on re-sampling of ILI data from France, Spain, Switzerland and the US. When no transformation is applied, a rather large proportion of season peaks are classified as high or very high intensity; this can be mitigated by a logarithmic transformation. When fixing the total number of included past observations, thresholds increase the more seasons are available, and when only a few are available, there is a high chance of classifying new season peaks as high or very high intensity. We therefore suggest using one observation per season and a log transformation, i.e. a hybrid of the default settings of the MEM and WHO methods.
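A minimal sketch of the suggested hybrid, assuming one peak value per past season. The thresholds here are one-sided normal prediction bounds on the log scale, a simplification of the exact MEM/WHO formulas, and the peak values are hypothetical.

```python
import numpy as np
from scipy import stats

def intensity_thresholds(season_peaks, quantiles=(0.40, 0.90, 0.975)):
    """Illustrative influenza intensity thresholds from one peak per season.

    Following the hybrid suggested above: take only the highest ILI value of
    each past season, log-transform, and set thresholds as upper one-sided
    normal prediction bounds, back-transformed to the original scale.
    (The published MEM/WHO formulas differ in detail; this is a sketch.)
    """
    x = np.log(np.asarray(season_peaks, dtype=float))
    n, m, s = len(x), x.mean(), x.std(ddof=1)
    se = s * np.sqrt(1 + 1 / n)            # prediction standard error for a new peak
    t = stats.t.ppf(quantiles, df=n - 1)
    return np.exp(m + t * se)              # medium / high / very high thresholds

# hypothetical peak incidences from eight past seasons
medium, high, very_high = intensity_thresholds([48, 61, 55, 70, 52, 66, 59, 74])
print(medium, high, very_high)
```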


2021
Vol 224 (11)
Author(s): Douglas S. Glazier

The magnitude of many biological traits relates strongly and regularly to body size. Consequently, a major goal of comparative biology is to understand and apply these ‘size-scaling’ relationships, traditionally quantified by using linear regression analyses based on log-transformed data. However, recently some investigators have questioned this traditional method, arguing that linear or non-linear regression based on untransformed arithmetic data may provide better statistical fits than log-linear analyses. Furthermore, they advocate the replacement of the traditional method by alternative specific methods on a case-by-case basis, based simply on best-fit criteria. Here, I argue that the use of logarithms in scaling analyses presents multiple valuable advantages, both statistical and conceptual. Most importantly, log-transformation allows biologically meaningful, properly scaled (scale-independent) comparisons of organisms of different size, whereas non-scaled (scale-dependent) analyses based on untransformed arithmetic data do not. Additionally, log-based analyses can readily reveal biologically and theoretically relevant discontinuities in scale invariance during developmental or evolutionary increases in body size that are not shown by linear or non-linear arithmetic analyses. In this way, log-transformation advances our understanding of biological scaling conceptually, not just statistically. I hope that my Commentary helps students, non-specialists and other interested readers to understand the general benefits of using log-transformed data in size-scaling analyses, and stimulates advocates of arithmetic analyses to show how they may improve our understanding of scaling conceptually, not just statistically.
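The contrast between the two approaches is easy to demonstrate. The sketch below fits the power law y = a·x^b both ways on hypothetical allometric data: linear regression on log-transformed values versus nonlinear least squares on the arithmetic scale.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
# hypothetical allometric data: rate = a * mass^b with multiplicative error
mass = np.exp(rng.uniform(np.log(1), np.log(1000), size=60))
rate = 0.8 * mass**0.75 * rng.lognormal(sigma=0.15, size=60)

# traditional approach: linear regression on log-transformed data
slope, intercept, r, p, se = stats.linregress(np.log(mass), np.log(rate))
print(f"log-log fit: exponent b = {slope:.3f}, a = {np.exp(intercept):.3f}")

# alternative: nonlinear least squares on the arithmetic scale
(a_hat, b_hat), _ = optimize.curve_fit(lambda x, a, b: a * x**b, mass, rate, p0=(1, 0.7))
print(f"arithmetic fit: exponent b = {b_hat:.3f}, a = {a_hat:.3f}")

# the two fits weight small and large organisms differently: the log fit
# assumes multiplicative (scale-independent) error, while the arithmetic fit
# assumes additive error dominated by the largest individuals
```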


2021
pp. 63-69
Author(s): Atica M. Altaie, Asmaa Yaseen Hamo, Rasha Gh. Alsarraj

A fault is an error that affects system behaviour. A software metric is a value that represents the degree to which software processes work properly and where faults are more likely to occur. In this research, we study the effects of removing redundancy and of log transformation based on threshold values for identifying fault-prone classes of software. The study also compares the metric values of an original dataset with those obtained after removing redundancy and applying the log transformation. An e-learning dataset and a system dataset were taken as case studies. The fault ratio ranged from 1%-31% and 0%-10% for the original datasets, and from 1%-10% and 0%-4% after removing redundancy and log transformation, respectively. These results directly affected the number of faulty classes detected, which ranged between 1-20 and 1-7 for the original datasets, and between 1-7 and 0-3 after removing redundancy and log transformation. The skewness of the dataset decreased after applying the proposed model. The classes classified as faulty need more attention in the next versions in order to reduce the fault ratio, or refactoring to increase the quality and performance of the current version of the software.
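A minimal pandas sketch of the preprocessing pipeline described here; the metric names (wmc, cbo, loc) and the mean-plus-one-standard-deviation threshold rule are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
import pandas as pd

# hypothetical per-class software-metric table (column names are illustrative)
df = pd.DataFrame({
    "class_name": ["A", "B", "B", "C", "D"],
    "wmc": [12, 45, 45, 7, 60],     # weighted methods per class
    "cbo": [3, 14, 14, 2, 20],      # coupling between objects
    "loc": [120, 900, 900, 40, 1500],
})

# step 1: remove redundant (duplicate) metric records
df = df.drop_duplicates(subset=["wmc", "cbo", "loc"]).reset_index(drop=True)

# step 2: log-transform the skewed metrics (log1p also handles zero values)
metrics = ["wmc", "cbo", "loc"]
df[metrics] = np.log1p(df[metrics])

# step 3: flag classes whose metrics exceed a threshold as fault-prone;
# here the threshold is the mean plus one standard deviation of each metric
thresholds = df[metrics].mean() + df[metrics].std()
df["fault_prone"] = (df[metrics] > thresholds).any(axis=1)
print(df)
```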


2020
Vol 4 (2)
pp. 136-144
Author(s): Faisal Dharma Adhinata, Ariq Cahya Wardhana, Diovianto Putra Rakhmadani, Akhmad Jayadi

One of the main stages in digital image processing is image quality enhancement. In a dark image, the detailed information contained in the image is not visible; objects present in the image may even be imperceptible because the image was captured under poor lighting. Dark images therefore need quality enhancement so that the detailed image information becomes visually apparent. Digital image enhancement algorithms include the negative transformation, log transformation, contrast stretching, bit plane slicing, and histogram equalization. This study examines several of these algorithms to determine which gives the best result for dark images. Based on the experimental results, the best result was obtained with the histogram equalization algorithm, which produces an evenly distributed image histogram so that the detailed image information can be seen visually.
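The two transformations most relevant to this listing are easy to reproduce with OpenCV and NumPy; a brief sketch (the file paths are placeholders):

```python
import cv2
import numpy as np

# load a dark image as grayscale ("dark.png" is a placeholder path)
img = cv2.imread("dark.png", cv2.IMREAD_GRAYSCALE)

# log transformation: s = c * log(1 + r), stretching dark pixel values
c = 255.0 / np.log(1.0 + img.max())
log_img = (c * np.log1p(img.astype(np.float64))).astype(np.uint8)

# histogram equalization: spread the histogram over the full intensity range
eq_img = cv2.equalizeHist(img)

cv2.imwrite("dark_log.png", log_img)
cv2.imwrite("dark_equalized.png", eq_img)
```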


2020
Vol 98 (Supplement_4)
pp. 249-249
Author(s): Tae Jeong Choi, Byoungho Park, Hannah Oh, Oh Sang-Hyon

This study evaluates the effect of log transformation of ultrasound data records on the genetic evaluation of Hanwoo (Korean native cattle) proven bulls. Three different scenarios were established: genetic evaluations using (A) only yearling weights (YW; n = 15,665) and carcass traits (CT); (B) YW, CT and raw ultrasound data at 12 and 24 months of age; and (C) YW, CT and log-transformed values of the raw ultrasound data records in (B). Carcass traits include carcass weight (CW; n = 6,526), loin muscle area (LMA; n = 6,820), backfat thickness (BFT; n = 6,723), and intramuscular fat (IMF; n = 5,037). Ultrasound traits include LMA, BFT, and fat content (FC; %), which were measured with Aquila Vet (Pie Medical) between the 12th and 13th rib. REMLF90 was used to estimate genetic parameters such as heritability and genetic correlations. Rank correlations of breeding values were analyzed using SAS 9.2. The heritability of YW was highest in (B) and was 0.2% lower in (C). The heritability of CW was 0.6% higher in (B) and (C) than in (A). For BFT and IMF, heritabilities in (B) were 0.5% and 0.6% higher than in (A), while those in (C) were 1.0% and 0.8% lower than in (A), respectively. The heritabilities of the ultrasound traits at 12 months of age were higher for LMA and FC in (C) than in (B), and higher for BFT in (B) than in (C). At 24 months of age, heritabilities in (C) were higher than in (B) for all ultrasound traits. The rank correlations between (B) and (C) at 12 months were 0.995 for ultrasound LMA and 0.991 for ultrasound BFT; at 24 months, the correlations were 0.989 and 0.986, respectively. In conclusion, the results provided no evidence of a significant difference in genetic evaluations among the scenarios with and without log transformation of the ultrasound data records.
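The genetic evaluation itself requires specialized software (REMLF90), but the final comparison step, rank correlation of breeding values between scenarios, is simple to sketch; the breeding-value arrays below are simulated stand-ins, not the study's data.

```python
import numpy as np
from scipy.stats import spearmanr

# hypothetical estimated breeding values for the same bulls under
# scenario B (raw ultrasound records) and scenario C (log-transformed)
rng = np.random.default_rng(2)
ebv_raw = rng.normal(size=200)
ebv_log = ebv_raw + rng.normal(scale=0.1, size=200)  # nearly identical ranking

rho, p = spearmanr(ebv_raw, ebv_log)
print(f"rank correlation between scenarios: {rho:.3f}")
# values near 1 (0.986-0.995 in the study) mean the log transformation
# barely changes how the bulls are ranked for selection
```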


2020
Vol 10 (18)
pp. 6247
Author(s): Hanan M. Hammouri, Roy T. Sabo, Rasha Alsaadawi, Khalid A. Kheirallah

Scientists in biomedical and psychosocial research must deal with skewed data all the time. When comparing means from two groups, the log transformation is commonly used as a traditional technique to normalize skewed data before applying the two-group t-test. An alternative method that does not assume normality is the generalized linear model (GLM) combined with an appropriate link function. In this work, the two techniques are compared using Monte Carlo simulations, each consisting of many iterations that simulate two groups of skewed data from three different sampling distributions: gamma, exponential, and beta. The methods are then compared in terms of Type I error rates, power rates, and estimates of the mean differences. We conclude that the t-test with log transformation outperformed the GLM method for non-normal data following beta or gamma distributions, whereas for exponentially distributed data the GLM method outperformed the t-test with log transformation.
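A condensed sketch of one cell of such a simulation study, comparing Type I error rates under the null for gamma data; the sample sizes, shape parameters, and iteration count are illustrative choices, and statsmodels supplies the gamma GLM with a log link.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n, n_sims, alpha = 30, 2000, 0.05
rej_t = rej_glm = 0

for _ in range(n_sims):
    # two groups of skewed gamma data with identical means (null is true)
    a = rng.gamma(shape=2.0, scale=1.5, size=n)
    b = rng.gamma(shape=2.0, scale=1.5, size=n)

    # method 1: log-transform, then two-group t-test
    _, p_t = stats.ttest_ind(np.log(a), np.log(b))
    rej_t += p_t < alpha

    # method 2: gamma GLM with log link, testing the group coefficient
    y = np.concatenate([a, b])
    X = sm.add_constant(np.repeat([0, 1], n))
    fit = sm.GLM(y, X, family=sm.families.Gamma(sm.families.links.Log())).fit()
    rej_glm += fit.pvalues[1] < alpha

print(f"Type I error, t-test on logs: {rej_t / n_sims:.3f}")
print(f"Type I error, gamma GLM:      {rej_glm / n_sims:.3f}")
```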


2020
Vol 29 (12)
pp. 3547-3568
Author(s): Shi-Fang Qiu, Qi-Xiang Fu

This article investigates the problem of testing the homogeneity of binomial proportions for stratified partially validated data obtained by a double-sampling method with two fallible classifiers. Several test procedures are developed to test homogeneity under two models, distinguished by the conditional independence assumption on the two classifiers: the weighted-least-squares test with and without log-transformation, logit-transformation and double log-transformation, the likelihood ratio test, and the score test. Simulation results show that the score test performs better than the other tests, in the sense that its empirical size is generally controlled around the nominal level, and it is therefore recommended for practical applications. The other tests also perform well when both the binomial proportions and the sample sizes are not small. Approximate sample sizes based on the score test, the likelihood ratio test, and the weighted-least-squares test with double log-transformation are generally accurate in terms of the empirical power and type I error rate at the estimated sample sizes, and are therefore also recommended. An example from a malaria study illustrates the proposed methodologies.
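As a simplified illustration of one ingredient, here is a weighted-least-squares homogeneity test of stratified binomial proportions with log transformation, using delta-method variances. It ignores the article's double-sampling and misclassification structure, so treat it as a sketch of the transformation idea only; the counts are hypothetical.

```python
import numpy as np
from scipy import stats

def wls_log_homogeneity(successes, totals):
    """WLS homogeneity test of binomial proportions on the log scale.

    Tests H0: p_1 = ... = p_k from exact counts, with delta-method
    variances of log(p_hat); a sketch, not the article's full procedure.
    """
    x = np.asarray(successes, dtype=float)
    n = np.asarray(totals, dtype=float)
    p = x / n
    y = np.log(p)                      # log-transformed proportions
    var = (1 - p) / (n * p)            # delta-method variance of log(p_hat)
    w = 1 / var
    y_bar = np.sum(w * y) / np.sum(w)  # weighted common value under H0
    chi2 = np.sum(w * (y - y_bar) ** 2)
    df = len(p) - 1
    return chi2, stats.chi2.sf(chi2, df)

# hypothetical counts from three strata
chi2, p_val = wls_log_homogeneity([12, 20, 15], [100, 120, 90])
print(f"chi-squared = {chi2:.2f}, p = {p_val:.4f}")
```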

