data transformations
Recently Published Documents

TOTAL DOCUMENTS: 279 (five years: 67)
H-INDEX: 28 (five years: 3)

2021 ◽  
Author(s):  
André Marquardt ◽  
Philip Kollmannsberger ◽  
Markus Krebs ◽  
Markus Knott ◽  
Antonio Giovanni Solimando ◽  
...  

Abstract Personalized oncology is a rapidly evolving area that offers cancer patients therapy options more specific than ever. Yet there is still a lack of understanding regarding the transcriptomic similarities or differences between metastases and their corresponding primary sites. To approach this question, we applied two unsupervised dimension reduction methods (t-SNE and UMAP) to three metastasis datasets (prostate cancer, neuroendocrine prostate cancer, and skin cutaneous melanoma) comprising 682 samples, under three different data transformations (unprocessed FPKM values, log10-transformed FPKM values, and log10(FPKM+1)-transformed FPKM values), to visualize potential underlying clusters. The approaches resulted in the formation of different clusters that were independent of the respective resection sites. Additionally, the data transformation critically affected cluster formation in most cases. Of note, our study revealed no tight link between the metastasis resection site and specific transcriptomic features. Instead, our analysis demonstrates the dependency of cluster formation on the underlying data transformation and the dimension reduction method applied. These observations identify data transformation as another key element in the interpretation of visual clustering approaches, alongside well-known determinants such as initialization and parameters. Furthermore, the results show the need for further evaluation of the underlying data alterations in light of the biological question and the subsequently used methods and applications.
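The practical difference between the three transformations can be sketched with a toy FPKM matrix (the matrix, its size, and its sparsity are invented for illustration; the paper's datasets are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical FPKM matrix: 20 samples x 50 genes, sparse like real RNA-seq.
fpkm = rng.gamma(shape=0.5, scale=20.0, size=(20, 50))
fpkm[rng.random(fpkm.shape) < 0.3] = 0.0  # drop-outs: zero expression

raw = fpkm                                    # unprocessed FPKM
with np.errstate(divide="ignore"):
    log10_only = np.log10(fpkm)               # -inf wherever FPKM == 0
log10_plus1 = np.log10(fpkm + 1.0)            # finite everywhere

# The +1 pseudocount keeps zero-expression genes usable downstream, which
# is one way the choice of transformation reshapes the input space that
# t-SNE or UMAP then embeds.
print(np.isinf(log10_only).any(), np.isinf(log10_plus1).any())
```

Because the three variants hand t-SNE/UMAP geometrically different inputs, divergent cluster structure across transformations is unsurprising.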


Author(s):  
Mieczysław A. Kłopotek ◽  
Robert A. Kłopotek

Abstract Kleinberg introduced an axiomatic system for clustering functions. Of the three axioms he proposed, two (scale invariance and consistency) concern data transformations that should produce the same clustering under the same clustering function. The so-called consistency axiom permits the broadest range of transformations of the data set. Kleinberg claims that k-means, one of the most popular clustering algorithms, does not have the consistency property. We challenge this claim by pointing out an invalid assumption in his proof (infinite dimensionality) and show that in one-dimensional Euclidean space the k-means algorithm does have the consistency property. We also prove that in higher-dimensional spaces, k-means is in fact inconsistent. This result is of practical importance when choosing testbeds for implementations of clustering algorithms, as it tells us under which circumstances clustering after a consistency transformation will return the same clusters. Two remedies are proposed: the gravitational consistency property and the dataset consistency property, both of which hold for k-means and are hence suitable when developing the mentioned testbeds.
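The one-dimensional case can be illustrated with a minimal sketch (scikit-learn's KMeans is our choice here, not the authors'). A consistency transformation shrinks within-cluster distances and widens the between-cluster gap; in 1-D the returned partition does not change:

```python
import numpy as np
from sklearn.cluster import KMeans

def partition(labels):
    # Represent a clustering as a set of frozensets of point indices,
    # so that label permutations compare equal.
    return {frozenset(np.flatnonzero(labels == k)) for k in np.unique(labels)}

# Two well-separated 1-D clusters.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0]).reshape(-1, 1)
before = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)

# A consistency transformation: every within-cluster distance shrinks,
# every between-cluster distance grows.
x2 = np.array([0.0, 0.5, 1.0, 20.0, 20.5, 21.0]).reshape(-1, 1)
after = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x2)

print(partition(before) == partition(after))  # same clusters in 1-D
```

The paper's inconsistency result concerns higher dimensions, where such a transformation can move points so that a different partition attains a lower k-means cost.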


2021 ◽  
Vol 2052 (1) ◽  
pp. 012058
Author(s):  
T V Zhgun

Abstract The features of the data distribution can significantly affect the composite characteristics of objects, so composite indexes of objects must take these features into account. Some types of data are characterized by distributions with a significant anomaly, in which the vast majority of observations are concentrated near the boundary values. Such data cannot always be characterized by the asymmetry coefficient. In addition, if the values of a variable are approximately symmetric about zero or concentrated near zero, the sample also cannot be characterized by the coefficient of variation. The paper proposes a transformation that allows the anomalous nature of variables to be identified using the signal-to-noise ratio. Variables are evaluated in a standard range that is shifted to the right of zero. If a logarithmic transformation is necessary, this shift avoids the pressure of small variable values that, after direct logarithmic transformation, would take large negative values. Applying a logarithmic correction to the detected anomalous variables redistributes the obtained weighting coefficients toward a more correct interpretation and, in particular, solves the problem of negative weighting coefficients.
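One hedged reading of the shifted-range idea, with an invented target range of [0.1, 1.1] (the paper does not fix specific bounds here): rescaling into a range bounded away from zero means a subsequent logarithm is bounded below by log(0.1) rather than diverging toward large negative values.

```python
import numpy as np

def rescale_shifted(x, lo=0.1, hi=1.1):
    # Map a variable into a standard range shifted right of zero, so a
    # subsequent logarithm never sees values at or below zero.
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def signal_to_noise(x):
    # Ratio used to flag anomalous (boundary-concentrated) variables.
    return np.mean(x) / np.std(x)

rng = np.random.default_rng(1)
# Hypothetical anomalous variable: mass piled up near the lower boundary.
anomalous = rng.exponential(scale=0.05, size=1000)
scaled = rescale_shifted(anomalous)
snr = signal_to_noise(scaled)

# Logarithmic correction spreads the piled-up values; because the range
# starts at 0.1, no value maps below log(0.1).
corrected = np.log(scaled)
print(round(snr, 2), round(float(corrected.min()), 3))
```

A low signal-to-noise ratio on the rescaled variable would flag it for the logarithmic correction.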


2021 ◽  
Author(s):  
Hadi Hojjati ◽  
Narges Armanfard

We propose an acoustic anomaly detection algorithm based on the framework of contrastive learning. Contrastive learning is a recently proposed self-supervised approach that has shown promising results in image classification and speech recognition. However, its application in anomaly detection is underexplored. Earlier studies have demonstrated that it can achieve state-of-the-art performance in image anomaly detection, but its capability in anomalous sound detection is yet to be investigated. For the first time, we propose a contrastive learning-based framework that is suitable for acoustic anomaly detection. Since most existing contrastive learning approaches are targeted toward images, the effect of other data transformations on the performance of the algorithm is unknown. Our framework learns a representation from unlabeled data by applying audio-specific data augmentations. We show that in the resulting latent space, normal and abnormal points are distinguishable. Experiments conducted on the MIMII dataset confirm that our approach can outperform competing methods in detecting anomalies.
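As an illustration of audio-specific augmentation, here are two common choices, time shifting and noise injection (these particular augmentations and parameters are assumptions for the sketch; the paper's exact augmentation set may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

def time_shift(wave, max_frac=0.1):
    # Circularly shift the waveform by up to 10% of its length.
    shift = int(rng.integers(1, int(len(wave) * max_frac) + 1))
    return np.roll(wave, shift)

def add_noise(wave, snr_db=20.0):
    # Inject Gaussian noise at a chosen signal-to-noise ratio.
    power = np.mean(wave ** 2)
    noise_power = power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=len(wave))

# Two stochastic views of the same clip form a positive pair for
# contrastive training; views of different clips act as negatives.
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
view_a = add_noise(time_shift(clip))
view_b = add_noise(time_shift(clip))
print(view_a.shape == clip.shape and not np.allclose(view_a, view_b))
```

The contrastive objective then pulls the representations of the two views together while pushing apart views of other clips.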


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Christopher T. Lee ◽  
Manolis Maragkakis

Abstract Background The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics, and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output, and many more tools have been developed to process it. However, despite its flexibility, SAM-encoded files can be difficult to query and understand even for experienced bioinformaticians. As genomic data grow rapidly, structured and efficient queries on data encoded in SAM/BAM files are becoming increasingly important. Existing tools are very limited in their query capabilities or are not efficient. Critically, new tools that address these shortcomings should not only support existing large datasets but should do so without requiring massive data transformations and file infrastructure reorganizations. Results Here we introduce SamQL, an SQL-like query language for the SAM format with an intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and can replace the Bash one-liners commonly employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and, when parallelized, outperforms other, substantially less expressive software. Conclusions SamQL is a complete query language that we envision as a step toward a structured database engine for genomics. SamQL is written in Go and is freely available as a standalone program and as an open-source library under an MIT license: https://github.com/maragkakislab/samql/.
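For a flavor of the kind of query involved, here is a small Python sketch (not SamQL itself, whose syntax is documented in its repository) of the structured filtering that a typical samtools/Bash one-liner performs over SAM fields:

```python
# A SAM alignment record is 11+ tab-separated fields: QNAME, FLAG, RNAME,
# POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL.
SAM_LINES = [
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\tACGT\tFFFF",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",      # unmapped (flag 0x4)
    "read3\t16\tchr2\t200\t10\t50M\t*\t0\t0\tACGT\tFFFF",
]

def records(lines):
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        f = line.split("\t")
        yield {"qname": f[0], "flag": int(f[1]), "rname": f[2],
               "pos": int(f[3]), "mapq": int(f[4])}

# The structured equivalent of a one-liner such as
# `samtools view -q 30 -F 4 in.bam | cut -f1`:
hits = [r["qname"] for r in records(SAM_LINES)
        if r["mapq"] >= 30 and not r["flag"] & 0x4]
print(hits)  # ['read1']
```

A query language like SamQL expresses this kind of predicate declaratively over named fields instead of positional cut/awk indices.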


2021 ◽  
pp. 44-47
Author(s):  
A. V. Prutskov

Industrial information, measuring, and control systems contain a program module designed to convert measurement results into data for display and into control signals. Such a program module is interconnected with other modules through program interfaces, so data transformations are necessary when data are passed between modules. Object-oriented design patterns can be used when programming these data transformations. When converting measurement results into objects, the Adapter design pattern can be applied, while the Command pattern serves to convert objects into control signals. Data processing should be separated from data representation, storage, and transmission; these responsibilities can be divided between modules using the Model–View–Controller pattern. The use of design patterns reduces the time needed to develop and subsequently modify software for information, measuring, and control systems, as well as for systems in other areas of science and the economy.
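A compact sketch of the Adapter and Command roles in this setting (the class names, scale factor, and signal encoding are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class RawReading:
    # Raw measurement as delivered by a hypothetical acquisition module.
    channel: int
    counts: int  # ADC counts

@dataclass
class Measurement:
    channel: int
    volts: float

class ReadingAdapter:
    """Adapter: converts raw measurement results into display-ready objects."""
    def __init__(self, volts_per_count: float):
        self.volts_per_count = volts_per_count

    def adapt(self, raw: RawReading) -> Measurement:
        return Measurement(raw.channel, raw.counts * self.volts_per_count)

class SetOutputCommand:
    """Command: encapsulates converting an object into a control signal."""
    def __init__(self, channel: int, level: float):
        self.channel, self.level = channel, level

    def execute(self) -> bytes:
        # Encode the control signal for the hypothetical output module.
        return f"SET {self.channel} {self.level:.3f}\n".encode()

m = ReadingAdapter(volts_per_count=0.001).adapt(RawReading(channel=2, counts=1234))
cmd = SetOutputCommand(channel=2, level=m.volts)
print(m.volts, cmd.execute())
```

In a Model–View–Controller split, the Measurement objects would live in the model, while the display formatting and the command dispatch sit in view and controller code respectively.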


Author(s):  
Nguyen Van Thien ◽  
Do Duc Trung

This article presents the results of an empirical study of milling SCM440 steel. The cutting insert used was TiN-coated with a tool tip radius of 0.5 mm. The experimental process comprised 18 runs arranged according to a Box-Behnken matrix, with cutting speed, feed rate, and cutting depth as the input parameters of each experiment and cutting force as the output parameter. Analysis of the experimental results determined the influence of the input parameters, as well as the interactions between them, on the output parameter. From the experimental results, a regression model describing the relationship between cutting force and the input parameters was built. Box-Cox and Johnson data transformations were then applied to construct two further models of cutting force. All three regression models were used to predict the cutting force, and their predictions were compared with the experimental results. The coefficient of determination (R-Sq), the adjusted coefficient of determination (R-Sq(adj)), and the percentage mean absolute error (%MAE) between the predicted and experimental results served as the criteria for comparing the accuracy of the cutting force models. The results show that the two models using data transformations are more accurate than the model without them. A t-test comparing the model using the Box-Cox transformation with the model using the Johnson transformation confirmed that the two have equal accuracy. Finally, the direction of the next study is outlined.
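Such transformations are readily applied before regression fitting. SciPy provides Box-Cox directly; the Johnson system has no single-call SciPy equivalent, so the related Yeo-Johnson power transform stands in here purely for comparison (the force values below are invented, not the paper's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical cutting-force readings (N): positive and right-skewed.
force = rng.lognormal(mean=5.0, sigma=0.4, size=100)

# Box-Cox: the power parameter lambda is fitted by maximum likelihood.
force_bc, lam = stats.boxcox(force)

# Yeo-Johnson (a related power transform available in SciPy; it is not
# the Johnson SU/SB system used in the paper) works on any real data.
force_yj, lam_yj = stats.yeojohnson(force)

# Skewness should move toward 0 after transformation, which is what
# improves the fit of a normal-error regression model.
print(round(stats.skew(force), 2), round(stats.skew(force_bc), 2))
```

The regression is then fitted on the transformed response, and predictions are back-transformed for comparison with measured forces.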


2021 ◽  
Vol 13 (17) ◽  
pp. 3466
Author(s):  
Gustavo de Araújo Carvalho ◽  
Peter J. Minnett ◽  
Nelson F. F. Ebecken ◽  
Luiz Landau

Linear discriminant analysis (LDA) is a mathematically robust multivariate data analysis approach that is sometimes used for surface oil slick signature classification. Our goal is to rank the effectiveness of LDAs at differentiating oil spills from look-alike slicks. We explored multiple combinations of (i) variables (size information, Meteorological-Oceanographic (metoc) variables, geo-location parameters) and (ii) data transformations (non-transformed, cube root, log10). Active and passive satellite-based measurements from RADARSAT, QuikSCAT, AVHRR, SeaWiFS, and MODIS were used. Results from two experiments are reported and discussed: (i) an investigation of 60 combinations of several attributes subjected to the same data transformation and (ii) a survey of 54 other data combinations of three selected variables subjected to different data transformations. In Experiment 1, the best discrimination was reached using ten cube-root-transformed attributes: ~85% overall accuracy using six pieces of size information, three metoc variables, and one geo-location parameter. In Experiment 2, two combinations of three variables tied as the most effective: ~81% overall accuracy using area (log-transformed), length-to-width ratio (log- or cube-root-transformed), and number of feature parts (non-transformed). After verifying the classification accuracy of 114 algorithms against expert interpretations, we concluded that applying different data transformations and accounting for metoc and geo-location attributes optimizes the accuracy of binary classifiers (oil spills vs. look-alike slicks) using the simple LDA technique.
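The binary setup can be sketched with scikit-learn's LDA on synthetic slick features (the feature distributions are invented; only the structure, log-transformed size variables feeding a two-class LDA, mirrors the experiments):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
n = 200
# Hypothetical slick features: area (km^2) and length-to-width ratio,
# drawn from overlapping lognormal distributions for the two classes.
area_spill = rng.lognormal(3.0, 0.5, n); ratio_spill = rng.lognormal(1.5, 0.3, n)
area_look  = rng.lognormal(2.0, 0.5, n); ratio_look  = rng.lognormal(0.8, 0.3, n)

X = np.column_stack([
    np.log10(np.concatenate([area_spill, area_look])),    # log-transformed
    np.log10(np.concatenate([ratio_spill, ratio_look])),  # log-transformed
])
y = np.array([1] * n + [0] * n)  # 1 = oil spill, 0 = look-alike slick

lda = LinearDiscriminantAnalysis().fit(X, y)
print(f"overall accuracy: {lda.score(X, y):.2f}")
```

Swapping the `np.log10` calls for a cube root (or the identity) is exactly the kind of transformation comparison the two experiments systematize.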


2021 ◽  
pp. 195-208
Author(s):  
Andy Hector

This chapter revisits a regression analysis to explore the normal least squares assumption of approximately equal variance. It also considers some of the data transformations that can be used to achieve this. A linear regression of transformed data is compared with a generalized linear-model equivalent that avoids transformation by using a link function and non-normal distributions. Generalized linear models based on maximum likelihood use a link function to model the mean (in this case a square-root link) and a variance function to model the variability (in this case the gamma distribution, where the variance increases as the square of the mean). The Box–Cox family of transformations is explained in detail.
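The Box-Cox family is simple to write down directly. This sketch shows the log transform as the limiting case at λ = 0 and λ = 0.5 as the analogue of the square-root link used by the generalized linear model in the chapter:

```python
import numpy as np

def box_cox(y, lam):
    # Box-Cox family: (y^lam - 1) / lam for lam != 0, with the log
    # transform as the continuous limit at lam = 0.
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

y = np.array([1.0, 2.0, 5.0, 10.0])
# lam = 1 leaves the shape unchanged (just recentred), lam = 0 is the
# log, and lam = 0.5 sits between them.
for lam in (1.0, 0.5, 0.0):
    print(lam, np.round(box_cox(y, lam), 3))
```

In practice λ is chosen by profile likelihood, and values near 0, 0.5, or 1 are usually rounded to those interpretable cases.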

