data transformation
Recently Published Documents


TOTAL DOCUMENTS

547
(FIVE YEARS 147)

H-INDEX

26
(FIVE YEARS 5)

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Huijian Feng ◽  
Lihui Lin ◽  
Jiekai Chen

Abstract. Background: Single-cell RNA sequencing is becoming a powerful tool to identify cell states, reconstruct developmental trajectories, and deconvolute spatial expression. The rapid development of computational methods has improved insight into heterogeneous single-cell data. An increasing number of tools are available to biological analysts, and two programming languages, R and Python, are widely used among researchers. R and Python are complementary, as many methods are implemented specifically in one or the other. However, the use of different platforms creates data sharing and transformation problems, especially among Scanpy, Seurat, and SingleCellExperiment. Currently, there is no efficient and user-friendly software to perform data transformation of single-cell omics between platforms, so users spend excessive time on data input and output (IO), which significantly reduces the efficiency of data analysis. Results: We developed scDIOR for single-cell data transformation between the R and Python platforms, based on Hierarchical Data Format Version 5 (HDF5). We have created a data IO ecosystem between three R packages (Seurat, SingleCellExperiment, Monocle) and a Python package (Scanpy). Importantly, scDIOR accommodates a variety of data types across programming languages and platforms in an ultrafast way, including single-cell RNA-seq and spatially resolved transcriptomics data, using only a few lines of code in an IDE or a command line interface. For large-scale datasets, users can partially load the needed information, e.g., cell annotation without the gene expression matrices. scDIOR connects the analytical tasks of different platforms, which makes it easy to compare the performance of algorithms between them. Conclusions: scDIOR contains two modules, dior in R and diopy in Python. scDIOR is a versatile and user-friendly tool that implements single-cell data transformation between R and Python rapidly and stably.
The software is freely accessible at https://github.com/JiekaiLab/scDIOR.


2022 ◽  
Vol 71 (2) ◽  
pp. 2191-2207
Author(s):  
Iqra Afzal ◽  
Fiaz Majeed ◽  
Muhammad Usman Ali ◽  
Shahzada Khurram ◽  
Akber Abid Gardezi ◽  
...  

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 192
Author(s):  
Meirong Wei ◽  
Yan Liu ◽  
Tao Zhang ◽  
Ze Wang ◽  
Jiaming Zhu

Convolutional neural network (CNN)-based fault diagnosis methods have been widely adopted to obtain representative features and to classify fault modes, owing to their prominent feature extraction capability. However, training a CNN requires a large number of labeled samples, and a limited amount of labeled data may lead to overfitting. In this article, a novel ResNet-based method is developed to achieve fault diagnosis for machines with very few samples. Specifically, data transformation combinations (DTCs) are designed based on mutual information. It is worth noting that the selected DTC, which can complete the training process of the 1-D ResNet quickly without increasing the amount of training data, can be applied at random to any batch of training data. Meanwhile, a self-supervised learning method called 1-D SimCLR is adopted to obtain an effective feature encoder, which can be optimized with very few unlabeled samples. Then, a fault diagnosis model named DTC-SimCLR is constructed by combining the selected data transformation combination, the obtained feature encoder, and a fully-connected layer-based classifier. In DTC-SimCLR, the parameters of the feature encoder are fixed, and the classifier is trained with very few labeled samples. Two machine fault datasets, from a cutting tooth and a bearing, are used to evaluate the performance of DTC-SimCLR. Testing results show that DTC-SimCLR has superior performance and diagnostic accuracy with very few samples.


2021 ◽  
Author(s):  
André Marquardt ◽  
Philip Kollmannsberger ◽  
Markus Krebs ◽  
Markus Knott ◽  
Antonio Giovanni Solimando ◽  
...  

Abstract: Personalized oncology is a rapidly evolving area and offers cancer patients therapy options more specific than ever. Yet, there is still a lack of understanding regarding transcriptomic similarities or differences between metastases and their corresponding primary sites. To approach this question, we used two different unsupervised dimension reduction methods (t-SNE and UMAP) on three different metastases datasets (prostate cancer, neuroendocrine prostate cancer, and skin cutaneous melanoma) including 682 different samples, with three different underlying data transformations (unprocessed FPKM values, log10 transformed FPKM values, and log10+1 transformed FPKM values) to visualize potential underlying clusters. The approaches resulted in the formation of different clusters that were independent of the respective resection sites. Additionally, data transformation critically affected cluster formation in most cases. Of note, our study revealed no tight link between the metastasis resection site and specific transcriptomic features. Instead, our analysis demonstrates the dependency of cluster formation on the underlying data transformation and the dimension reduction method applied. These observations suggest that data transformation is another key element in the interpretation of visual clustering approaches, apart from well-known determinants such as initialization and parameters. Furthermore, the results show the need for further evaluation of underlying data alterations based on the biological question and the methods and applications subsequently used.
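The three data transformations named in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `transform_fpkm`, the mode labels, and the sample values are hypothetical, "log10+1" is read here as log10(FPKM + 1), and mapping zeros to `None` under plain log10 is an assumption about how undefined values might be handled.

```python
import math


def transform_fpkm(values, mode):
    """Return a transformed copy of a list of non-negative FPKM values.

    mode is one of the hypothetical labels "raw", "log10", "log10_plus1".
    """
    if mode == "raw":
        # Unprocessed FPKM values are passed through unchanged.
        return list(values)
    if mode == "log10":
        # log10 is undefined at 0; as an assumption, zeros become None.
        return [math.log10(v) if v > 0 else None for v in values]
    if mode == "log10_plus1":
        # Assumed reading of "log10+1": add a pseudocount of 1 before the log.
        return [math.log10(v + 1) for v in values]
    raise ValueError(f"unknown mode: {mode}")


# Made-up expression values for one gene across five samples.
fpkm = [0.0, 0.5, 9.0, 99.0, 999.0]
for mode in ("raw", "log10", "log10_plus1"):
    print(mode, transform_fpkm(fpkm, mode))
```

The pseudocount keeps zero-expression samples in the data, whereas plain log10 has no defined value for them. That difference alone can change which points a dimension reduction method places close together.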


Author(s):  
Nova Andriani ◽  
Jenie Sundari

Fast data transformation is needed to keep up with growing work and information demands in all fields, especially the industrial sector. Growing technology can be used for many purposes, including minimizing errors and making work easier. It would be a pity to have such technology available while calculations and other work are still done manually. Therefore, this research designs a web-based application using PHP and MySQL. The application is expected to make work more effective and efficient. In this digital age, it is also preferable to reduce the use of paper, which saves operational costs and is a step toward reducing global warming. During a pandemic, it is also necessary to reduce face-to-face interactions. To reduce such interactions and to make it easier to store and search data when preparing reports and calculating cost estimates, this work presents the design of a web-based application for estimating cost of goods sold and production.


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2780
Author(s):  
Paul Corral ◽  
Kristen Himelein ◽  
Kevin McGee ◽  
Isabel Molina

This paper evaluates the performance of different small area estimation methods using model and design-based simulation experiments. Design-based simulation experiments are carried out using the Mexican Intra Censal survey as a census of roughly 3.9 million households from which 500 samples are drawn using a two-stage selection procedure similar to that of Living Standards Measurement Study (LSMS) surveys. The estimation methods considered are that of Elbers, Lanjouw and Lanjouw (2003), the empirical best predictor of Molina and Rao (2010), the twofold nested error extension presented by Marhuenda et al. (2017), and finally an adaptation, presented by Nguyen (2012), that combines unit and area level information and has been proposed as an alternative when the available census data are outdated. The findings show the importance of selecting a proper model and data transformation so that model assumptions hold. A proper data transformation can lead to a considerable improvement in mean squared error (MSE). Results from design-based validation show that all small area estimation methods represent an improvement, in terms of MSE, over direct estimates. However, methods that model unit level welfare using only area level information suffer from considerable bias. Because the magnitude and direction of the bias are unknown ex ante, methods relying only on aggregated covariates should be used with caution, but may be an alternative to traditional area level models when these are not applicable.
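The abstract's remarks on MSE and bias rest on the standard identity MSE = bias² + variance. The sketch below checks that identity on made-up numbers. The function name `mse_decomposition` and the sample estimates are hypothetical and are not taken from the paper's simulations.

```python
from statistics import fmean


def mse_decomposition(estimates, truth):
    """Return (mse, bias, variance) of repeated estimates of one true value.

    The variance is taken around the mean estimate with divisor n, so that
    mse == bias**2 + variance holds exactly up to floating-point rounding.
    """
    mean_est = fmean(estimates)
    bias = mean_est - truth
    variance = fmean([(e - mean_est) ** 2 for e in estimates])
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return mse, bias, variance


# Made-up estimates of an area mean whose true value is 10.0.
mse, bias, variance = mse_decomposition([9.0, 10.0, 11.0, 14.0], truth=10.0)
print(mse, bias, variance)
```

A method can therefore have a low variance and still a poor MSE if its bias is large, which is the concern the abstract raises about methods that rely only on aggregated covariates.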

