unsupervised feature extraction
Recently Published Documents


Total documents: 141 (five years: 69)
H-index: 17 (five years: 8)

2022, Vol 2161 (1), pp. 012048
Author(s): T N Lokesh Kumar, Bhaskarjyoti Das

The availability of sufficient labeled data is a challenge for most inductive learners, which try to generalize from a limited labeled dataset. Traditional semi-supervised approaches address this problem with methods such as wrapping multiple inductive learners on derived pseudo-labels, unsupervised feature extraction, or suitable modification of the objective function. In this work, a simple approach is adopted whereby an inductive learner is enhanced by giving it a transductive view of the data. The experiments, though conducted on a small dataset, provide a few insights: a transductive view benefits an inductive learner; a transductive view that considers both attributes and relations is more effective than one that considers either attributes or relations alone; and graph-convolution-based embedding algorithms capture the information in transductive views more effectively than popular knowledge embedding approaches.


2021, Vol 14 (1), pp. 8
Author(s): Ihar Volkau, Abdul Mujeeb, Wenting Dai, Marius Erdt, Alexei Sourin

Deep learning provides new ways to detect defects in automatic optical inspection (AOI). However, existing deep learning methods require thousands of images of defects for training. This limits the usability of these approaches in manufacturing, because images of defects are scarce before actual manufacturing starts. In contrast, we propose to train an unsupervised deep learning model for defect detection using a much smaller number of images without defects. The model is based on transfer learning and extracts typical semantic patterns from defect-free samples (one-class training). It is built upon a pre-trained VGG16 model and is further trained on custom datasets with different sizes of possible defects (printed circuit boards and soldered joints) using only a small number of normal samples. We have found that defect detection performs very well on a smooth background; however, in cases where the defect manifests as a change of texture, detection can be less accurate. The proposed study uses a self-supervised deep learning approach to identify whether the sample under analysis contains any deviations (with types not defined in advance) from the normal design. The method would make defect detection in the AOI process more robust.


Polymers, 2021, Vol 13 (23), pp. 4117
Author(s): Y-h. Taguchi, Turki Turki

The development of medical applications for substances or materials that contact cells is important. Hence, it is necessary to elucidate how substances that surround cells affect gene expression during incubation. In the current study, we compared the gene expression profiles of cell lines that were in contact with collagen–glycosaminoglycan mesh with those of control cells. Principal component analysis-based unsupervised feature extraction was applied to identify genes with altered expression during incubation in the treated cell lines but not in the controls. The identified genes were enriched in various biological terms. Our method also outperformed a conventional methodology, namely, gene selection based on linear regression with time course.


2021
Author(s): Y-h. Taguchi, Turki Turki

Development of medical applications for substances or materials that contact cells is important. Hence, it is necessary to elucidate how substances that surround cells affect gene expression during incubation. Here, we compared the gene expression profiles of cell lines that were in contact with the collagen–glycosaminoglycan mesh with those of control cells. Principal component analysis-based unsupervised feature extraction was applied to identify genes with altered expression during incubation in the treated cell lines but not in the controls. The identified genes were enriched in various biological terms. Our method also outperformed a conventional methodology, namely, gene selection based on linear regression with time course.


Genes, 2021, Vol 12 (9), pp. 1442
Author(s): Y-H. Taguchi, Turki Turki

Analysis of single-cell multiomics datasets is a novel topic and is considerably challenging because such datasets contain a large number of features with numerous missing values. In this study, we implemented a recently proposed tensor-decomposition (TD)-based unsupervised feature extraction (FE) technique to address this difficult problem. The technique can successfully integrate single-cell multiomics data composed of gene expression, DNA methylation, and accessibility. Although the last two have large dimensions, as many as ten million, and contain only a few percent of nonzero values, TD-based unsupervised FE can integrate the three omics datasets without filling in missing values. Together with UMAP, which is used frequently when embedding single-cell measurements into two-dimensional space, TD-based unsupervised FE can produce a two-dimensional embedding coincident with classification when integrating single-cell omics datasets. Genes selected based on TD-based unsupervised FE are also significantly related to reasonable biological roles.
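
The abstract above does not describe the decomposition itself. As a rough illustration of the tensor-decomposition step that TD-based unsupervised FE builds on, the following is a minimal sketch of a higher-order SVD (HOSVD) on a small dense toy tensor, assuming a features x samples x conditions layout. The function names (`unfold`, `mode_product`, `hosvd`, `reconstruct`, `gene_component`) are hypothetical, and the sparse, missing-value-tolerant multiomics integration reported in the paper is not reproduced here.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x old_dim)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)


def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's factor matrix.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor from the core and the factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


def gene_component(core, l2, l3):
    """Assumed selection rule for a 3-mode tensor: given chosen sample-mode (l2)
    and condition-mode (l3) components, pick the feature-mode component whose
    core entry has the largest absolute value."""
    return int(np.argmax(np.abs(core[:, l2, l3])))
```

In this sketch, the column `factors[0][:, gene_component(core, l2, l3)]` would hold one score per feature. Scoring those values against a null distribution to select features is a separate step and is not shown.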


2021, Vol 11 (1)
Author(s): Kota Fujisawa, Mamoru Shimo, Y.-H. Taguchi, Shinya Ikematsu, Ryota Miyata

Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.
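
The abstract above names PCAUFE but does not give its steps. The sketch below shows one common way such a selection can be set up, and every step in it is an assumption rather than the authors' confirmed procedure: standardize each sample, run PCA via SVD so that each feature gets a score, pick the component whose sample loadings best track a two-group label, treat the standardized scores as Gaussian under the null, and keep features whose Benjamini-Hochberg adjusted P-values fall below a threshold. All names (`bh_adjust`, `pca_unsupervised_fe`, `make_toy_data`), the threshold, and the synthetic data are hypothetical.

```python
import math

import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def pca_unsupervised_fe(x, labels, alpha=0.01):
    """x: features x samples matrix; labels: one 0/1 group label per sample.

    Returns the indices of the selected features and the chosen component index.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Standardize each sample (column) across features.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    # Assumed rule: choose the component whose sample loadings follow the labels.
    corr = [abs(np.corrcoef(v, labels)[0, 1]) for v in vt]
    best = int(np.nanargmax(corr))
    scores = u[:, best]
    z = (scores - scores.mean()) / scores.std()
    # Two-sided Gaussian tail, equal to a chi-squared test (1 d.o.f.) on z**2.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    selected = np.flatnonzero(bh_adjust(pvals) < alpha)
    return selected, best


def make_toy_data(seed=0, n_features=1000, n_per_group=6, n_signal=20, effect=3.0):
    """Synthetic data: the first `n_signal` features differ between two groups."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_features, 2 * n_per_group))
    labels = np.array([0] * n_per_group + [1] * n_per_group)
    x[:n_signal, :n_per_group] -= effect
    x[:n_signal, n_per_group:] += effect
    return x, labels
```

Calling `pca_unsupervised_fe(*make_toy_data())` returns the indices of the features flagged in the toy data. The labels are used only to choose which component to inspect, not to fit the decomposition.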


2021
Author(s): Y-h. Taguchi, Turki Turki

Analysis of single-cell multiomics datasets is a novel topic and is considerably challenging because such datasets contain a large number of features with numerous missing values. In this study, we implemented a recently proposed tensor-decomposition (TD)-based unsupervised feature extraction (FE) technique to address this difficult problem. The technique can successfully integrate single-cell multiomics data composed of gene expression, DNA methylation, and accessibility. Although the last two have large dimensions, as many as ten million, and contain only a few percent of nonzero values, TD-based unsupervised FE can integrate the three omics datasets without filling in missing values. Together with UMAP, which is used frequently when embedding single-cell measurements into two-dimensional space, TD-based unsupervised FE can produce a two-dimensional embedding coincident with classification when integrating single-cell omics datasets. Genes selected based on TD-based unsupervised FE were also significantly related to reasonable biological roles.


2021, Vol 16 (1), pp. 1-15
Author(s): Gyoung S. Na, Hyunju Chang

Feature extraction has been widely studied to find informative latent features and reduce the dimensionality of data. In particular, due to the difficulty in obtaining labeled data, unsupervised feature extraction has received much attention in data mining. However, widely used unsupervised feature extraction methods require side information about the data or rigid assumptions on the latent feature space. Furthermore, most feature extraction methods require a predefined dimensionality of the latent feature space, which must be manually tuned as a hyperparameter. In this article, we propose a new unsupervised feature extraction method called Unsupervised Subspace Extractor (USE), which does not require any side information or rigid assumptions on the data. Furthermore, USE can find a subspace generated by a nonlinear combination of the input features and automatically determine the optimal dimensionality of the subspace for the given nonlinear combination. The feature extraction process of USE is well justified mathematically, and we also empirically demonstrate the effectiveness of USE on several benchmark datasets.


2021
Author(s): Makoto Kashima, Nobuyoshi Kumagai, Hiromi Hirata, Y-h. Taguchi

RNA-seq data analysis of non-model organisms is often difficult because of the lack of a well-annotated genome. In model organisms, after short reads are mapped to the genome, it is possible to focus the analysis on well-annotated regions. In non-model organisms, however, contigs must be generated by de novo assembly. This can result in a large number of transcripts, making it difficult to remove redundancy. A large number of transcripts can also make it difficult to recognize differentially expressed transcripts (DETs) between more than two experimental conditions, because P-values must be corrected for multiple comparisons, and the effect of the correction grows as the number of transcripts increases. After heavy correction, even sufficiently small raw P-values often fail to reach significance. In this study, we applied a recently proposed tensor decomposition (TD)-based unsupervised feature extraction (FE) to the RNA-seq data obtained for a non-model organism, a planarian; we successfully obtained a limited number of transcripts whose expression was altered between normal and defective samples as well as over time. TD-based unsupervised FE is expected to be an effective tool that can identify a limited number of DETs, even when only a poorly annotated genome is available.
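
To make the point about multiple-comparison correction concrete, here is a small worked sketch. It assumes the Benjamini-Hochberg procedure as the correction, which the abstract does not specify, and a toy setup in which one transcript has a fixed raw P-value while all other transcripts have evenly spread null P-values. The function names `bh_adjust` and `adjusted_hit` are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def adjusted_hit(raw_p, n_tests):
    """Adjusted P-value of a single hit among evenly spread null P-values."""
    n_null = n_tests - 1
    nulls = [(i + 0.5) / n_null for i in range(n_null)]
    return bh_adjust([raw_p] + nulls)[0]
```

In this toy setup, a raw P-value of 1e-4 stays below a 0.05 threshold after adjustment when 100 transcripts are tested (`adjusted_hit(1e-4, 100)`), but no longer does when 100,000 are tested (`adjusted_hit(1e-4, 100000)`), which is the effect the abstract describes.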


PLoS ONE, 2021, Vol 16 (5), pp. e0251032
Author(s): Y-h. Taguchi, Turki Turki

The histone group added to a gene sequence must be removed during mitosis to halt transcription during the DNA replication stage of the cell cycle. However, the detailed mechanism of this transcription regulation remains unclear. In particular, it is not realistic to reconstruct all appropriate histone modifications throughout the genome from scratch after mitosis. Thus, it is reasonable to assume that there might be a type of “bookmark” that retains the positions of histone modifications, which can be readily restored after mitosis. We developed a novel computational approach comprising tensor decomposition (TD)-based unsupervised feature extraction (FE) to identify transcription factors (TFs) that bind to genes associated with reactivated histone modifications as candidate histone bookmarks. To the best of our knowledge, this is the first application of TD-based unsupervised FE to the cell division context and phases pertaining to the cell cycle in general. The candidate TFs identified with this approach were functionally related to cell division, suggesting the suitability of this method and the potential of the identified TFs as bookmarks for histone modification during mitosis.

