Urdu Handwritten Characters Data Visualization and Recognition Using Distributed Stochastic Neighborhood Embedding and Deep Network

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Mujtaba Husnain ◽  
Malik Muhammad Saad Missen ◽  
Shahzad Mumtaz ◽  
Dost Muhammad Khan ◽  
Mickäel Coustaty ◽  
...  

In this paper, we make use of the two-dimensional data obtained by applying t-distributed Stochastic Neighbor Embedding (t-SNE) to high-dimensional data of Urdu handwritten characters and numerals. The instances of the dataset used for the experimental work are grouped into multiple classes based on shape similarity. We performed three tasks in order: (i) we generated a state-of-the-art dataset of Urdu handwritten characters and numerals by inviting native Urdu participants from different social and academic groups, since no publicly available dataset of this type exists to date; (ii) we applied classical dimensionality-reduction and data-visualization approaches such as Principal Component Analysis (PCA) and Autoencoders (AE) in comparison with t-SNE; and (iii) we used the reduced dimensions obtained through PCA, AE, and t-SNE to recognize Urdu handwritten characters and numerals with a deep network, namely a Convolutional Neural Network (CNN). The recognition accuracy achieved for Urdu characters and numerals is found to be much better than that of comparable approaches for the same task. The novelty lies in the fact that the reduced dimensions are used for the first time for recognition of Urdu handwritten text at the character level, instead of the whole multidimensional data. This reduces computation time while maintaining accuracy, compared with the processing time of recognition approaches that use the whole data on other datasets for the same task.
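To make the reduce-then-recognize pipeline concrete, the sketch below runs two of the compared reductions (PCA and t-SNE) to two dimensions and then trains a simple classifier on the reduced features. It uses scikit-learn's bundled digits data as a stand-in for the Urdu character/numeral images (which are not distributed with this abstract) and a k-NN classifier as a stand-in for the paper's CNN recognizer; all parameter choices are illustrative assumptions, not the authors' settings.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Flattened 8x8 character images; a stand-in for the Urdu character/numeral data.
X, y = load_digits(return_X_y=True)

# Two of the three reductions compared in the paper, both to two dimensions.
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# A k-NN classifier stands in for the CNN recognizer; note that t-SNE has no
# transform for unseen samples, so this split is only illustrative.
for name, X_2d in [("PCA", X_pca), ("t-SNE", X_tsne)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X_2d, y, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    print(f"{name}: accuracy on 2-D features = {clf.score(X_te, y_te):.3f}")
```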

2021 ◽  
Vol 12 ◽  
Author(s):  
Jianping Zhao ◽  
Na Wang ◽  
Haiyun Wang ◽  
Chunhou Zheng ◽  
Yansen Su

Dimensionality reduction of high-dimensional data is crucial for single-cell RNA sequencing (scRNA-seq) visualization and clustering. One prominent challenge in scRNA-seq studies comes from dropout events, which lead to zero-inflated data. To address this issue, we propose in this paper a scRNA-seq dimensionality reduction algorithm based on a hierarchical autoencoder, termed SCDRHA. The proposed SCDRHA consists of two core modules: the first module is a deep count autoencoder (DCA) used to denoise the data, and the second is a graph autoencoder that projects the data into a low-dimensional space. Experimental results demonstrate that SCDRHA outperforms existing state-of-the-art algorithms at dimension reduction and noise reduction on five real scRNA-seq datasets. Moreover, SCDRHA also dramatically improves the performance of data visualization and cell clustering.
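The two-module idea can be illustrated with a deliberately simplified PyTorch sketch: a plain denoising autoencoder stands in for the deep count autoencoder (DCA) module, and a second autoencoder applied to kNN-graph-smoothed data stands in for the graph autoencoder. The layer sizes, MSE losses, and random placeholder matrix are assumptions for illustration only and do not reproduce the authors' architecture or objectives (DCA, for instance, uses a zero-inflated negative binomial loss).

```python
import torch
import torch.nn as nn
from sklearn.neighbors import kneighbors_graph

class AE(nn.Module):
    """A small fully connected autoencoder."""
    def __init__(self, n_in, n_hidden, n_latent):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(),
                                 nn.Linear(n_hidden, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.ReLU(),
                                 nn.Linear(n_hidden, n_in))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def train(model, x, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        _, recon = model(x)
        loss = nn.functional.mse_loss(recon, x)
        loss.backward()
        opt.step()
    return model

# Placeholder log-normalised expression matrix (cells x genes).
x = torch.rand(500, 2000)

# Stage 1: denoising autoencoder (a crude stand-in for the DCA module);
# its reconstruction replaces the noisy input.
denoiser = train(AE(x.shape[1], 256, 64), x)
with torch.no_grad():
    _, x_denoised = denoiser(x)

# Stage 2: smooth each cell over its kNN graph, then embed to 2-D with a
# second autoencoder (a crude stand-in for the graph autoencoder module).
knn = kneighbors_graph(x_denoised.numpy(), n_neighbors=15, include_self=True).toarray()
adj = torch.tensor(knn / knn.sum(axis=1, keepdims=True), dtype=torch.float32)
x_smoothed = adj @ x_denoised

embedder = train(AE(x.shape[1], 128, 2), x_smoothed)
with torch.no_grad():
    z, _ = embedder(x_smoothed)   # 2-D embedding for visualisation and clustering
```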


2020 ◽  
Author(s):  
Oxana Ye. Rodionova ◽  
Sergey Kucheryavskiy ◽  
Alexey L. Pomerantsev

Basic tools for the exploration and interpretation of Principal Component Analysis (PCA) results are well known and thoroughly described in many comprehensive tutorials. However, in the recent decade several new tools have been developed, some of them originally created for solving authentication and classification tasks. In this paper we demonstrate that they can also be useful for exploratory data analysis.

We discuss several important aspects of the PCA exploration of high-dimensional datasets, such as estimation of a proper complexity of the PCA model, dependence on the data structure, presence of outliers, etc. We introduce new tools for the assessment of PCA model complexity, such as plots of the degrees of freedom developed for the orthogonal and score distances, as well as the Extreme and Distance plots, which present a new look at the features of the training and test (new) data. These tools are simple and fast to compute, and in some cases they are more efficient than the conventional PCA tools. A simulated example provides an intuitive illustration of their application. Three real-world examples originating from various fields are employed to demonstrate the capabilities of the new tools and the ways they can be used. The first example considers the reproducibility of a handheld spectrometer using a dataset that is presented for the first time. The other two datasets, which describe the authentication of olives in brine and the classification of wines by geographical origin, are already known and are often used for illustrative purposes.

The paper does not touch upon well-known topics, such as the algorithms for the PCA decomposition or the interpretation of scores and loadings. Instead, we pay attention primarily to more advanced topics, such as exploration of data homogeneity and understanding and evaluation of the optimal model complexity. The examples are accompanied by links to free software that implements the tools.
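As a concrete reference for the two distances underlying these diagnostics, the short numpy/scikit-learn sketch below computes the score distance (leverage within the retained principal-component subspace) and the orthogonal distance (residual to that subspace) for each sample. The simulated data and the choice of three components are illustrative assumptions, and the degrees-of-freedom estimation used in the paper's plots is not reproduced here.

```python
# Score and orthogonal distances of each sample relative to a 3-component PCA model.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # placeholder spectra-like data

pca = PCA(n_components=3).fit(X)
T = pca.transform(X)                      # scores in the retained PC subspace
resid = X - pca.inverse_transform(T)      # part of each sample outside that subspace

# Score distance: Hotelling-type leverage within the model plane.
score_dist = np.sum(T**2 / T.var(axis=0, ddof=1), axis=1)
# Orthogonal distance: squared residual norm to the model plane.
orth_dist = np.sum(resid**2, axis=1)
print(score_dist[:5], orth_dist[:5])
```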


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Anna C. Belkina ◽  
Christopher O. Ciccolella ◽  
Rina Anno ◽  
Richard Halpert ◽  
Josef Spidlen ◽  
...  

Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We develop opt-SNE, an automated toolkit for t-SNE parameter selection that utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations in a dataset-specific manner. The precise calibration of early exaggeration together with opt-SNE adjustment of gradient descent learning rate dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets, overcoming limitations of analysis tools with hard-coded parameters that often produce poorly resolved or misleading maps of fluorescent and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.
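A hedged scikit-learn sketch of the static part of these heuristics is shown below: the gradient-descent learning rate is set to the number of samples divided by the early-exaggeration factor, and the final Kullback-Leibler divergence is read back after fitting. The actual opt-SNE toolkit additionally monitors KL divergence during optimisation to end the early-exaggeration phase and the overall run adaptively, which plain scikit-learn does not expose; the random placeholder matrix and parameter values here are assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder cytometry-like data (5000 cells x 30 markers).
X = np.random.default_rng(0).normal(size=(5000, 30))

early_exaggeration = 12.0
learning_rate = X.shape[0] / early_exaggeration   # opt-SNE learning-rate heuristic

tsne = TSNE(n_components=2, early_exaggeration=early_exaggeration,
            learning_rate=learning_rate, random_state=0)
embedding = tsne.fit_transform(X)

# KL divergence of the final embedding; opt-SNE tracks this quantity per iteration.
print("final KL divergence:", tsne.kl_divergence_)
```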


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Yijie Shen ◽  
Isaac Nape ◽  
Xilin Yang ◽  
Xing Fu ◽  
Mali Gong ◽  
...  

Vector beams, non-separable in spatial mode and polarisation, have emerged as enabling tools in many diverse applications, from communication to imaging. This applicability has been achieved by sophisticated laser designs controlling the spin and orbital angular momentum, but so far is restricted to only two-dimensional states. Here we demonstrate the first vectorially structured light created and fully controlled in eight dimensions, a new state-of-the-art. We externally modulate our beam to control, for the first time, the complete set of classical Greenberger–Horne–Zeilinger (GHZ) states in paraxial structured light beams, in analogy with high-dimensional multi-partite quantum entangled states, and introduce a new tomography method to verify their fidelity. Our complete theoretical framework reveals a rich parameter space for further extending the dimensionality and degrees of freedom, opening new pathways for vectorially structured light in the classical and quantum regimes.
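For reference, the canonical three-partite GHZ state that these classical beams emulate, with three two-dimensional degrees of freedom of the light field playing the role of the three qubits, is

$$|\mathrm{GHZ}\rangle = \frac{1}{\sqrt{2}}\bigl(|0\rangle|0\rangle|0\rangle + |1\rangle|1\rangle|1\rangle\bigr),$$

and the remaining seven states of the GHZ basis (hence eight dimensions) follow from bit flips and a relative sign flip between the two terms; the specific assignment of polarisation and spatial modes to each slot is the paper's construction and is not reproduced here.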


2018 ◽  
Author(s):  
Anna C. Belkina ◽  
Christopher O. Ciccolella ◽  
Rina Anno ◽  
Richard Halpert ◽  
Josef Spidlen ◽  
...  

Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We developed opt-SNE, an automated toolkit for t-SNE parameter selection that utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations in a dataset-specific manner. The precise calibration of early exaggeration together with opt-SNE adjustment of gradient descent learning rate dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets, overcoming limitations of analysis tools with hard-coded parameters that often produce poorly resolved or misleading maps of fluorescent and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.


2017 ◽  
Vol 16 (4) ◽  
pp. 346-360 ◽  
Author(s):  
Dariusz Jamroz

The article describes a new, unique method of multidimensional data visualization. It was developed as a modification of the previously known and frequently used observational tunnels method. The modification consists of supplementing the observational tunnels method for visualization of multidimensional data with the concept of perspective, which greatly facilitates orientation and navigation in multidimensional space. The differences between the effects of the observational tunnels method and the perspective-based observational tunnels method are presented. The effectiveness of the new visualization method is compared with four selected well-known methods of multidimensional data visualization: parallel coordinates, orthogonal projection, principal component analysis, and multidimensional scaling. The research revealed that the perspective-based observational tunnels method sometimes makes it possible to obtain information about significant features of the analyzed data even when the other methods selected for the comparative study are unable to show them. The article presents views of 5-dimensional data obtained from a print recognition process, which allowed the author to state that the selected features are, in this case, sufficient for a correct recognition process. A previously published ranking of seven methods of multidimensional data visualization was supplemented with the perspective-based observational tunnels method; this ranking was conducted using 7-dimensional data describing different types of coal. Thus it was shown that, in this case, the presented method constitutes an efficient tool among other qualitative visualization analysis methods.


2021 ◽  
Vol 15 (8) ◽  
pp. 898-911
Author(s):  
Yongqing Zhang ◽  
Jianrong Yan ◽  
Siyu Chen ◽  
Meiqin Gong ◽  
Dongrui Gao ◽  
...  

Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully utilized to process data in this field, and they have exhibited state-of-the-art performance even on high-dimensional, non-structured, and black-box biological data. The aim of the current study is to provide an overview of deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review successful applications of such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field and outline possible directions for further research.

