Parametric Embedding for Class Visualization

2007 ◽  
Vol 19 (9) ◽  
pp. 2536-2556 ◽  
Author(s):  
Tomoharu Iwata ◽  
Kazumi Saito ◽  
Naonori Ueda ◽  
Sean Stromsten ◽  
Thomas L. Griffiths ◽  
...  

We propose a new method, parametric embedding (PE), that embeds objects with the class structure into a low-dimensional visualization space. PE takes as input a set of class conditional probabilities for given data points and tries to preserve the structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier's behavior in supervised, semisupervised, and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of Web pages, semisupervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, latent Dirichlet allocation.
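
Because the embedding-space model is fully specified above (a gaussian mixture with equal covariances), the objective lends itself to a compact sketch. The following is a minimal NumPy sketch of the PE loss and its gradients, assuming uniform class priors and unit covariances; the variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def pe_loss_and_grads(P, Y, Phi):
    """P: (n, c) input class posteriors p(c|x_n); Y: (n, d) embedded points;
    Phi: (c, d) embedded class centers. Returns sum-of-KL loss and gradients."""
    diff = Y[:, None, :] - Phi[None, :, :]          # (n, c, d)
    logq = -0.5 * (diff ** 2).sum(-1)               # unnormalized log q(c|y_n)
    logq -= logq.max(axis=1, keepdims=True)         # numerical stability
    Q = np.exp(logq)
    Q /= Q.sum(axis=1, keepdims=True)               # mixture posteriors q(c|y_n)
    eps = 1e-12
    loss = (P * (np.log(P + eps) - np.log(Q + eps))).sum()
    R = P - Q                                       # (n, c) posterior mismatch
    grad_Y = (R[:, :, None] * diff).sum(axis=1)     # d(loss)/dY
    grad_Phi = -(R[:, :, None] * diff).sum(axis=0)  # d(loss)/dPhi
    return loss, grad_Y, grad_Phi

# Simple gradient descent from random initial positions
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=100)             # stand-in class posteriors
Y, Phi = rng.normal(size=(100, 2)), rng.normal(size=(3, 2))
for _ in range(200):
    loss, gY, gPhi = pe_loss_and_grads(P, Y, Phi)
    Y -= 0.1 * gY
    Phi -= 0.1 * gPhi
```

Note how the complexity claim shows up in the sketch: every array is sized by objects times classes, with no pairwise object-object matrix anywhere.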

2020 ◽  
Vol 8 (1) ◽  
Author(s):  
Martin Keller-Ressel ◽  
Stephanie Nargang

Abstract We introduce hydra (hyperbolic distance recovery and approximation), a new method for embedding network- or distance-based data into hyperbolic space. We show mathematically that hydra satisfies a certain optimality guarantee: it minimizes the ‘hyperbolic strain’ between original and embedded data points. Moreover, it is able to recover points exactly, when they are contained in a low-dimensional hyperbolic subspace of the feature space. Testing on real network data we show that the embedding quality of hydra is competitive with existing hyperbolic embedding methods, but achieved at substantially shorter computation time. An extended method, termed hydra+, typically outperforms existing methods in both computation time and embedding quality.
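
For orientation, here is a minimal NumPy sketch of the hyperboloid-model geometry that hydra embeds into: the Lorentzian inner product, the hyperbolic distance it induces, and a lift from Euclidean coordinates onto the hyperboloid. This illustrates the distances being recovered, not the hydra algorithm itself.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + x1*y1 + ... + xd*yd."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def hyperbolic_distance(x, y):
    """Distance on the hyperboloid {z : <z, z>_L = -1, z0 > 0}."""
    # The clamp protects arccosh from values just below 1 due to round-off
    return np.arccosh(np.maximum(-lorentz_inner(x, y), 1.0))

def lift(u):
    """Lift Euclidean coordinates u in R^d onto the hyperboloid in R^(d+1)."""
    x0 = np.sqrt(1.0 + (u ** 2).sum(-1, keepdims=True))
    return np.concatenate([x0, u], axis=-1)

x = lift(np.array([[0.0, 0.0], [1.0, 2.0]]))
print(hyperbolic_distance(x[0], x[1]))
```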


2020 ◽  
Vol 5 (Spring 2020) ◽  
Author(s):  
Jaymie Ruddock

Professional development in its most traditional form is a classroom setting with a lecturer and an overwhelming amount of information. It is no surprise, then, that informal professional development away from institutions and on the teacher's own terms is a growing phenomenon, driven by the increased presence of educators on social media. These communities of educators use hashtags to broadcast to each other, with general hashtags such as #edchat having the broadest audience. Many math educators, however, use the hashtags #ITeachMath and #MTBoS, communities I was interested in learning more about. I built a Python script that used Tweepy to connect to Twitter's API, with try/except blocks to catch the HTTP errors that Twitter occasionally passes through the API. Once the script was complete, a sample of such tweets was collected and then processed in Python to determine polarity, objectivity, and word frequency, first as a group and then by choice of hashtag. Additional analysis included Latent Dirichlet Allocation and hierarchical clustering, and conversations between individuals were analyzed for topic and complexity to understand the extent of interactions. This information will be used to determine the extent of professional development (PD) that teachers engage in on Twitter simply by actively participating in such communities, and to identify ways to improve informal PD. It was determined that Twitter offers a significant number of professional development opportunities, but they are muddled by a great deal of other content. Further research into the types and frequency of collaborations, on top of the existing latent topics, could provide insight into the applications of informal professional development.
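
The collection step lends itself to a short sketch. The following assumes Tweepy 4.x, TextBlob for the polarity/objectivity scores, and placeholder credentials; it illustrates the approach rather than reproducing the study's script.

```python
import tweepy
from textblob import TextBlob

auth = tweepy.OAuth1UserHandler(
    "API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET"  # placeholders
)
api = tweepy.API(auth, wait_on_rate_limit=True)

tweets = []
try:
    for status in tweepy.Cursor(api.search_tweets,
                                q="#ITeachMath OR #MTBoS",
                                tweet_mode="extended").items(500):
        tweets.append(status.full_text)
except tweepy.TweepyException as err:
    # Twitter occasionally passes HTTP errors through the API; catching them
    # keeps a long-running collection from crashing mid-sample.
    print(f"Twitter API error after {len(tweets)} tweets: {err}")

# Polarity and subjectivity per tweet (objectivity is 1 - subjectivity)
polarity = [TextBlob(t).sentiment.polarity for t in tweets]
subjectivity = [TextBlob(t).sentiment.subjectivity for t in tweets]
```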


2021 ◽  
pp. 016555152110077
Author(s):  
Sulong Zhou ◽  
Pengyu Kan ◽  
Qunying Huang ◽  
Janet Silbernagel

Natural disasters cause significant damage, casualties and economic losses. Twitter has been used to support prompt disaster response and management because people tend to communicate and spread information on public social media platforms during disaster events. Natural language processing (NLP) is the most effective way to mine tweet text for real-time situational awareness (SA) information. Among advanced NLP models, supervised approaches can classify tweets into different categories to gain insight and leverage useful SA information from social media data. However, high-performing supervised models require domain knowledge to specify categories and involve costly labelling tasks. This research proposes a guided latent Dirichlet allocation (LDA) workflow to investigate temporal latent topics in tweets during a recent disaster event, the 2020 Hurricane Laura. By integrating prior knowledge, a coherence model, LDA topic visualisation and validation against official reports, our guided approach reveals that most tweets contain several latent topics over the 10-day period of Hurricane Laura. This result indicates that state-of-the-art supervised models have not fully utilised tweet information because they assign each tweet only a single label. In contrast, our model can not only identify emerging topics during different disaster events but also provide multilabel references for the classification schema. In addition, our results can help responders, stakeholders and the general public quickly identify and extract SA information so that they can adopt timely response strategies and wisely allocate resources during hurricane events.
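
One common way to realize such guidance is to bias the topic-word prior toward seed words. The following is a minimal gensim sketch under that assumption; the seed words and toy tweets are hypothetical, not from the study.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [  # stand-ins for tokenized Hurricane Laura tweets
    ["evacuation", "shelter", "route", "family"],
    ["power", "outage", "damage", "crews"],
    ["shelter", "open", "evacuation", "tonight"],
    ["power", "outage", "restore", "working"],
]
seeds = {0: ["evacuation", "shelter"], 1: ["power", "outage"]}  # hypothetical

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

num_topics = 2
eta = np.full((num_topics, len(dictionary)), 0.01)
for topic, words in seeds.items():
    for w in words:
        eta[topic, dictionary.token2id[w]] = 1.0  # nudge the prior toward seeds

lda = LdaModel(corpus, id2word=dictionary, num_topics=num_topics,
               eta=eta, passes=20, random_state=0)

# A coherence model scores topic quality, as in the workflow above
cm = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                    coherence="u_mass")
print(cm.get_coherence())
```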


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 740
Author(s):  
Hoshin V. Gupta ◽  
Mohammad Reza Ehsani ◽  
Tirthankar Roy ◽  
Maria A. Sans-Fuentes ◽  
Uwe Ehret ◽  
...  

We develop a simple Quantile Spacing (QS) method for accurate probabilistic estimation of one-dimensional entropy from equiprobable random samples, and compare it with the popular Bin-Counting (BC) and Kernel Density (KD) methods. In contrast to BC, which uses equal-width bins with varying probability mass, the QS method uses estimates of the quantiles that divide the support of the data generating probability density function (pdf) into equal-probability-mass intervals. And, whereas BC and KD each require optimal tuning of a hyper-parameter whose value varies with sample size and shape of the pdf, QS only requires specification of the number of quantiles to be used. Results indicate, for the class of distributions tested, that the optimal number of quantiles is a fixed fraction of the sample size (empirically determined to be ~0.25–0.35), and that this value is relatively insensitive to distributional form or sample size. This provides a clear advantage over BC and KD since hyper-parameter tuning is not required. Further, unlike KD, there is no need to select an appropriate kernel-type, and so QS is applicable to pdfs of arbitrary shape, including those with discontinuous slope and/or magnitude. Bootstrapping is used to approximate the sampling variability distribution of the resulting entropy estimate, and is shown to accurately reflect the true uncertainty. For the four distributional forms studied (Gaussian, Log-Normal, Exponential and Bimodal Gaussian Mixture), expected estimation bias is less than 1% and uncertainty is low even for samples of as few as 100 data points; in contrast, for KD the small sample bias can be as large as -10% and for BC as large as -50%. We speculate that estimating quantile locations, rather than bin-probabilities, results in more efficient use of the information in the data to approximate the underlying shape of an unknown data generating pdf.
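
The core QS idea reduces to a few lines: split the support into equal-probability-mass intervals via sample quantiles, treat the density as (1/N)/width inside each, and average -log p. The sketch below follows that logic; the quantile interpolation choice is an assumption, and the paper's bootstrap step is omitted.

```python
import numpy as np

def qs_entropy(sample, frac=0.3):
    """Entropy estimate (nats); the number of quantiles is a fixed fraction
    of the sample size, per the ~0.25-0.35 range reported above."""
    n_q = max(2, int(frac * len(sample)))
    edges = np.quantile(sample, np.linspace(0.0, 1.0, n_q + 1))
    widths = np.diff(edges)
    # Each interval holds mass 1/n_q, so H ~= sum (1/n_q) * log(n_q * width)
    return np.mean(np.log(n_q * widths))

# Standard Gaussian reference: H = 0.5 * log(2 * pi * e) ~= 1.4189 nats
x = np.random.default_rng(0).normal(size=1000)
print(qs_entropy(x))
```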


Nanomaterials ◽  
2020 ◽  
Vol 10 (4) ◽  
pp. 732 ◽  
Author(s):  
Takahiro Shimada ◽  
Koichiro Minaguro ◽  
Tao Xu ◽  
Jie Wang ◽  
Takayuki Kitamura

Beyond the ferroelectric critical thickness of several nanometers that exists in conventional ferroelectric perovskite oxides, ferroelectricity at the ultimate thinness was recently discovered in SnTe monolayers. This discovery suggests that SnTe may sustain ferroelectricity under further low-dimensional miniaturization. Here, we investigate the ferroelectric critical size of low-dimensional SnTe nanostructures such as nanoribbons (1D) and nanoflakes (0D) using first-principles density functional theory calculations. We demonstrate that the smallest (one-unit-cell-wide) SnTe nanoribbon can sustain ferroelectricity and that there is no ferroelectric critical size in SnTe nanoribbons. On the other hand, SnTe nanoflakes form a vortex of polarization and lose their toroidal ferroelectricity below a surface area of 4 × 4 unit cells (about 25 Å on one side). We also reveal the atomic and electronic mechanisms behind the presence or absence of a critical size in low-dimensional SnTe nanostructures. Our result provides insight into the intrinsic ferroelectric critical size of low-dimensional chalcogenide layered materials.


2021 ◽  
Author(s):  
Faizah Faizah ◽  
Bor-Shen Lin

BACKGROUND The World Health Organization (WHO) declared COVID-19 a public health emergency of international concern on January 30, 2020. However, the pandemic is not yet over; in the first quarter of 2021, some countries faced a third wave. During this difficult time, the development of COVID-19 vaccines accelerated rapidly. Understanding public perception of the COVID-19 vaccines from data collected on social media can widen our perspective on the state of the global pandemic.
OBJECTIVE This study explores and analyzes the latent topics in COVID-19 vaccine tweets posted by individuals from various countries, using two-stage topic modeling.
METHODS A two-stage analysis in topic modeling was proposed to investigate people's reactions in five countries. The first stage is latent Dirichlet allocation (LDA), which produces latent topics with corresponding term distributions that help investigators understand the main issues or opinions. The second stage performs agglomerative clustering on the latent topics based on Hellinger distance, merging close topics hierarchically into topic clusters that can be visualized in either tree or graph views.
RESULTS In general, the topics discussed around the COVID-19 vaccine are similar across the five countries. Topic themes such as "first vaccine" and "vaccine effect" dominate the public discussion. Notably, some countries show distinctive themes, such as "politician opinion" and "stay home" in Canada, "emergency" in India, and "blood clots" in the United Kingdom. The analysis also shows which COVID-19 vaccine is most popular and gaining public interest.
CONCLUSIONS With LDA and hierarchical clustering, two-stage topic modeling is powerful for visualizing latent topics and understanding public perception of the COVID-19 vaccine.
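
The second stage reduces to standard tools. The following is a minimal sketch of agglomerative clustering of topic-word distributions under Hellinger distance; the Dirichlet draws stand in for a fitted LDA model's (topics x vocabulary) matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

topic_word = np.random.default_rng(0).dirichlet(np.ones(50), size=10)

# Condensed matrix of pairwise Hellinger distances between topics
D = pdist(topic_word, metric=hellinger)

# Merge the closest topics first; scipy.cluster.hierarchy.dendrogram(Z)
# draws the resulting tree view
Z = linkage(D, method="average")
print(Z.shape)  # (n_topics - 1, 4) merge history
```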


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted considerable attention from data mining researchers, and its reputation has grown steadily across practical domains such as product marketing, fraud detection, medical diagnosis, fault detection and many other fields. High-dimensional data poses exceptional challenges for outlier detection because of the curse of dimensionality and the growing resemblance of distant and adjoining points. Traditional algorithms and techniques perform outlier detection on the full feature space. Such customary methodologies concentrate largely on low-dimensional data and hence prove ineffective at discovering anomalies in data sets with a high number of dimensions. Digging out anomalies in high-dimensional data becomes very difficult and tiresome when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic property of such data: the distance between observations approaches zero as the number of dimensions extends towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. It is a state-of-the-art technique, opening a new breadth of research towards resolving the inherent problems of high-dimensional data, where outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
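
For context, here is a minimal sketch of the density-based baseline the comparison refers to (Local Outlier Factor from scikit-learn); the proposed deviation-based refinement itself is not reproduced, and the synthetic data is illustrative.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 30)),   # dense cluster in 30 dimensions
    rng.normal(6.0, 0.5, size=(100, 30)),   # second cluster, different density
    rng.uniform(-4.0, 10.0, size=(5, 30)),  # a few scattered outliers
])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 flags points judged to be outliers
scores = -lof.negative_outlier_factor_  # larger score = more anomalous
print((labels == -1).sum(), "points flagged")
```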


2021 ◽  
Author(s):  
Samuel Duraivel ◽  
Lavanya R

Abstract This research paper explores the underlying factors that contribute toward vaccine hesitancy, resistance, and refusal. Using Latent Dirichlet Allocation (LDA), an unsupervised generative-probabilistic model, we generated latent topics from user-generated Reddit corpora on reasons for vaccine hesitancy. Although we hoped to explore the grounds for vaccine hesitancy across the globe, our findings suggest that the corpus used for analysis had been generated by users living predominantly in the United States. Observation of the topics generated by the LDA model led to the discovery of the following latent factors: (i) fear of risks and side effects, (ii) lack of trust in policymakers, (iii) religious beliefs, (iv) mass-surveillance theories, (v) perception of vaccination as a precursor to totalitarianism, (vi) racial background pertaining to retrospective events of racial injustice, such as selective sterilization, (vii) a depopulation agenda fueled by theories affiliated with global warming and Extinction Rebellion, and (viii) perception of vaccination as a campaign to quell immigrant population growth, fueled by reports of coerced sterilization of immigrants in ICE detention.
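
A minimal sketch of this kind of unsupervised LDA run with scikit-learn follows; the three strings are hypothetical stand-ins for the Reddit corpora, which are not reproduced here.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # hypothetical examples
    "worried about long term risks and side effects",
    "no trust in the policymakers pushing the rollout",
    "mass surveillance theories and microchip claims",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[-3:][::-1]  # three highest-weight terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```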


Author(s):  
Diana Mateus ◽  
Christian Wachinger ◽  
Selen Atasoy ◽  
Loren Schwarz ◽  
Nassir Navab

Computer aided diagnosis is often confronted with processing and analyzing high dimensional data. One alternative to deal with such data is dimensionality reduction. This chapter focuses on manifold learning methods to create low dimensional data representations adapted to a given application. From pairwise non-linear relations between neighboring data-points, manifold learning algorithms first approximate the low dimensional manifold where data lives with a graph; then, they find a non-linear map to embed this graph into a low dimensional space. Since the explicit pairwise relations and the neighborhood system can be designed according to the application, manifold learning methods are very flexible and allow easy incorporation of domain knowledge. The authors describe different assumptions and design elements that are crucial to building successful low dimensional data representations with manifold learning for a variety of applications. In particular, they discuss examples for visualization, clustering, classification, registration, and human-motion modeling.
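
One standard instance of the graph-then-embed recipe described here is Isomap. Below is a minimal scikit-learn sketch on a synthetic swiss roll: build a k-nearest-neighbour graph over the data, then find a low-dimensional embedding that preserves graph distances.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# n_neighbors encodes the neighborhood system, one of the design elements
# the authors highlight; n_components sets the embedding dimension
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)
```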

