The Number of Topics Optimization: Clustering Approach

Although topic models have been used to build clusters of documents for more than ten years, choosing the optimal number of topics remains an open problem. The authors analyzed many fundamental studies undertaken on the subject in recent years. The main problem is the lack of a stable metric of the quality of the topics obtained when a topic model is built. The authors analyzed the internal metrics of the topic model (coherence, contrast, and purity) as a way to determine the optimal number of topics and concluded that they are not applicable to this problem. They then analyzed an approach to choosing the optimal number of topics based on the quality of the clusters, considering the behavior of three cluster validation metrics: the Davies-Bouldin index, the silhouette coefficient, and the Calinski-Harabasz index. The new method for determining the optimal number of topics proposed in this paper is based on the following principles: (1) setting up a topic model with additive regularization (ARTM) to separate noise topics; (2) using dense vector representations (GloVe, FastText, Word2Vec); (3) using a cosine measure for the distance in the cluster metrics, which works better than Euclidean distance on high-dimensional vectors. The methodology was tested on a collection of scientific articles from the OnePetro library, selected by specific themes. The experiment showed that the proposed method makes it possible to assess the optimal number of topics for a topic model built on a small collection of English documents.
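
The idea of scoring candidate numbers of topics with a cluster validation metric computed on cosine distance can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy three-dimensional vectors, and the candidate labelings below are all hypothetical, and real document vectors would come from an embedding model such as those named in the abstract.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v), for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_silhouette(vectors, labels, dist=cosine_distance):
    """Mean silhouette coefficient of a labeling under the given distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_number_of_topics(vectors, labelings):
    """Pick the candidate number of topics whose labeling scores highest."""
    return max(labelings, key=lambda k: mean_silhouette(vectors, labelings[k]))


# Hypothetical document vectors and two candidate document-to-topic labelings.
toy_vectors = [
    (1.0, 0.1, 0.0), (0.9, 0.0, 0.1),
    (0.0, 1.0, 0.1), (0.1, 0.9, 0.0),
    (0.0, 0.1, 1.0), (0.1, 0.0, 0.9),
]
toy_labelings = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

if __name__ == "__main__":
    for k, labels in toy_labelings.items():
        print(k, round(mean_silhouette(toy_vectors, labels), 3))
    print("selected:", best_number_of_topics(toy_vectors, toy_labelings))
```

In this sketch the labelings are supplied by hand; in practice each one would come from a topic model fitted with the corresponding number of topics.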

Convergence of daily mean coordinates of precise positioning methods

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/929/1/012014 ◽

2021 ◽

Vol 929 (1) ◽

pp. 012014

Author(s):

D V Kenigsberg ◽

Yu M Salamatina ◽

O A Prokhorov ◽

S I Kuzikov

Keyword(s):

Euclidean Distance ◽

The Other ◽

Local Reference Frame ◽

Relative Precision ◽

Distance Method ◽

Local Reference ◽

Euclidean Distances ◽

Coordinates Transformation ◽

Better Than

As part of research into modern movements of the Earth’s crust, 7 high-precision methods for calculating GNSS positions were analyzed for the convergence of their daily mean coordinates. Based on Euclidean distances, regular and maximal discrepancies between the coordinates of different methods are given. According to the coordinates in the ITRF, 5 methods stand out with regular coordinate discrepancies <1 mm and individual maximum discrepancies up to 30 mm. The other two methods have regular discrepancies in coordinates up to 2 cm, and the maximum differences reach 1 m. For a group of stations, transforming global coordinates into a local reference frame stabilizes the coordinates and increases their relative precision in the time series. As a result of this procedure, the level of maximum coordinate discrepancies between the methods decreased to 46%. Moreover, one of the methods of calculating coordinates improved its convergence with the other methods by 80%. Based on the Euclidean distance method, the quality of the raw data for each station was evaluated. There is a group of 8 stations for which the convergence of coordinates across the different methods is at approximately the same level, and 2-3 times better than for the other 2 stations.
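
A minimal sketch of comparing two coordinate solutions by Euclidean distance is given below. The abstract does not define "regular" discrepancy, so the median is used here purely as an assumption; the function name and the sample coordinates (in metres) are hypothetical.

```python
import math
from statistics import median


def coordinate_discrepancies(series_a, series_b):
    """Return (typical, maximum) Euclidean distance between two daily series.

    Each series is a list of (x, y, z) daily mean coordinates for one station,
    aligned by day. "Typical" is taken as the median, which is an assumption.
    """
    distances = [math.dist(p, q) for p, q in zip(series_a, series_b)]
    return median(distances), max(distances)


# Hypothetical daily mean coordinates from two processing methods, in metres.
method_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
method_b = [(0.0, 0.0, 0.001), (1.0, 1.003, 1.004), (2.0, 2.0, 2.002)]

if __name__ == "__main__":
    typical, worst = coordinate_discrepancies(method_a, method_b)
    print(f"typical: {typical * 1000:.1f} mm, maximum: {worst * 1000:.1f} mm")
```

Repeating the comparison for every pair of methods, and again after a transformation to a local reference frame, would give the kind of before-and-after figures the abstract reports.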

Application of the K-Means Method with the Elbow Method for Customer Segmentation Using the RFM (Recency, Frequency, and Monetary) Model

Repositor ◽

10.22219/repositor.v2i7.973 ◽

2020 ◽

Vol 2 (7) ◽

pp. 945

Author(s):

Adnan Burhan Hidayat Kiat ◽

Yufiz Azhar ◽

Vinna Rahmayanti

Keyword(s):

Euclidean Distance ◽

Motor Vehicle ◽

Cluster Formation ◽

Optimal Number ◽

Customer Segmentation ◽

Manhattan Distance ◽

Automotive Company ◽

Rfm Model ◽

Coefficient Method

Customer segmentation helps a company make decisions going forward. In this study, the data come from an automotive company, PT Hasjrat Abadi Ambon, and consist of transaction data and motor vehicle customer data. The RFM model groups customers by the values of the Recency, Frequency, and Monetary variables, and its results give each customer a new status on a scale from best to worst. Customers that have a status are then grouped into several clusters using the K-Means method. The Elbow method is applied to determine the optimal number of clusters. Two distance measures are used in cluster formation, Euclidean distance and Manhattan distance, and the quality of the resulting clusters is compared using the Silhouette Coefficient method. The results of this study are data divided into 5 groups, with five tests conducted to determine the superior centroids. The superior clusters are visualized to help the company make decisions. Based on the Silhouette Coefficient, the superior measure is Manhattan distance, with a value of s(i) = 0.152695.
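
The Elbow method mentioned above can be sketched as follows: run K-Means for several values of k and look at how the within-cluster sum of squares (WCSS) falls as k grows. This is a minimal illustration, not the study's code. It uses Euclidean distance only, and the RFM scores, function names, and seed are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means with Euclidean distance and seeded initialization."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(
        math.dist(p, c) ** 2
        for c, members in zip(centroids, clusters)
        for p in members
    )


def elbow_curve(points, k_values):
    """Map each candidate k to the WCSS of a K-Means run with that k."""
    return {k: wcss(*kmeans(points, k)) for k in k_values}


# Hypothetical customers scored 1-5 on (Recency, Frequency, Monetary).
rfm_scores = [
    (5, 5, 5), (5, 4, 5), (4, 5, 4),
    (1, 1, 1), (1, 2, 1), (2, 1, 1),
    (3, 3, 3), (3, 3, 2), (3, 2, 3),
]

if __name__ == "__main__":
    for k, value in elbow_curve(rfm_scores, range(1, 6)).items():
        print(k, round(value, 2))
```

The "elbow" is the value of k after which further increases reduce WCSS only slightly; it is usually read off a plot of this curve rather than computed automatically.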

Methods of signal processing in a multiradar system of the same type of two-coordinated surveillance radars

Системи обробки інформації ◽

10.30748/soi.2020.162.07 ◽

2020 ◽

pp. 65-72

Author(s):

Г.В. Худов ◽

Сальман Рашід Оваід ◽

В.М. Ліщенко ◽

В.О. Тютюнник

Keyword(s):

Signal Processing ◽

Optimal Number ◽

Radar System ◽

Coherent Signals ◽

Signal Noise ◽

Noise Threshold ◽

The Subject ◽

Mechanical Rotation ◽

Expected Signal

The subject of the paper is the development of signal processing methods for a multi-radar system made up of two-coordinate surveillance radars of the same type with mechanical rotation. The aim of the paper is to improve the quality of detection of air objects by combining two-coordinate radars of the same type into a multi-radar system. It is proposed to combine the existing surveillance radar stations into a spatially distributed coherent multi-radar system. Optimal detectors of coherent and incoherent signals are synthesized. The detection characteristics for air objects in a multi-radar system with joint signal reception have been evaluated. The results obtained: adding a second radar, regardless of the degree of signal coherence, gives the greatest gain in signal-to-noise ratio, and the optimal number of radars in the multi-radar system is no more than four. The expected gain in the signal-to-noise threshold in a system of four radars can be up to eighteen decibels for a system with coherent signals and up to eleven decibels for a system with incoherent signals. Using more than four radars is impractical.

Treatise Touching the Pretended Divorce of Henry the Eighth

Camden New Series ◽

10.1017/s204217020000632x ◽

1878 ◽

Vol 21 ◽

pp. 1-302

Keyword(s):

Good Fortune ◽

Small Collection ◽

The Subject

As nature has given you a right to the inheritance of my estate, a competency sufficient to support the quality of a gentleman, so I design, as a mark of true paternal affection, to add to it my small collection of books; that, by adorning your mind with useful study, you may deserve that character more from yourself than ancestors. Amongst the rest I leave you this Manuscript. A particular good fortune threw it into my hands, which, had it nothing but the subject to recommend it, would be no inconsiderable value to a Catholic, because it lets him see 'twas Interest and not Religion began the schism, and that 'tis truly Conscience and not Obstinacy makes him, by still adhering to the ancient Church, stand obnoxious to so many laws. But, that your esteem may equal its value, I have thought proper to acquaint you first what I have found concerning the Author, next what reason there is to believe this Copy authentick, and lastly to whose kindness I am obliged for it, and upon what conditions I had leave to transcribe it.

Qualitative and quantitative evaluation of text printed with flexography on woven labels

Textile Research Journal ◽

10.1177/0040517520981740 ◽

2020 ◽

pp. 004051752098174

Author(s):

Urška Stanković Elesini ◽

Sara Pančur ◽

Klementina Možina

Keyword(s):

Quantitative Evaluation ◽

Positive Influence ◽

Cotton Fibers ◽

Mixed Composition ◽

Cover Factor ◽

Qualitative And Quantitative ◽

Type Size ◽

The Subject ◽

Better Than

Even though textile labels are not often the subject of research, their quality must not be neglected. Printed typographic elements (i.e. letters and texts) must be visible regardless of textile ribbons and typeface or type size to be printed. Thus, the aim of the research was to qualitatively and quantitatively analyze and evaluate the text printed with flexography in two different typefaces (Helvetica and Verdana) in three different type sizes (4, 6 and 8 point) on five textile ribbons made of polyester and polyester/cotton mixture in two different weaves (plain and satin). The results of our research showed that the quality of printed letters is influenced by the properties of textile ribbons as well as by the chosen typographic features. When textile ribbons were composed of polyester filaments, the quality of prints was better than in the case of the mixed composition with cotton fibers. The coating and previously dyed textile ribbons had a positive influence on the quality of printed letters. The typeface Verdana gave more distinct and contrasted printed letters than Helvetica. The quality of printed letters (measured by the cover factor) decreased with the reduced type size; letters (and text) in a smaller type size (4 point) were hence, depending on the properties of textile ribbons, less visible.

Improving the Quality of the Learning Process and Story-Writing Ability with the ASSURE Model

Premiere Educandum Jurnal Pendidikan Dasar dan Pembelajaran ◽

10.25273/pe.v6i01.300 ◽

2016 ◽

Vol 6 (01) ◽

Cited By ~ 1

Author(s):

Winda Ayu Cahya Fitriani

Keyword(s):

Elementary School ◽

Learning Process ◽

Descriptive Analysis ◽

Fifth Grade ◽

Analysis Techniques ◽

Story Writing ◽

Data Source ◽

The Subject ◽

Better Than

This research aims to improve the quality of learning and story-writing ability at Muhammadiyah 11 Surakarta Elementary School by applying the ASSURE model. The method is classroom research, carried out over five months. The subjects of this research are the fifth-grade students of Muhammadiyah 11 Surakarta Elementary School, 31 students in total. The teacher and the students are the sources of data. The techniques used for collecting data are observation, interviews, questionnaires, and tests. Data source triangulation is used to ensure validity. The data are analyzed using comparative descriptive analysis techniques. The results show that the learning process for story writing reached 60% in cycle I, better than the pre-research stage, which reached less than 35%. In cycle II, the learning process increased to 80%. This research concludes that implementing the ASSURE learning model is capable of improving the quality of the learning process and the students' story-writing ability in the fifth grade of Muhammadiyah 11 Surakarta Elementary School.

Visualization and performance measure to determine number of topics in twitter data clustering using hybrid topic modeling

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202707 ◽

2021 ◽

pp. 1-15

Author(s):

R.M. Noorullah ◽

Moulana Mohammed

Keyword(s):

Topic Model ◽

Parametric Method ◽

Topic Models ◽

Performance Measure ◽

Optimal Number ◽

Visual Access ◽

Cluster Validity Indices ◽

Validity Indices ◽

And Performance

Topic models have been widely used to build clusters of documents for more than a decade, yet choosing the optimal number of topics remains a problem. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of topic models. From previous works, the authors found that most of the models used to determine the number of topics are non-parametric, with topic quality assessed by perplexity and coherence measures, and concluded that these are not applicable to this problem. In this paper, we use a parametric method, an extension of the traditional topic model with visual access tendency for visualizing the number of topics (clusters), to complement clustering and to choose the optimal number of topics based on the results of cluster validity indices. The developed hybrid topic models are demonstrated on different Twitter datasets on various topics, both in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in measuring the quality of clusters with validity indices.
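
The abstract does not say which cluster validity indices were used. As one common example, a minimal sketch of the Davies-Bouldin index is shown below; lower values indicate more compact, better separated clusters. The function names and the toy two-dimensional clusters are hypothetical, and the sketch assumes that no two clusters share the same centroid.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(members) for members in clusters]
    # Scatter: mean distance from each member to its own cluster centroid.
    scatter = [
        sum(math.dist(p, center) for p in members) / len(members)
        for members, center in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two hypothetical partitions of the same four points.
tight_partition = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose_partition = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]

if __name__ == "__main__":
    print("tight:", davies_bouldin(tight_partition))
    print("loose:", davies_bouldin(loose_partition))
```

Computing such an index for the clusterings produced by each candidate number of topics, and choosing the number with the best value, is the general pattern behind selection by validity indices.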

Effect of Number of Home Exercises on Compliance and Performance in Adults Over 65 Years of Age

Physical Therapy ◽

10.1093/ptj/79.3.270 ◽

1999 ◽

Vol 79 (3) ◽

pp. 270-277 ◽

Cited By ~ 45

Author(s):

Kristin D Henry ◽

Cherie Rosemond ◽

Lynn B Eckert

Keyword(s):

Assessment Tool ◽

Optimal Number ◽

The Self ◽

Self Report ◽

Initial Session ◽

Correct Alignment ◽

Quality Of Movement ◽

And Performance ◽

Better Than

Background and Purpose. There is limited research on the effects of the number of exercises a person is told to perform on compliance and performance, as defined by cueing requirements, correct alignment, and quality of movement. Some studies of medication suggest that compliance decreases as the number of medications increases. The purpose of this study was to determine whether older adults comply and perform better (i.e., requiring less cueing, exhibiting correct alignment, and exhibiting controlled, coordinated, and continuous movements) when they are asked to do 2, 5, or 8 exercises. Subjects. Subjects were 11 women and 4 men, aged 67 to 82 years (X̄=72.8), who were living independently in their communities. Methods. Subjects were randomly prescribed 2, 5, or 8 general strengthening home exercises. They were instructed on their exercises at an initial session and asked to record the number of repetitions performed each day in a self-report exercise log. At a return session 7 to 10 days later, subjects were scored on their performance of the prescribed exercises using a newly designed assessment tool. Results. The group that was prescribed 2 exercises performed better, as defined by their performance tool score, than the group that was prescribed 8 exercises. The group that was prescribed 5 exercises was not different from the groups that performed 2 or 8 exercises. No differences were found among groups regarding the self-report measurement of compliance. There was a moderate correlation between performance scores and the self-report percentage rates. Conclusion and Discussion. Subjects who were prescribed 2 exercises performed better than subjects who were prescribed 8 exercises. The question of an optimal number of exercises to prescribe to elderly people warrants further study.

A Mathematics Work Room for the Senior High School

Mathematics Teacher ◽

10.5951/mt.38.3.0126 ◽

1945 ◽

Vol 38 (3) ◽

pp. 126-129

Author(s):

Jessie Roselle Smith

Keyword(s):

High School ◽

Great Increase ◽

Senior High School ◽

Abstract Theory ◽

Mathematics Courses ◽

The Subject ◽

Better Than

In the field of mathematics, the advent of the war has not yet created many major changes. The most noticeable one is that more pupils are now enrolled in the mathematics courses than formerly. That the pupils are learning the subject better than in pre-war days is also quite apparent. I am not quite so certain that we are teaching it better. The war is the cause of the great increase in the enrollments, and the pupil's motive for studying mathematics is the most powerful in existence, that of retaining his life, and the life of freedom. If we do not recognize the strength of this force acting in our favor there is danger of our becoming apathetic. It behooves us to add vitality and meaning to the abstract theory that we teach. Unless we improve the quality of our work, and arouse the interest of the pupils in this field, we are going to experience a terrific jolt when the motive furnished by the war is gone.

Handwriting Analysis and Personality Assessment

European Psychologist ◽

10.1027//1016-9040.5.1.44 ◽

2000 ◽

Vol 5 (1) ◽

pp. 44-51 ◽

Cited By ~ 5

Author(s):

Peter Greasley

Keyword(s):

Personality Traits ◽

Personality Assessment ◽

Personnel Selection ◽

Empirical Studies ◽

Handwriting Analysis ◽

Recruitment Process ◽

Academic Literature ◽

Personnel Recruitment ◽

The Subject ◽

Better Than

It has been estimated that graphology is used by over 80% of European companies as part of their personnel recruitment process. And yet, after over three decades of research into the validity of graphology as a means of assessing personality, we are left with a legacy of equivocal results. For every experiment that has provided evidence to show that graphologists are able to identify personality traits from features of handwriting, there are just as many to show that, under rigorously controlled conditions, graphologists perform no better than chance expectations. In light of this confusion, this paper takes a different approach to the subject by focusing on the rationale and modus operandi of graphology. When we take a closer look at the academic literature, we note that there is no discussion of the actual rules by which graphologists make their assessments of personality from handwriting samples. Examination of these rules reveals a practice founded upon analogy, symbolism, and metaphor in the absence of empirical studies that have established the associations between particular features of handwriting and personality traits proposed by graphologists. These rules guide both popular graphology and that practiced by professional graphologists in personnel selection.
