APPLYING STATISTICAL METHODS FOR ANALYZING TEXTS OF THE PRESIDENTIAL ELECTION PROGRAMMES

Author(s):  
M. M. Osypchuk ◽  
I. M. Hural ◽  
L. R. Smolovyk

Modern statistics offers methods for formalizing (measuring) objects of different kinds, including, in particular, texts written in so-called natural language. This article presents a statistical analysis of the election programme texts of the candidates for the Presidency of Ukraine in the 2019 election. Multidimensional scaling was used to construct a data set of two numerical characteristics that describe the distinctive features of the reviewed programme texts. Correlation analysis then linked the texts of the candidates’ election programmes to the official results of the first round of the election and to the results of the nationwide exit poll. Ward’s method of cluster analysis produced four groups of candidates for the Presidency of Ukraine. The distinctive features of each group’s programme texts were also identified, and keyword clouds were created so that the most frequently used words and their distribution by popularity can be grasped quickly. Data preparation and all statistical calculations were performed in the R statistical computing environment.
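
The clustering step can be illustrated with a minimal pure-Python sketch of Ward's agglomerative criterion: at every step, merge the two clusters whose union adds the least within-cluster sum of squares. It assumes each programme text has already been reduced to a small coordinate vector, for instance the two MDS characteristics described above. The function names and the toy coordinates are hypothetical. The authors' own computations were done in R.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Returns a list of clusters, each a sorted list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Choose the pair whose merge adds the least within-cluster variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [points[p] for p in clusters[pair[0]]],
                [points[p] for p in clusters[pair[1]]],
            ),
        )
        # combinations() yields i < j, so deleting j leaves index i valid.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical 2-D coordinates for six texts (not data from the article).
    coords = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (10.2, 0.1)]
    print(ward_clusters(coords, 3))  # [[0, 1], [2, 3], [4, 5]]
```

This naive loop is only suitable for small inputs such as a few dozen candidates. For real work a library routine such as R's `hclust(d, method = "ward.D2")` or SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` is the more likely choice.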

Author(s):  
Svitlana Voloschynska ◽  
Valentyna Golub

The background levels of heavy metals in conditionally clean territories were determined, and the level of heavy-metal contamination of the urboecosystem of Kovel was ascertained. Significant changes in the agrochemical indexes of municipal soils were demonstrated. Correlation and cluster analyses were carried out between the agrochemical indexes and the heavy metals in the background areas and in the Kovel urboecosystem. Key words: heavy metals, background value, urboecosystem, agrochemical indexes, correlation analysis, cluster analysis.
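
In its most common form, a correlation analysis between an agrochemical index and a heavy-metal concentration reduces to the Pearson coefficient. The abstract does not say which coefficient was used, so Pearson's r is an assumption here. The sample values in the usage lines are invented for illustration and are not the study's measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical paired measurements, e.g. an index value and a metal concentration.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
    print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0
```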


1998 ◽  
Author(s):  
Márcia M. Duarte dos Santos ◽  
Miguel C. Sanchez ◽  
Sueli Ap. Mingoti

In this paper, some methods commonly used in multivariate cluster analysis are discussed and compared using a specific data set. The main objective is to show that the graphical method is efficient and gives results similar to those of the Ward and K-Means methods, which are based on mathematical and statistical theory. The similarity found between the graphical and the statistical methods suggests that, although it is more subjective, the graphical method is a valid technique that could be used to determine the clusters of a sample or a population.
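
For comparison with the graphical approach, the K-Means method mentioned in the abstract can be sketched as Lloyd's algorithm in plain Python. The algorithm alternates nearest-centroid assignment with moving each centroid to the mean of its members. Passing the starting centroids explicitly is a simplification adopted here for reproducibility. The abstract does not describe the paper's data set or its initialisation.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: returns (labels, centroids).

    `initial_centroids` fixes k and the starting positions, so results are reproducible.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Toy two-group data, not the paper's data set.
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))  # ([0, 0, 1, 1], [(0.0, 0.5), (10.0, 10.5)])
```

Unlike Ward's hierarchical method, K-Means needs the number of clusters in advance, which is supplied here through the length of `initial_centroids`.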


2008 ◽  
Vol 31 (1) ◽  
pp. 83-84 ◽  
Author(s):  
Michael Lavine

Cluster analysis, factor analysis, and multidimensional scaling are not good guides to the number of groups in a data set. In fact, the number of groups may not be a well-defined concept.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are few studies on Natural Language Processing applications for many indigenous East African languages. To help close this knowledge gap, this paper evaluates the application of well-established machine translation methods to Lumasaaba, a heavily under-resourced indigenous East African language. Specifically, we review the most common machine translation methods, both rule-based and data-driven, in the context of Lumasaaba. We then apply a state-of-the-art data-driven machine translation method to learn models for automatic translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation architecture yields consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations are comprehensible to a reasonable extent and usually relate to the source-language input.
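
BLEU, the metric used in this evaluation, combines clipped n-gram precisions with a brevity penalty. The sketch below computes an unsmoothed, single-sentence, single-reference version for illustration only. Reported scores are normally corpus-level and produced by an established toolkit, and the abstract does not state which toolkit this study used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one tokenized candidate against one tokenized reference.

    Uses uniform weights 1/max_n, clipped n-gram precision and the brevity penalty.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(matched / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(sentence_bleu(ref, ref))  # 1.0
    print(sentence_bleu("the cat".split(), ref))  # 0.0 (no trigrams to match)
```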


1973 ◽  
Author(s):  
Robert R. Read ◽  
Richard S. Elster ◽  
Gerald L. Musgrave ◽  
John W. Creighton ◽  
William H. Githens

2019 ◽  
Vol 13 (1) ◽  
pp. 20-27 ◽  
Author(s):  
Srishty Jindal ◽  
Kamlesh Sharma

Background: With the tremendous increase in the use of social networking sites for sharing emotions, views, preferences, etc., a huge volume of data and text is available on the internet. This creates a need to understand the text and analyse the data to determine the exact intent behind it. Understanding text and data involves many analytical methods, several phases and multiple techniques, and using these techniques efficiently is important for an effective and relevant understanding of the text or data. The resulting analysis can be very helpful in e-commerce for audience targeting, in social media monitoring for anticipating harmful elements in society and taking proactive action against unethical and illegal activities, and in business analytics, market positioning, etc.
Method: The goal is to understand the basic steps involved in analysing text data, which can help determine the sentiment behind it. This review describes in detail the steps involved in sentiment analysis, together with recent research. Patents related to sentiment analysis and classification are also reviewed to shed light on work done in the field.
Results: Sentiment analysis determines the polarity of text data or reviews. This analysis can help increase business revenue, support e-health, or determine a person's behaviour.
Conclusion: This study helps in understanding the basic steps involved in natural language understanding. At each step, multiple techniques can be applied to the data. Different classifiers achieve different accuracy depending on the data set and the classification technique used.
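
The pipeline the review describes runs from preprocessing through feature scoring to polarity classification. A deliberately tiny lexicon-based classifier can illustrate it. The lexicon, the stop-word list and the negation rule below are hypothetical placeholders, not taken from the review. As the review's conclusion notes, practical accuracy depends heavily on the data set and the classification technique.

```python
import re

# Hypothetical toy resources; real systems use curated lexicons or trained classifiers.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "i"}
NEGATIONS = {"not", "never", "no"}


def preprocess(text):
    """Lowercase, tokenize on letters and apostrophes, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def polarity(text):
    """Return 'positive', 'negative' or 'neutral' from summed lexicon scores.

    A negation word flips the sign of the next sentiment-bearing token.
    """
    score, negate = 0, False
    for token in preprocess(text):
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            score += -LEXICON[token] if negate else LEXICON[token]
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(polarity("I love this phone"))         # positive
    print(polarity("The service was not good"))  # negative
```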


2007 ◽  
Vol 56 (6) ◽  
pp. 75-83 ◽  
Author(s):  
X. Flores ◽  
J. Comas ◽  
I.R. Roda ◽  
L. Jiménez ◽  
K.V. Gernaey

The main objective of this paper is to present the application of selected multivariable statistical techniques to the analysis of plant-wide control strategies for wastewater treatment plants (WWTPs). In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulating several control strategies on the plant-wide IWA Benchmark Simulation Model No. 2 (BSM2). These techniques make it possible i) to determine natural groups, or clusters, of control strategies with similar behaviour, ii) to find and interpret hidden, complex and causal relations in the data set, and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both the analysis and the interpretation of complex multicriteria data sets, and it enables better use of information for the effective evaluation of control strategies.


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.
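
A clustering success rate like the one above can be computed once each individual has both a known breed and an inferred cluster. The abstract does not give the exact counting rule, so the rule below is an assumption. It maps each breed to the cluster holding most of its members and counts an individual as correct if it falls in that cluster. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_success_rate(breeds, clusters):
    """Fraction of individuals placed in the cluster that holds most of their breed.

    `breeds[i]` is the known population of individual i; `clusters[i]` is its inferred cluster.
    """
    if len(breeds) != len(clusters) or not breeds:
        raise ValueError("need two non-empty sequences of equal length")
    by_breed = defaultdict(Counter)
    for breed, cluster in zip(breeds, clusters):
        by_breed[breed][cluster] += 1
    # Members of each breed's modal cluster count as correctly inferred.
    correct = sum(counts.most_common(1)[0][1] for counts in by_breed.values())
    return correct / len(breeds)


if __name__ == "__main__":
    breeds = ["A", "A", "A", "B", "B", "B"]
    inferred = [1, 1, 2, 2, 2, 2]
    print(clustering_success_rate(breeds, inferred))  # 5 of 6 individuals: about 0.833
```

Under this rule, two breeds that share a modal cluster are both counted as correct, so the rule can overstate success. A stricter variant would require a one-to-one matching between breeds and clusters.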

