Statistical Analysis of Text Corpus to Determine Appropriate Syllable Length for TTS

2021 ◽ pp. 867-876
Author(s): K. V. N. Sunitha ◽ P. Sunitha Devi

The article analyzes the dynamics of the COVID-19 pandemic and the measures taken to counter its spread in the world and in Ukraine, and considers the economic consequences. The concept of trust and its impact on the economy are examined in detail, and indicators of trust in state and local authorities during the pandemic are analyzed. The views of social network users on the economic consequences of the pandemic are identified. The sample of publications was collected for April-May 2020 using 360 unique searches at the intersection of the coronavirus and government topics, comprising 6,726 posts from Ukrainian Facebook users. The most frequent words in the resulting text corpus were: coronavirus, epidemic, quarantine, mask, government, state, president. Semantic analysis of the corpus, carried out with the Word2Vec toolkit, showed that posts about the coronavirus often discuss the state budget, measures to combat the epidemic, and the incidence rate, and, in connection with quarantine, fines, violations, and infographics on anti-epidemic measures. To analyze user sentiment, dictionaries of positive and negative words were built and compared; on average, words with an optimistic tone are used 30% more often than words with a pessimistic tone. Analysis of reactions to publications by number and type showed that the word "coronavirus" evokes very contradictory emotions, with "laughter" and "anger" at practically the same level. Mentions of the words "government" and "quarantine" most often evoke "anger" and "sadness", while "president" and "economy" evoke "laughter" and "anger" ("contempt" and "aggression" in Plutchik's terms). The article proposes a method for assessing attitudes towards anti-epidemic measures based on the analysis of social network content, comprising: 1) collection of data on a selected topic from the Facebook network, 2) initial training and statistical analysis of the text corpus, 3) semantic analysis of the text corpus, and 4) analysis of user sentiment. The assessment obtained with the proposed method is confirmed by the results of a survey on support for the government's work to counter the spread of the coronavirus, according to which only about 10% of respondents speak positively about its actions and more than 60% negatively. The method is implemented as a set of SQL and Python scripts and can be used for regular monitoring of public opinion on the work to counter the spread of the coronavirus.
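
The abstract describes the implementation only as a set of SQL and Python scripts. Below is a minimal Python sketch of steps 3 and 4, assuming gensim 4.x; the token lists and the positive/negative word lists are purely illustrative, not the paper's actual corpus or dictionaries.

```python
# Minimal sketch of steps 3-4: semantic analysis with Word2Vec and a
# dictionary-based sentiment comparison. Assumes gensim 4.x; the word
# lists below are illustrative, not the dictionaries built in the paper.
from collections import Counter
from gensim.models import Word2Vec

# posts: tokenized Facebook posts (the output of steps 1-2)
posts = [
    ["coronavirus", "quarantine", "government", "budget"],
    ["president", "economy", "mask", "fines"],
]

# Step 3: train Word2Vec and inspect the neighbourhood of key terms
model = Word2Vec(posts, vector_size=100, window=5, min_count=1, epochs=50)
print(model.wv.most_similar("coronavirus", topn=5))

# Step 4: compare frequencies of positive vs. negative words
positive = {"support", "hope", "help"}       # hypothetical entries
negative = {"fines", "violations", "anger"}  # hypothetical entries
counts = Counter(w for post in posts for w in post)
pos = sum(counts[w] for w in positive)
neg = sum(counts[w] for w in negative)
print(f"optimistic/pessimistic ratio: {pos / max(neg, 1):.2f}")
```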


10.23856/3709 ◽ 2020 ◽ Vol 37 (6) ◽ pp. 92-98
Author(s): Nataliia Lototska

The paper reviews scientific works on the importance of corpus and quantitative methods, the problem of connectivity, and approaches to the study of collocations. The article examines collocations of the emotion JOY in a writer's text corpus by means of statistical methods in modern linguistics. From the point of view of the language system, the described collocations appear in various structural-semantic forms in the author's idiolect. The statistical study yields a list of collocations ranked by absolute and relative frequency and by association measures such as the T-score and MI-score.
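
As an illustration of the association measures named in the abstract, a short sketch using NLTK's collocation utilities is given below; the token list is a placeholder, not the writer's corpus, and the node word "joy" is assumed to be lower-cased in the tokens.

```python
# Sketch of ranking collocations by MI-score and T-score with NLTK,
# assuming the corpus is available as a flat token list (placeholder here).
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

tokens = ["filled", "with", "joy", "and", "pride", "she", "wept", "with", "joy"]

measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)  # drop bigrams occurring only once

# Keep only bigrams containing the node word of the emotion concept
finder.apply_ngram_filter(lambda w1, w2: "joy" not in (w1, w2))

mi_ranked = finder.score_ngrams(measures.pmi)        # MI-score
t_ranked = finder.score_ngrams(measures.student_t)   # T-score
print(mi_ranked[:10])
print(t_ranked[:10])
```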


2019 ◽ Vol 8 (2S8) ◽ pp. 1366-1371

Topic modeling, such as LDA, is a useful tool for the statistical analysis of text document collections and other text-based data. Topic modeling has recently become an attractive research field due to its wide range of applications. However, traditional topic models such as LDA retain disadvantages stemming from the shortcomings of the bag-of-words (BOW) representation, as well as low performance in handling large text corpora. Therefore, in this paper we present a novel topic model, called LDA-GOW, which combines a word co-occurrence, or graph-of-words (GOW), model with the traditional LDA topic-discovery model. The LDA-GOW topic model not only extracts more informative topics from text but also scales the topic-discovery process to large text corpora. We compare our proposed model with the traditional LDA topic model on several standard datasets, including WebKB, Reuters-R8, and annotated scientific documents collected from the ACM digital library, to demonstrate its effectiveness. Across all experiments, our proposed LDA-GOW model achieves approximately 70.86% accuracy.
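
The abstract names two ingredients, a graph-of-words co-occurrence model and standard LDA, but does not specify how LDA-GOW fuses them. The sketch below only illustrates the pieces with gensim; treating sliding-window co-occurrence pairs as additional pseudo-terms is an assumption made here for illustration, not the authors' algorithm.

```python
# Sketch: sliding-window graph-of-words pairs + standard gensim LDA.
# The fusion strategy (pairs as extra terms) is an illustrative assumption.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["topic", "model", "text", "corpus", "analysis"],
    ["graph", "of", "words", "text", "corpus"],
]  # placeholder tokenized documents

def gow_pairs(tokens, window=2):
    """Co-occurring word pairs within a sliding window (graph edges)."""
    pairs = []
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            pairs.append("_".join(sorted((w, v))))
    return pairs

augmented = [d + gow_pairs(d) for d in docs]
dictionary = Dictionary(augmented)
corpus = [dictionary.doc2bow(d) for d in augmented]

lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for tid, terms in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(tid, [w for w, _ in terms])
```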


1966 ◽ Vol 24 ◽ pp. 188-189
Author(s): T. J. Deeming

If we make a set of measurements, such as narrow-band or multicolour photo-electric measurements, which are designed to improve a scheme of classification, and in particular if they are designed to extend the number of dimensions of classification, i.e. the number of classification parameters, then some important problems of analytical procedure arise. First, it is important not to reproduce the errors of the classification scheme which we are trying to improve. Second, when trying to extend the number of dimensions of classification we have little or nothing with which to test the validity of the new parameters. Problems similar to these have occurred in other areas of scientific research (notably psychology and education) and the branch of Statistics called Multivariate Analysis has been developed to deal with them. The techniques of this subject are largely unknown to astronomers, but, if carefully applied, they should at the very least ensure that the astronomer gets the maximum amount of information out of his data and does not waste his time looking for information which is not there. More optimistically, these techniques are potentially capable of indicating the number of classification parameters necessary and giving specific formulas for computing them, as well as pinpointing those particular measurements which are most crucial for determining the classification parameters.
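
Purely as an illustration of the kind of multivariate technique described, and not taken from the paper, a principal component analysis of simulated multicolour measurements can suggest how many classification parameters the data actually support.

```python
# Illustrative sketch (not from the paper): PCA on simulated multicolour
# photometric measurements, using cumulative explained variance to estimate
# how many classification parameters the data support.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_stars, n_bands = 200, 6
latent = rng.normal(size=(n_stars, 2))            # two "true" parameters
mixing = rng.normal(size=(2, n_bands))
colours = latent @ mixing + 0.05 * rng.normal(size=(n_stars, n_bands))

pca = PCA().fit(colours)
explained = np.cumsum(pca.explained_variance_ratio_)
n_params = int(np.searchsorted(explained, 0.99)) + 1
print("suggested number of classification parameters:", n_params)
```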


Author(s): Gianluigi Botton ◽ Gilles L'espérance

As interest in parallel EELS spectrum imaging grows in laboratories equipped with commercial spectrometers, different approaches have been used in recent years by a few research groups to develop the technique of spectrum imaging, as reported in the literature. Spectrum images can now be obtained either by controlling both the microscope and the spectrometer with a personal computer, or by using more powerful workstations interfaced to conventional multichannel analysers with commercially available programs to control the microscope and the spectrometer. The limits of the technique in terms of quantitative performance have, however, been reported by the present author in a systematic study of artifacts, detection limits, statistical errors as a function of the desired spatial resolution, and the range of chemical elements to be studied in a map. The aim of the present paper is to show an application of quantitative parallel EELS spectrum imaging in which statistical analysis is performed at each pixel, interpretation is carried out using criteria established from that statistical analysis, and variations in composition are analyzed with the help of information retrieved from t/λ maps so that artifacts are avoided.
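
As an illustration only, and not the authors' procedure or code, a per-pixel statistical criterion on a simulated spectrum image might look like the following numpy sketch, where pixels failing a 3-sigma detection test are masked out of the elemental map; the window positions and count levels are arbitrary placeholders.

```python
# Illustrative sketch: per-pixel statistics on a simulated EELS spectrum
# image. The core-loss signal is integrated at each pixel, an approximate
# Poisson counting error is attached, and pixels failing a 3-sigma
# detection criterion are masked so they are not mapped.
import numpy as np

rng = np.random.default_rng(1)
nx, ny, n_e = 32, 32, 512                      # spectrum image: x, y, energy
si = rng.poisson(lam=50.0, size=(nx, ny, n_e)).astype(float)

bg_win = slice(100, 150)                       # pre-edge background window
sig_win = slice(160, 220)                      # core-loss integration window

background = si[:, :, bg_win].mean(axis=2) * (sig_win.stop - sig_win.start)
raw = si[:, :, sig_win].sum(axis=2)
signal = raw - background
sigma = np.sqrt(raw + background)              # approximate counting error

detected = signal > 3.0 * sigma                # per-pixel detection criterion
elemental_map = np.where(detected, signal, np.nan)
print("pixels passing the criterion:", int(detected.sum()))
```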


2001 ◽ Vol 6 (3) ◽ pp. 187-193
Author(s): John R. Nesselroade

A focus on the study of development and other kinds of change in the whole individual has been one of the hallmarks of research by Magnusson and his colleagues. A number of different approaches emphasize this individual focus in their respective ways. This presentation focuses on intraindividual variability as studied through Cattell's P-technique factor analytic proposals, with several refinements that make the approach more tractable from a research design standpoint and more appropriate from a statistical analysis perspective. The associated methods make it possible to study intraindividual variability both within and between individuals. An empirical example is used to illustrate the procedure.
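
As a rough illustration of the P-technique idea, and not the paper's empirical example or refinements, the sketch below applies an ordinary factor analysis to a single individual's occasions-by-variables data matrix; the simulated data and single-factor structure are assumptions for demonstration.

```python
# Illustrative sketch: P-technique style factor analysis, i.e. a factor
# analysis of one individual's repeated measurements (occasions x variables).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n_occasions, n_vars = 100, 6
state = rng.normal(size=(n_occasions, 1))              # one latent state factor
loadings = rng.uniform(0.5, 1.0, size=(1, n_vars))
scores = state @ loadings + 0.3 * rng.normal(size=(n_occasions, n_vars))

fa = FactorAnalysis(n_components=1).fit(scores)
print("estimated loadings:", np.round(fa.components_, 2))
```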

