APPROACH FOR MINIMIZATION OF PHONEME GROUPS IN AUTHORSHIP ATTRIBUTION

2020
pp. 55-62
Author(s):
Iryna Khomytska
Vasyl Teslyuk
Iryna Bazylevych
Inna Shylinska

The developed mathematical support for authorship attribution software combines statistical methods (Student's t-test and the Kolmogorov-Smirnov test) with a statistical model for determining significant differences between styles. Combining the two statistical methods enhances the test validity of authorship attribution, because a result is accepted when both methods yield it. The developed model makes it possible to identify a consonant phoneme group with high style identification capability. The phoneme position in a word is taken into account. The greater the number of significant differences, the higher the authorship identification capability of the phoneme group. The developed system software is based on the algorithms of the combined methods and the statistical model. The Java programming language provides platform independence. Minimizing the number of consonant phoneme groups makes style and authorship attribution more automated. Results of comparisons of the scientific, belles-lettres, conversational, and newspaper styles are presented. The data obtained allow us to assert that the combination of methods and the developed statistical model improve the test validity of style and authorship attribution.
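The idea of accepting a difference only when both tests agree, and then ranking phoneme groups by their number of significant differences, can be sketched as follows. This is a minimal illustration and not the authors' Java software: the function names, the 0.05 significance level, and the group labels in the usage note are hypothetical, the t-test uses a large-sample normal approximation for its critical value, and the Kolmogorov-Smirnov test uses the asymptotic critical value.

```python
import math
from bisect import bisect_right
from statistics import NormalDist, mean, variance

ALPHA = 0.05  # assumed significance level, not stated in the abstract


def t_significant(a, b, alpha=ALPHA):
    """Pooled two-sample t statistic, judged against a normal critical value.

    The normal critical value is a large-sample approximation (assumption).
    """
    n, m = len(a), len(b)
    pooled = ((n - 1) * variance(a) + (m - 1) * variance(b)) / (n + m - 2)
    se = math.sqrt(pooled * (1 / n + 1 / m))
    if se == 0:
        return mean(a) != mean(b)
    t = (mean(a) - mean(b)) / se
    return abs(t) > NormalDist().inv_cdf(1 - alpha / 2)


def ks_significant(a, b, alpha=ALPHA):
    """Two-sample Kolmogorov-Smirnov test with the asymptotic critical value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Largest gap between the two empirical distribution functions.
    d = max(
        abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
        for x in set(a) | set(b)
    )
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return d > critical


def significant_by_both(a, b):
    """Count a difference only when both tests report it."""
    return t_significant(a, b) and ks_significant(a, b)


def best_group(groups):
    """Return the group with the most significant differences.

    `groups` maps a group label to a list of (sample_a, sample_b) pairs,
    where each sample holds the group's relative frequencies in text portions.
    """
    counts = {
        name: sum(1 for a, b in pairs if significant_by_both(a, b))
        for name, pairs in groups.items()
    }
    return max(counts, key=counts.get), counts
```

Calling `best_group({"stops": [...], "nasals": [...]})` with frequency samples for each pair of compared texts returns the label with the highest count together with all counts, which mirrors the statement that more significant differences indicate higher identification capability.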

Author(s):  
D. A. Klyushin
V. Yu. Mykhaylyuk

The paper compares two nonparametric methods of authorship identification in English literature. It describes testing of the methods with and without clustering. A method is also proposed for selecting the n-grams that best serve as markers for identifying the author. More than 800 texts by 16 authors were used for testing. The method that uses the density of the distribution is suitable for identifying authors of both large texts (50000+ characters) and small texts (10000+ characters). The method that uses p-statistics is suitable only for large texts.


2022
Vol 51 (4)
pp. 905-914
Author(s):
Elena Titorenko
Natalia Trofimova
Evgenia Ermolaeva
Ivan Trofimov
Leonid Breskin
...

Introduction. Statistical methods of data processing and information technologies make it possible to introduce new methods of hazard and risk analysis in the food industry. The research objective was to develop new software that would link together various risk-related production data. Study objects and methods. The research featured the food production company LLC Yug (Biysk, Russia), which specializes in functional products, and various ready-made software automation solutions. The study also involved statistical methods, methods of observation, collection of primary information, sequential top-down development of algorithms, and the Java programming language. Results and discussion. Food producers have a registration procedure for inconsistencies and violations of permissible limits at critical control points. The authors developed a new software program that allows production line operators to enter data on downtime and other violations of the production process. The program makes it possible for managers to receive up-to-date reports on various criteria, identify violations, and select appropriate corrective actions. This ready-made solution automates the process of accounting and hazard analysis. The program was tested at LLC Yug with the focus on the time that operators and managers needed to register the problem, analyze the data, develop corrective or preventive measures, and apply them. Conclusion. The new software proved to be less time-consuming than standard procedures applied in the food industry and saved the time that operators and managers spent on decision making and reporting.


2020
Vol 11 (5)
pp. 1255-1281
Author(s):
Valeria Edefonti
Roberta De Vito
Andrea Salvatori
Francesca Bravi
Linia Patel
...

Few studies have considered if a posteriori dietary patterns (DPs) are generalizable across different centers or studies, or if they are consistently seen over time. To date, no systematic search of the literature on these topics has been carried out. A scoping review was conducted through a systematic search on the PubMed database. In the current review, we included the 34 articles examining the extent to which a posteriori DPs were consistently seen: 1) across centers from the same study or across different studies potentially representing different populations or countries (here indicated as cross-study reproducibility) and 2) over longer time periods (i.e., ≥2 y) (here indicated as stability over time). Selected articles (published in 1981–2019, 32% from 2010 onwards) were based on observational studies, mostly from Europe and North America. Five articles were based on children and/or adolescents and 14 articles included adults (2 men; 12 women, of whom 3 were pregnant women). A posteriori DPs were mostly derived (32 articles) with principal component or factor analyses. Among the 9 articles assessing DP reproducibility across studies (number of centers/studies: 2–27; median: 3), 5 provided a formal assessment using statistical methods (4 index-based approaches of different complexity, 1 statistical model). A median of 4 DPs was reproduced across centers/studies (range: 1–7). Among the 25 articles assessing DP stability over time (number of time-occasions: 2–6; median: 3), 19 provided a formal assessment with statistical methods (17 index-based and/or test-based approaches, 1 statistical model, 1 with both strategies). The number and composition of DPs remained mostly stable over time. Based on the limited evidence collected, most identified DPs showed good reproducibility across studies and stability over time. However, when present within the single studies, the criteria for the formal assessment of cross-study reproducibility or stability over time were generally very basic.


2010
Vol 654-656
pp. 1578-1581
Author(s):  
Chun Sheng Lu

This paper briefly reviews recent advances in the statistical analysis of nano-mechanical measurements. It shows that statistical methods such as a minimum information criterion make it possible to select a better statistical model for quantifying the intrinsic mechanical properties of nanomaterials or for extracting the optimal information from the imperfect experimental data obtained with recently available nano-mechanical testing techniques.
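Model selection by a minimum information criterion can be illustrated with the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. The sketch below is an assumption-laden illustration: the abstract does not say which criterion or which candidate distributions the paper uses, so the normal and log-normal candidates and all function names here are hypothetical.

```python
import math


def normal_loglik(xs):
    """Maximized log-likelihood of a normal model (MLE mean and variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Maximized log-likelihood of a log-normal model; requires positive data."""
    logs = [math.log(x) for x in xs]
    # Normal likelihood of the logs plus the Jacobian term -sum(ln x).
    return normal_loglik(logs) - sum(logs)


def aic(loglik, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * loglik


def select_model(xs):
    """Return the candidate with the smallest AIC and all AIC scores."""
    scores = {
        "normal": aic(normal_loglik(xs), 2),
        "lognormal": aic(lognormal_loglik(xs), 2),
    }
    return min(scores, key=scores.get), scores
```

Both candidates have two parameters here, so the comparison reduces to the likelihoods; the penalty term 2k matters once candidates differ in the number of parameters.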


Author(s):  
Ritu Banga
Akanksha Bhardwaj
Sheng-Lung Peng
Gulshan Shrivastava

This chapter gives a comprehensive overview of machine learning classifiers for authorship attribution (AA) on short texts, specifically tweets. The need for authorship identification arises from increasing crime on the internet, which breaches cyber ethics by exploiting anonymity. AA of online messages has attracted interest from many research communities. Linguists and other researchers have proposed many methods, both statistical and computational, to identify an author from their writing style. Various ways of extracting and selecting features on the basis of the dataset are reviewed. The authors focus on n-gram features, as these proved very effective in identifying the true author from a given list of known authors. The study demonstrates that AA on small texts is achievable depending on the selection criteria for features and methods, and that the accuracy of the analysis changes with the combination of features. The authors found that character n-grams are good features for identifying the author but are not yet able to identify the author on their own.
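Character n-gram features of the kind discussed above can be sketched as follows. This is a minimal illustration, not the chapter's method: the chapter evaluates machine learning classifiers, whereas this sketch swaps in a nearest-profile rule based on cosine similarity, and the function names and the trigram size are hypothetical choices.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_texts):
    """Pick the known author whose profile is most similar to the unknown text.

    `known_texts` maps an author label to a sample of that author's writing.
    """
    profile = char_ngrams(unknown_text)
    return max(
        known_texts,
        key=lambda author: cosine(profile, char_ngrams(known_texts[author])),
    )
```

On texts as short as tweets the profiles are sparse, which is consistent with the chapter's finding that character n-grams alone are not yet sufficient to identify the author.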


1975
Vol 7 (6)
pp. 725-734
Author(s):
A D Cliff
J K Ord

A basic assumption of many statistical methods, which is rarely satisfied by geographically-located observations, is that the data to be analysed represent (spatially) independent drawings from some population or populations. In this paper, we illustrate the disastrous consequences of the failure to meet this assumption upon applications of tests for means based upon Student's t distribution. Procedures are developed which enable such tests to be applied more appropriately in the presence of autocorrelated samples.


Author(s):  
Dutt Mehta
Eric Prommer

Failed in-hospital resuscitations consume substantial health care resources. Accurate prediction of who will survive an in-hospital arrest is difficult. Identifying pre-arrest factors associated with poor resuscitation outcomes facilitates and enhances cardiopulmonary resuscitation (CPR) discussions. Using a large CPR database, the National Registry for Cardiopulmonary Resuscitation, several pre-existing factors associated with poor CPR outcome were identified and analyzed using statistical methods. The statistical model identifies factors associated with poor outcomes, such as black race, advancing age, and multiple pre-existing conditions. The chapter describes the basics of the study, briefly reviews other relevant studies and information, gives a summary and discusses implications, and concludes with a relevant clinical case.


Electronics
2020
Vol 9 (7)
pp. 1138
Author(s):
Iryna Khomytska
Vasyl Teslyuk
Natalia Kryvinska
Iryna Bazylevych

A one-consonant group approach to authorship attribution has been proposed. The approach is based on determining, by the chi-square test, the consonant group in which the difference between texts by different authors is statistically significant. The developed model determines the author-differentiating capability of each consonant group as the ratio of the number of comparisons in which the difference between the texts by two authors is statistically significant to the total number of comparisons. The general author-differentiating capability of the group of stop consonants, which is a statistical parameter of the authorial style, is the highest in the comparisons of texts from the publicist and belles-lettres styles. The one-consonant group approach simplifies the whole process of authorship attribution and ensures a higher level of automation. The experiments, conducted with software written in the Java programming language, have shown that the chi-square test is a powerful nonparametric statistical test that can be used for author identification on the level of English consonants with a test validity of 95%.
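The author-differentiating capability described above, the share of pairwise comparisons in which the chi-square test finds a significant difference, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the 2x2 table layout (occurrences of the consonant group versus all other phonemes in each text), the function names, and the 0.05 level are hypothetical, and all expected counts are assumed to be nonzero. With one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)), which keeps the sketch within the standard library.

```python
import math


def chi_square_2x2(group_a, total_a, group_b, total_b):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Rows are the two texts; columns are "consonant group" and "other phonemes".
    No continuity correction is applied.
    """
    table = [
        [group_a, total_a - group_a],
        [group_b, total_b - group_b],
    ]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def differentiating_capability(comparisons, alpha=0.05):
    """Ratio of significant comparisons to the total number of comparisons.

    Each comparison is a tuple (group_a, total_a, group_b, total_b).
    """
    significant = sum(
        1 for c in comparisons if chi_square_2x2(*c)[1] < alpha
    )
    return significant / len(comparisons)
```

A capability close to 1 means the consonant group separates nearly every pair of compared texts, while a value near 0 means it rarely does.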

