Corpus Statistics for Empirical Translation Studies

Author(s):  
Michael Oakes

In recent years a number of authors have made good use of statistical tests in empirical translation studies. These tests are well established in the scientific literature but have only recently been applied to the comparison of original and translated texts for the identification of the characteristics of “translationese.” There has also been interest in the comparison between professional and student translations and between machine and human translation. In this chapter, various statistical tests are examined in the context of real-world empirical studies in translation: analysis of variance and Tukey’s “honestly significant difference” test, the chi-squared test and the G-statistic, and the visualization techniques of hierarchical cluster analysis and principal component analysis. The chapter finishes with a discussion of the linguistic features chosen or found to characterize the original and translated texts.
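As a rough illustration of two of the tests named in this abstract, the sketch below computes Pearson's chi-squared and the G-statistic (log-likelihood ratio) for a 2x2 contingency table. The word counts are hypothetical and are not taken from the chapter; the function names are likewise illustrative.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi_squared(table):
    """Pearson's chi-squared: sum of (O - E)^2 / E over all cells."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def g_statistic(table):
    """G-statistic: 2 * sum of O * ln(O / E) over non-empty cells."""
    exp = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, exp)
                   for o, e in zip(obs_row, exp_row) if o > 0)

# Hypothetical counts: [occurrences of one word, all other words]
table = [[120, 9880],   # original texts (assumed figures)
         [180, 9820]]   # translated texts (assumed figures)

chi2 = chi_squared(table)
g = g_statistic(table)

# Critical value of the chi-squared distribution, 1 degree of freedom, alpha = 0.05
CRITICAL_1DF_05 = 3.841
print(f"chi-squared = {chi2:.3f}, G = {g:.3f}, "
      f"significant at 0.05: {chi2 > CRITICAL_1DF_05 and g > CRITICAL_1DF_05}")
```

With counts of this size the two statistics are close to each other; they tend to diverge when some expected counts are small.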

2020 ◽  
Vol 48 (2) ◽  
pp. 588-603
Author(s):  
Kourosh ZANDIFAR ◽  
Hassanali NAGHID BADI ◽  
Ali MEHRAFARIN ◽  
Majid G. NOHOOJI

Ziziphus nummularia is a multipurpose tropical tree of medicinal, nutritional, industrial, and economic value. This tree, which belongs to the Rhamnaceae family, originates from southern Asia and northern Africa. This research was carried out to investigate the phytochemical and morphological diversity of 20 wild populations collected from different southern regions of Iran. Statistically significant differences between populations were found with respect to saponin of the leaf (2.2-5.4 mg/g) and fruit (1.2-3.2 mg/g), phenol of the leaf (0.7-2.9 mg/g) and fruit (0.03-0.4 mg/g), tannin of the leaf (0.8-3.5 mg/g) and fruit (1.5-1.7 mg/g), and flavonoid of the leaf (3.3-4.3 mg/g) and fruit (1.5-2.4 mg/g). A factor analysis based on principal component analysis (PCA) revealed that the first three components (PC1-PC3) explain 79.04% of the total variation. The first component (PC1), which accounts for 42% of the total variation, is dominated by the traits with the largest PCA coefficients: leaf saponin, width of the end leaf, fruit saponin, length of the end leaf, leaf length and width, and leaf phenol. Hierarchical cluster analysis divided the populations into four main groups with high diversity. In general, the Izeh Tarakab population had the highest content of leaf and fruit saponin. The content of leaf and fruit saponin, as the major secondary metabolite, could be a good determinant for detecting diversity in wild populations of Z. nummularia.


2019 ◽  
Author(s):  
Jaroslav Flegr ◽  
Petr Tureček

Abstract. Background: No serological assay has 100% sensitivity. Statistically, the concentration of specific antibodies against antigens of parasites decreases with the duration of infection. This can result in false negative outputs of diagnostic tests for subjects with old infections, e.g., for individuals infected in childhood. When a property of seronegative and seropositive subjects is compared under these circumstances, the statistical tests can detect no significant difference between these two groups of subjects, despite the fact that infected and noninfected subjects differ. When the effect of the infection has a cumulative character and subjects with an older infection (potential false negatives) are affected to a greater degree, we can even get a paradoxical result of the comparison: the seronegative subjects have on average a lower value of certain traits, e.g., IQ, despite the infection having a negative effect on the trait. A permutation test for contaminated data, implemented, e.g., in the program Treept or available as a comprehensibly commented R function in the supplement of this paper, can be used to reveal and to eliminate the effect of false negatives. Methods: We used a Monte Carlo simulation in the program R to show that the permutation test implemented in the programs Treept and PTPT is a conservative test. Results: We showed that the test could provide false negative but not false positive results if the studied population contains no subpopulation of false negative subjects. We also introduced an R version of the test expanded by skewness analysis, which helps to estimate the proportion of false negative subjects based on the assumption of equal data skewness in groups of healthy and infected individuals. Conclusions: Based on the results of simulations and our experience with empirical studies, we recommend the use of the permutation test for contaminated data whenever seronegative and seropositive individuals are compared.
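For readers unfamiliar with permutation testing in general, the following is a minimal sketch of an ordinary two-sample permutation test on the difference of means. It is not the contaminated-data test implemented in Treept or PTPT, whose handling of false negatives is not described in this abstract; the function name, the default of 5000 permutations, and the toy data are all assumptions made for illustration.

```python
import random

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Generic textbook version; labels are reshuffled at random and the
    observed difference is compared with the reshuffled differences.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical trait scores, not from the study)
seronegative = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
seropositive = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
p_separated = permutation_test(seronegative, seropositive)
p_identical = permutation_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(p_separated, p_identical)
```

The add-one correction is a common convention for Monte Carlo p-values; whether the authors' programs use it is not stated in the abstract.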


Biomolecules ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 328 ◽  
Author(s):  
Jamile S. da Costa ◽  
Adenilson S. Barroso ◽  
Rosa Helena V. Mourão ◽  
Joyce Kelly R. da Silva ◽  
José Guilherme S. Maia ◽  
...  

The essential oil of Eugenia uniflora has been credited with antidepressive, antinociceptive, antileishmanial, larvicidal, antioxidant, antibacterial, and antifungal activities. It is known that the cultivation of this plant can be affected by seasonality, which alters the oil composition and its biological activities. This study aims to perform an annual evaluation of the curzerene-type oil of E. uniflora and determine its antioxidant activity. The oil yield from the dry season (1.4 ± 0.6%) did not differ statistically from that of the rainy season (1.8 ± 0.8%). Curzerene, an oxygenated sesquiterpene, was the principal constituent, and its percentage showed no significant difference between the two periods: dry (42.7 ± 6.1%) and rainy (40.8 ± 5.9%). Principal component and hierarchical cluster analyses showed a high level of similarity between the monthly samples of the oils. Also, in the annual study, the yield and composition of the oils did not show a significant correlation with the climatic variables. The oils inhibited DPPH radicals with an average value of 55.0 ± 6.6%. The high curzerene content in the monthly oils of E. uniflora suggests their potential for use as a future phytotherapeutic alternative.


1993 ◽  
Vol 38 (1) ◽  
pp. 9-13 ◽  
Author(s):  
David L. Streiner

The more commonly known statistical procedures, such as the t-test, analysis of variance, or chi-squared test, can handle only one dependent variable (DV) at a time. Two types of problems can arise when there is more than one DV: 1. a greater probability of erroneously concluding that there is a significant difference between the groups when in fact there is none (a Type I error); and 2. failure to detect differences between the groups in terms of the patterns of DVs (a Type II error). Multivariate statistics are designed to overcome both of these problems. However, there are costs associated with these benefits, such as increased complexity, decreased power, multiple ways of answering the same question, and ambiguity in the allocation of shared variance. This is the first of a series of articles on multivariate statistical tests which will address these issues and explain their possible uses.
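The first problem can be made concrete with a short calculation. Assuming the univariate tests are independent and each is run at the same significance level (an idealization that the article itself does not state), the chance of at least one Type I error across k tests is 1 - (1 - alpha)^k. The function name below is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests
    independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

# One univariate test per dependent variable, each at alpha = 0.05
for n_dvs in (1, 2, 5, 10):
    rate = familywise_error_rate(0.05, n_dvs)
    print(f"{n_dvs:2d} DVs: familywise Type I error rate = {rate:.3f}")
```

With correlated DVs the inflation is smaller than this formula suggests, so the figures are best read as an upper-end illustration.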


2011 ◽  
Vol 11 (2) ◽  
pp. 561-592 ◽  
Author(s):  
Benedikt Szmrecsanyi ◽  
Christoph Wolk

This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects all over Great Britain.


2018 ◽  
Vol 5 (3) ◽  
pp. 131-134 ◽  
Author(s):  
Nikunj Patel ◽  
Niranjan Kanaki ◽  
Vinit Movaliya

Vast intra-specific variations, especially diurnal, geographical, and seasonal, have been reported in the chemical composition of essential oils of Ocimum species. The study was conducted to assess diurnal variation in the chemical composition of the leaves of Ocimum sanctum. Leaf samples collected at different times of the day were analyzed by gas chromatography coupled with a flame ionization detector (GC-FID). The chromatographic fingerprints of the leaf samples were analyzed by chemometric methods such as principal component analysis and hierarchical cluster analysis. No significant difference was found in the chemical compositions of the leaf samples collected at different times of the day. The results lead to the conclusion that O. sanctum does not exhibit diurnal variation in its chemical composition, unlike O. gratissimum.


VASA ◽  
2017 ◽  
Vol 46 (6) ◽  
pp. 484-489 ◽  
Author(s):  
Tom Barker ◽  
Felicity Evison ◽  
Ruth Benson ◽  
Alok Tiwari

Abstract. Background: The invasive management of varicose veins has a known risk of post-operative deep venous thrombosis and subsequent pulmonary embolism. The aim of this study was to evaluate absolute and relative risk of venous thromboembolism (VTE) following commonly used varicose vein procedures. Patients and methods: A retrospective analysis of secondary data using Hospital Episode Statistics database was performed for all varicose vein procedures performed between 2003 and 2013 and all readmissions for VTE in the same patients within 30 days, 90 days, and one year. Comparison of the incidence of VTEs between procedures was performed using a Pearson’s Chi-squared test. Results: In total, 261,169 varicose vein procedures were performed during the period studied. There were 686 VTEs recorded at 30 days (0.26 % incidence), 884 at 90 days (0.34 % incidence), and 1,246 at one year (0.48 % incidence). The VTE incidence for different procedures was between 0.15–0.35 % at 30 days, 0.26–0.50 % at 90 days, and 0.46–0.58 % at one year. At 30 days there was a significantly lower incidence of VTEs for foam sclerotherapy compared to other procedures (p = 0.01). There was no difference in VTE incidence between procedures at 90 days (p = 0.13) or one year (p = 0.16). Conclusions: Patients undergoing varicose vein procedures have a small but appreciable increased risk of VTE compared to the general population, with the effect persisting at one year. Foam sclerotherapy had a lower incidence of VTE compared to other procedures at 30 days, but this effect did not persist at 90 days or at one year. There was no other significant difference in the incidence of VTE between open, endovenous, and foam sclerotherapy treatments.
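The reported incidences follow directly from the counts given in the Results. The short check below recomputes them; only the figures quoted in the abstract are used, and the variable and function names are illustrative.

```python
# Counts quoted in the abstract
TOTAL_PROCEDURES = 261_169
vte_counts = {"30 days": 686, "90 days": 884, "one year": 1_246}

def incidence_percent(events, total):
    """Cumulative incidence expressed as a percentage of procedures."""
    return 100 * events / total

rates = {window: round(incidence_percent(count, TOTAL_PROCEDURES), 2)
         for window, count in vte_counts.items()}

for window, rate in rates.items():
    print(f"VTE incidence at {window}: {rate:.2f} %")
```

The per-procedure counts needed to reproduce the Pearson chi-squared comparison are not given in the abstract, so that step is not sketched here.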


Author(s):  
Nikunj D. Patel ◽  
Niranjan S. Kanaki

Background: Numerous Ayurvedic formulations contain tugaksheeree as a key ingredient. Tugaksheeree is the starch obtained from the rhizomes of two plants, Curcuma angustifolia Roxb. (Zingiberaceae) and Maranta arundinacea (MA) Linn. (Marantaceae). Objective: The primary concerns in quality assessment of tugaksheeree arise from adulteration or substitution. Method: In the current study, Fourier transform infrared (FTIR) spectroscopy with an attenuated total reflectance (ATR) facility was used to evaluate tugaksheeree samples. A total of 10 different samples were studied, and spectra were recorded in transmittance mode without KBr pellets. The fingerprint region of the spectra was then processed with multi-component tools. Multivariate analysis was performed by various chemometric methods. Result: Multi-component methods such as Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA) were used to discriminate the tugaksheeree samples using Minitab software. Conclusion: This method can be used as a tool to differentiate samples of tugaksheeree from its adulterants and substitutes.


2016 ◽  
Vol 5 (11) ◽  
pp. 5041
Author(s):  
Farkhondeh Jamshidi ◽  
Ahmad Ghorbani ◽  
Sina Darvishi*

The misuse of some pesticides, especially for suicide, is a current problem. Aluminum phosphide poisoning usually results from a suicide attempt, sometimes from accidental occupational exposure, and in a few cases from criminal intent. This study was conducted to evaluate patients poisoned with aluminum phosphide. The medical records of patients hospitalized in Ahvaz Razi hospital for poisoning with rice tablets (aluminum phosphide) were studied. A checklist covering patient demographics (age, gender) and information on the poisoning was completed from the patients’ medical records. Data were analyzed using SPSS V22. Eighteen patients poisoned with rice tablets (aluminum phosphide) were studied: 11 were male and seven were female. The mean patient age was 27.06 ± 8.04 years (28 ± 9 in men and 25 ± 6.02 in women). Statistical tests showed no statistically significant difference in mean age between the genders (P > 0.05). Eleven patients took aluminum phosphide to attempt suicide, three took it unintentionally, and the reason was not recorded in four cases. Of the patients who attempted suicide by taking aluminum phosphide, six were male and five were female, with no statistically significant difference between the genders (P > 0.05). In addition to studying the complications and mortality caused by this poisoning, it is recommended that the responsible authorities provide the necessary education and treatment to prevent this type of poisoning.


Author(s):  
Natuya Zhuori ◽  
Yu Cai ◽  
Yan Yan ◽  
Yu Cui ◽  
Minjuan Zhao

As the trend of aging in rural China has intensified, research on the factors affecting the health of the elderly in rural areas has become a hot issue. However, the conclusions of existing studies are inconsistent and even contradictory, making it difficult to form constructive policies with practical value. To explore the reasons for the inconsistent conclusions drawn by relevant research, in this paper we constructed a meta-regression database based on 65 relevant publications from the past 25 years. To obtain more valid samples and reduce publication bias, we also set the statistical significance of social support to the health of the elderly in rural areas as a dependent variable. Finally, combined with multi-dimensional social support and its implications for the health of the elderly, meta-regression analysis was carried out on the results of 171 empirical studies. The results show that (1) subjective support rather than objective support can have a significant impact on the health of the elderly in rural areas, and there is no significant difference between other dimensions of social support and objective support; (2) the health status of the elderly in rural areas in samples involving western regions is more sensitive to social support than that in samples not involving the western regions; (3) among the elderly in rural areas, social support for the older male elderly is more likely to improve their health than that for the younger female elderly; and (4) both data sources and econometric models greatly affect the heterogeneity of the effect of social support on the health of the elderly in rural areas, but neither the publication year nor the journal is significant. Finally, relevant policies and follow-up studies on the impact of social support on the health of the elderly in rural areas are discussed.

