East Slavic indefinite pronouns: a corpus-based approach

Author(s):  
Yana Penkova ◽  
Achim Rabus

Abstract The paper focuses on the development and functional distribution of indefinite pronouns in Old East Slavic, taking into account different sources, genres and registers. All the examples in the collected dataset were taken from the historical modules of the Russian National Corpus. They were tagged for type of indefinite marker, source (including originality and date), type of reference of the indefinite marker, semantics, type of discourse, and the degree of formality (formal or informal) present in the context. We then applied both descriptive and inferential statistical methods, including random forest analysis and multinomial logistic regression. Our analysis enabled us to identify the primary and secondary predictors of the choice of a particular indefinite marker and to trace the functional distribution of indefinite markers according to these factors.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuai Wang ◽  
James B. Meigs ◽  
Josée Dupuis

Abstract Background: Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of efficient statistical methods and software. Results: We propose an efficient score test to analyze a multinomial trait in family samples, in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to be applicable to ordinal traits. We performed extensive simulation studies to evaluate the type-I error of the score test and the Wald test, compared to multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluate the power of these methods. Results show that both the score and Wald tests have a well-controlled type-I error rate, but multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. Conclusion: Both proposed tests have a correct type-I error rate and similar power. However, because the Wald statistic relies on computer-intensive estimation, it is less efficient than the score test for large-scale genetic association studies. We provide computer implementation for both multinomial and ordinal traits.


Author(s):  
Wesly Jeune ◽  
Márcio Rocha Francelino ◽  
Eliana de Souza ◽  
Elpídio Inácio Fernandes Filho ◽  
Genelício Crusoé Rocha

2021 ◽  
Author(s):  
Shuai Wang ◽  
James Meigs ◽  
Josee Dupuis

Abstract Background: Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of widely used efficient statistical methods and software. Results: We propose an efficient score test to analyze a multinomial trait in family samples, in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to be applicable to ordinal traits. We performed extensive simulation studies to evaluate the type-I error of the score test and the Wald test, compared to multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluate the power of these methods. Results show that both the score and Wald tests have a well-controlled type-I error rate, but multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. Conclusion: Both proposed tests have a correct type-I error rate and similar power. However, because the Wald statistic relies on computer-intensive estimation, it is less efficient than the score test for large-scale genetic association studies. We provide computer implementation for both multinomial and ordinal traits.


2021 ◽  
Vol 5 (1) ◽  
pp. 35
Author(s):  
Uttam Narendra Thakur ◽  
Radha Bhardwaj ◽  
Arnab Hazra

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic components (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease marker VOCs using an array of multiple sensors is highly desirable in the current scenario. Efficient sensors and suitable classification algorithms are both essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (Chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2, were tested under exposure to different VOCs, namely acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. Compared to PCA, LDA discriminated well between the seven VOCs. Four machine learning algorithms, namely k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression, were used to further identify those VOCs. The classification accuracy for those seven VOCs using kNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results indicate that multinomial logistic regression performed best among the four machine learning algorithms at discriminating the multiple VOCs that generally exist in human breath.
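As a rough illustration of how a multinomial logistic regression classifier assigns one of several VOC classes to a sensor-array reading, the sketch below applies a softmax to per-class linear scores. The weights, biases, feature values, and the three-class subset are hypothetical placeholders chosen for readability; they are not values or code from the study.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features, weights, biases, labels):
    """Return (predicted label, class probabilities) for one feature vector."""
    # One linear score per class: dot(weights_k, features) + bias_k
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical setup: 4 sensor responses (MoS2, Au/MoS2, Pd/MoS2, Pt/MoS2)
# and 3 of the VOC classes. All numbers are made up for illustration.
LABELS = ["acetone", "ethanol", "toluene"]
WEIGHTS = [[1.0, 0.0, 0.5, 0.0],
           [0.0, 1.0, 0.0, 0.5],
           [0.5, 0.0, 0.0, 1.0]]
BIASES = [0.0, 0.0, 0.0]

label, probs = predict([0.2, 0.9, 0.1, 0.3], WEIGHTS, BIASES, LABELS)
print(label, [round(p, 3) for p in probs])
```

In practice the weights would be fitted to labeled sensor responses rather than written by hand; the sketch only shows the prediction step.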


2020 ◽  
Vol 198 ◽  
pp. 02024
Author(s):  
Rui Liu ◽  
Luyao Li ◽  
Zili Lai ◽  
Xin Yang

This paper adopts three models, logistic regression (LR), support vector machine (SVM), and random forest (RF), to study the susceptibility distribution of earthquake-induced landslides. The area under the receiver operating characteristic (ROC) curve (AUC) and Ratio were used to evaluate the models' accuracy and their suitability for susceptibility mapping. The results show that RF has the best performance in the susceptibility assessment of earthquake-induced landslides in the Lushan region of China.
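For readers unfamiliar with AUC, a minimal sketch follows. It computes AUC through its rank interpretation: the probability that a randomly chosen positive sample (here, a landslide cell) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (made-up values): 1 = landslide, 0 = no landslide
y = [1, 1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(y, s)
print(round(auc, 3))
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch; a sort-based rank computation would be preferable for large susceptibility maps.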


2021 ◽  
Author(s):  
A. Mairpady ◽  
Abdel-Hamid I. Mourad ◽  
A S Mohammad Sayem Mozumder

Abstract Cartilage repair is one of the most challenging tasks for orthopedic surgeons and researchers. The primary challenge lies in the fact that the development of the extracellular matrix requires specialized cells known as chondrocytes, which are sparse in number. The chondrocytes' minimal self-renewal capacity makes cartilage repair even more difficult and expensive. In designing successful cartilage substitutes, the selection of materials for scaffold fabrication plays the central role, among several other important factors, in ensuring the survival and proliferation of any biomaterial substitute. Over the last few decades, polymers and polymer combinations have been extensively used to fabricate such scaffolds and have shown promising results in terms of mechanical integrity and biocompatibility. In an empirical approach, selecting the most appropriate polymer(s) for cartilage repair is expensive and time-consuming, as it traditionally requires numerous trials. Moreover, it is humanly impossible to go through the huge body of literature available on the potential polymer(s) and to correlate their physical, mechanical and biological properties that might be suitable for cartilage tissue engineering. With the advancement of machine learning, material design may experience a significant reduction in experimental time and cost. The objective of this study is to implement an inverse design approach to select the best polymer(s) or composites for cartilage repair by using machine learning algorithms such as random forest regression (i.e., regression trees) and multinomial logistic regression. In these algorithms, the mechanical properties of the polymers, which are similar to those of cartilage, are considered as the input, and the polymer(s)/composites are the predicted output.
According to the random forest regression and multinomial logistic regression, the polymer(s)/composites (i.e., the output) with characteristics closest to those of articular cartilage were found to be a composite of polycaprolactone and poly(bisphenol A carbonate) and a blend of polyethylene/polyethylene-graft-poly(maleic anhydride), respectively. These composites exhibit biomechanical properties similar to those of natural cartilage and initiate only minimal immune responses in the body environment.


2021 ◽  
Vol 8 ◽  
Author(s):  
Zuheng Liu ◽  
Haiyue Liu ◽  
Qinsheng Deng ◽  
Changqing Sun ◽  
Wangwei He ◽  
...  

Objective: To explore the relationship between dietary inflammatory index (DII) and heart failure (HF) in participants with cardiovascular and cerebrovascular diseases. Methods: NHANES (1998–2018) data were collected and used to assess the association of HF with DII. Twenty-four-hour dietary consumption data were used to calculate DII scores. Demographic characteristics and physical and laboratory examinations were collected for the comparison between HF and non-HF groups. Logistic regression analysis and random forest analysis were performed to calculate the odds ratio and determine the potential beneficial dietary components in HF. Results: A total of 19,067 cardiac-cerebral vascular disease participants were categorized into HF (n = 1,382; 7.25%) and non-HF (n = 17,685; 92.75%) groups. Heart failure participants had higher DII scores than those in the non-HF group (0.239 ± 1.702 vs. −0.145 ± 1.704, p < 0.001). Compared with individuals in T1 (DII: −3.884 to −0.570) of DII, those in T3 (DII: 1.019 to 4.598) had higher levels of total cholesterol (4.49 ± 1.16 vs. 4.75 ± 1.28 mmol/L, p < 0.01), globulin (29.92 ± 5.37 vs. 31.29 ± 5.84 g/L, p < 0.001), and pulse rate (69.90 ± 12.22 vs. 72.22 ± 12.77, p < 0.001) and lower levels of albumin (40.76 ± 3.52 vs. 39.86 ± 3.83 g/L, p < 0.001), hemoglobin (13.76 ± 1.65 vs. 13.46 ± 1.77 g/dl, p < 0.05), and hematocrit (40.83 ± 4.69 vs. 40.17 ± 5.01%, p < 0.05). The odds ratios of HF for DII from the logistic regression were 1.140, 1.158, and 1.110 in models 1, 2, and 3, respectively. In addition, the results of the random forest analysis suggest that dietary magnesium, fiber, and beta carotene may be essential in HF. Conclusion: Dietary inflammatory index was positively associated with HF in US adults, and dietary intervention might be a promising method in the therapy of HF.
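To make the reported odds figures easier to interpret, the sketch below converts between a logistic regression coefficient and its odds ratio and shows how an odds ratio compounds over several units of a predictor. It assumes the figure 1.140 (model 1) is an odds ratio per one-unit increase in DII, which the abstract does not state explicitly; the function names are hypothetical.

```python
import math

def coefficient_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient (log-odds per unit) implied by an odds ratio."""
    return math.log(odds_ratio)

def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor increases by `delta` units."""
    return odds_ratio ** delta

# 1.140 is the model 1 figure quoted in the abstract; treating it as a
# per-unit odds ratio for DII is an assumption made for this illustration.
OR_MODEL_1 = 1.140

beta = coefficient_from_odds_ratio(OR_MODEL_1)
two_unit = odds_multiplier(OR_MODEL_1, 2)
print(round(beta, 4), round(two_unit, 4))
```

Under that assumption, a two-unit higher DII would correspond to odds of HF multiplied by 1.140 squared, not by 2 × 1.140, because effects add on the log-odds scale.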

