East Slavic indefinite pronouns: a corpus-based approach

Russian Linguistics ◽

10.1007/s11185-021-09247-0 ◽

2021 ◽

Author(s):

Yana Penkova ◽

Achim Rabus

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Statistical Methods ◽

Multinomial Logistic Regression ◽

Random Forest Analysis ◽

Functional Distribution ◽

Different Sources ◽

National Corpus ◽

Indefinite Pronouns

Abstract The paper focuses on the development and functional distribution of indefinite pronouns in Old East Slavic, taking into account different sources, genres and registers. All the examples in the collected dataset were taken from the historical modules of the Russian National Corpus. They were tagged for type of indefinite marker, source (including originality and date), type of reference of the indefinite marker, semantics, type of discourse, and the degree of formality (formal or informal) present in the context. We then applied both descriptive and inferential statistical methods, including random forest analysis and multinomial logistic regression. Our analysis enabled us to identify the primary and secondary predictors of the choice of a particular indefinite marker and to trace the functional distribution of indefinite markers according to these factors.


Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models

Laboratory Investigation ◽

10.1038/s41374-021-00662-x ◽

2021 ◽

Author(s):

Catherine H. Feng ◽

Mary L. Disis ◽

Chao Cheng ◽

Lanjing Zhang

Keyword(s):

Colorectal Cancer ◽

Logistic Regression ◽

Feature Selection ◽

Random Forest ◽

Regression Models ◽

Multinomial Logistic Regression ◽

Logistic Regression Models ◽

Selection For


Genetic association tests in family samples for multi-category phenotypes

BMC Genomics ◽

10.1186/s12864-021-08107-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Shuai Wang ◽

James B. Meigs ◽

Josée Dupuis

Keyword(s):

Logistic Regression ◽

Statistical Methods ◽

Error Rate ◽

Type I Error ◽

Multinomial Logistic Regression ◽

Score Test ◽

Type I ◽

Type I Error Rate ◽

Ordinal Traits ◽

Study Designs

Abstract Background: Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of efficient statistical methods and software. Results: We propose an efficient score test to analyze a multinomial trait in family samples, in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to be applicable to ordinal traits. We performed extensive simulation studies to evaluate the type-I error of the score and Wald tests, compared with multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluate the power of these methods. Results show that both the score and Wald tests have a well-controlled type-I error rate, but the multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. Conclusion: Both proposed tests have a correct type-I error rate and similar power. However, because the Wald statistic relies on computationally intensive estimation, it is less efficient than the score test for large-scale genetic association studies. We provide computer implementations for both multinomial and ordinal traits.


Multinomial Logistic Regression and Random Forest Classifiers in Digital Mapping of Soil Classes in Western Haiti

Revista Brasileira de Ciência do Solo ◽

10.1590/18069657rbcs20170133 ◽

2018 ◽

Vol 42 (0) ◽

Cited By ~ 3

Author(s):

Wesly Jeune ◽

Márcio Rocha Francelino ◽

Eliana de Souza ◽

Elpídio Inácio Fernandes Filho ◽

Genelício Crusoé Rocha

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Multinomial Logistic Regression ◽

Digital Mapping


Genetic association tests in family samples for multi-category phenotypes

10.21203/rs.3.rs-458333/v1 ◽

2021 ◽

Author(s):

Shuai Wang ◽

James Meigs ◽

Josee Dupuis

Keyword(s):

Logistic Regression ◽

Statistical Methods ◽

Error Rate ◽

Type I Error ◽

Multinomial Logistic Regression ◽

Score Test ◽

Type I ◽

Type I Error Rate ◽

Ordinal Traits ◽

Study Designs

Abstract Background: Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of widely used efficient statistical methods and software. Results: We propose an efficient score test to analyze a multinomial trait in family samples, in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to be applicable to ordinal traits. We performed extensive simulation studies to evaluate the type-I error of the score and Wald tests, compared with multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluate the power of these methods. Results show that both the score and Wald tests have a well-controlled type-I error rate, but the multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. Conclusion: Both proposed tests have a correct type-I error rate and similar power. However, because the Wald statistic relies on computationally intensive estimation, it is less efficient than the score test for large-scale genetic association studies. We provide computer implementations for both multinomial and ordinal traits.


Statistical Analysis for Selective Identifications of VOCs by Using Surface Functionalized MoS2 Based Sensor Array

Chemistry Proceedings ◽

10.3390/csac2021-10451 ◽

2021 ◽

Vol 5 (1) ◽

pp. 35

Author(s):

Uttam Narendra Thakur ◽

Radha Bhardwaj ◽

Arnab Hazra

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Sensor Array ◽

Multinomial Logistic Regression ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Human Breath

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic compounds (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease-marker VOCs using an array of multiple sensors is highly desirable. Efficient sensors and suitable classification algorithms are essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (Chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2, were tested under exposure to different VOCs, such as acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. LDA discriminated between the seven VOCs better than PCA. Four machine learning algorithms, namely k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression, were then used to further identify those VOCs. The classification accuracy for the seven VOCs using kNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results indicate that multinomial logistic regression performed best among the four machine learning algorithms in discriminating the multiple VOCs that generally exist in human breath.
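The prediction step of a multinomial logistic regression classifier, such as the one used in the abstract above, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the weights and the two-feature input are hypothetical, and no values from the study are used.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, x):
    """One linear score per class, followed by softmax.

    weights: one weight vector per class (hypothetical values)
    biases:  one intercept per class
    x:       feature vector, e.g. responses of the sensors
    """
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)

def predict(weights, biases, x):
    """Return the index of the most probable class."""
    probs = predict_proba(weights, biases, x)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical three-class model with two features.
example_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
example_biases = [0.0, 0.0, 0.0]
example_probs = predict_proba(example_weights, example_biases, [2.0, 0.5])
example_class = predict(example_weights, example_biases, [2.0, 0.5])
```

In practice the weights would be fitted to training data, for example by maximizing the multinomial log-likelihood; only the scoring step is shown here.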


Classifying Residential Electricity Demand in Mexico using Random Forest and Multinomial Logistic Regression

2019 FISE-IEEE/CIGRE Conference - Living the energy Transition (FISE/CIGRE) ◽

10.1109/fisecigre48012.2019.8984953 ◽

2019 ◽

Author(s):

Mauricio Hernandez Hernandez ◽

Dalia Patino-Echeverri

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Multinomial Logistic Regression ◽

Electricity Demand ◽

Residential Electricity


Comparison of three statistical methods for earthquake-induced landslides susceptibility in Lushan region China

E3S Web of Conferences ◽

10.1051/e3sconf/202019802024 ◽

2020 ◽

Vol 198 ◽

pp. 02024

Author(s):

Rui Liu ◽

Luyao Li ◽

Zili Lai ◽

Xin Yang

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Receiver Operating Characteristic ◽

Statistical Methods ◽

Roc Curve ◽

Operating Characteristic ◽

Support Vector ◽

Susceptibility Assessment ◽

Distribution Rule

This paper adopts three models, logistic regression (LR), support vector machine (SVM), and random forest (RF), to study the susceptibility distribution of earthquake-induced landslides. The area under the receiver operating characteristic (ROC) curve (AUC) and Ratio were used to evaluate the models' accuracy and the usability of the resulting susceptibility maps. The results show that RF has the best performance in the susceptibility assessment of earthquake-induced landslides in the Lushan region of China.
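The AUC used above to compare models can be read as the probability that a randomly chosen positive case (here, a landslide location) receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive cases, 0 for negative cases
    scores: model outputs, higher meaning "more likely positive"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The double loop is quadratic in the number of cases, which is fine for illustration; rank-based formulas give the same value more efficiently on large datasets.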


Spatial prediction of WRB soil classes in an arid floodplain using multinomial logistic regression and random forest models, south-east of Iran

Arabian Journal of Geosciences ◽

10.1007/s12517-020-05576-4 ◽

2020 ◽

Vol 13 (13) ◽

Author(s):

Seyed Javad Forghani ◽

Mohammad Reza Pahlavan-Rad ◽

Mehrdad Esfandiari ◽

Ali Mohammadi Torkashvand

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Multinomial Logistic Regression ◽

Spatial Prediction ◽

Forest Models ◽

Random Forest Models ◽

East Of Iran


Accelerated Discovery of the Polymer Matrix for Cartilage Repair Through Machine Learning Algorithms

10.21203/rs.3.rs-572145/v1 ◽

2021 ◽

Author(s):

A. Mairpady ◽

Abdel-Hamid I. Mourad ◽

A S Mohammad Sayem Mozumder

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Cartilage Repair ◽

Multinomial Logistic Regression ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cartilage Tissue ◽

Random Forest Regression ◽

Selection Of

Abstract Cartilage repair is one of the most challenging tasks for orthopedic surgeons and researchers. The primary challenge lies in the fact that the development of the extracellular matrix requires specialized cells known as chondrocytes, which are sparse in number. Chondrocytes' minimal self-renewal capacity makes cartilage repair even more troublesome and expensive. In designing successful substitutes for cartilage, the selection of materials used for scaffold fabrication plays the central role among several other important factors in ensuring the survival and proliferation of any biomaterial substitute. Over the last few decades, polymers and polymer combinations have been extensively used to fabricate such scaffolds and have shown promising results in terms of mechanical integrity and biocompatibility. In an empirical approach, the selection of the most appropriate polymer(s) for cartilage repair is an expensive and time-consuming affair, as it traditionally requires numerous trials. Moreover, it is humanly impossible to go through the huge library of literature available on the potential polymer(s) and to correlate their physical, mechanical and biological properties that might be suitable for cartilage tissue engineering. With the advancement of machine learning, material design may experience a significant reduction in experimental time and cost. The objective of this study is to implement an inverse design approach to select the best polymer(s) or composites for cartilage repair by using machine learning algorithms, such as random forest regression (i.e., regression trees) and multinomial logistic regression. In these algorithms, the mechanical properties of the polymers, which are similar to those of cartilage, are taken as the input and the polymer(s)/composites are the predicted output. According to the random forest regression and multinomial logistic regression, the polymer(s)/composites (i.e., the output) with characteristics closest to those of articular cartilage were found to be a composite of polycaprolactone and poly(bisphenol A carbonate) and a blend of polyethylene/polyethylene-graft-poly(maleic anhydride), respectively. These composites exhibit biomechanical properties similar to those of natural cartilage and initiate only minimal immune responses in the body environment.


Association Between Dietary Inflammatory Index and Heart Failure: Results From NHANES (1999–2018)

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.702489 ◽

2021 ◽

Vol 8 ◽

Author(s):

Zuheng Liu ◽

Haiyue Liu ◽

Qinsheng Deng ◽

Changqing Sun ◽

Wangwei He ◽

...

Keyword(s):

Heart Failure ◽

Logistic Regression ◽

Random Forest ◽

Dietary Intervention ◽

Beta Carotene ◽

Cerebrovascular Diseases ◽

Dietary Inflammatory Index ◽

Random Forest Analysis ◽

Inflammatory Index ◽

Dietary Magnesium

Objective: To explore the relationship between dietary inflammatory index (DII) and heart failure (HF) in participants with cardiovascular and cerebrovascular diseases. Methods: NHANES (1999–2018) data were collected and used to assess the association of HF with DII. Twenty-four-hour dietary consumption data were used to calculate the DII scores. Demographic characteristics and physical and laboratory examinations were collected for the comparison between HF and non-HF groups. Logistic regression analysis and random forest analysis were performed to calculate the odds ratio and determine the potentially beneficial dietary components in HF. Results: A total of 19,067 participants with cardiac-cerebral vascular disease were categorized into HF (n = 1,382; 7.25%) and non-HF (n = 17,685; 92.75%) groups. Heart failure participants had higher DII scores than those in the non-HF group (0.239 ± 1.702 vs. −0.145 ± 1.704, p < 0.001). Compared with individuals with T1 (DII: −3.884 to −0.570) of DII, those in T3 (DII: 1.019 to 4.598) had a higher level of total cholesterol (4.49 ± 1.16 vs. 4.75 ± 1.28 mmol/L, p < 0.01), globulin (29.92 ± 5.37 vs. 31.29 ± 5.84 g/L, p < 0.001), and pulse rate (69.90 ± 12.22 vs. 72.22 ± 12.77, p < 0.001) and lower levels of albumin (40.76 ± 3.52 vs. 39.86 ± 3.83 g/L, p < 0.001), hemoglobin (13.76 ± 1.65 vs. 13.46 ± 1.77 g/dl, p < 0.05), and hematocrit (40.83 ± 4.69 vs. 40.17 ± 5.01%, p < 0.05). The odds ratios of HF for DII from the logistic regression were 1.140, 1.158, and 1.110 in models 1, 2, and 3, respectively. In addition, from the results of random forest analysis, dietary magnesium, fiber, and beta carotene may be essential in HF. Conclusion: Dietary inflammatory index was positively associated with HF in US adults, and dietary intervention might be a promising method in the therapy of HF.
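In logistic regression, an odds ratio is the exponential of a fitted coefficient, so the figures reported above can be converted back to the log-odds scale. The sketch below assumes that the reported values are odds ratios per one-unit increase in DII, which the abstract does not state explicitly; the function names are hypothetical.

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def log_odds_coefficient(ratio):
    """Coefficient (change in log-odds per unit) implied by an odds ratio."""
    return math.log(ratio)

def odds_multiplier(ratio, delta):
    """Factor by which the odds change when the predictor rises by delta units."""
    return ratio ** delta

# Model 1 in the abstract reports 1.140; under the per-unit assumption,
# this corresponds to a coefficient of about 0.131 on the log-odds scale.
beta_model1 = log_odds_coefficient(1.140)

# Under the same assumption, a two-unit increase in DII multiplies the odds
# by 1.140 squared, roughly 1.30.
two_unit_factor = odds_multiplier(1.140, 2)
```

An odds ratio describes a change in odds, not in probability; the two are close only when the outcome is rare.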
