Choosing alternatives: Using Bayesian Networks and memory-based learning to study the dative alternation

2013 ◽  
Vol 9 (2) ◽  
pp. 227-262 ◽  
Author(s):  
Daphne Theijssen ◽  
Louis ten Bosch ◽  
Lou Boves ◽  
Bert Cranen ◽  
Hans van Halteren

Abstract: In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two different approaches: Bayesian Networks and Memory-based learning. For the Bayesian Network, we use the higher-level semantic features suggested in the literature, while we limit ourselves to lexical items in the memory-based approach. We evaluate the suitability of the three approaches by applying them to a large data set (>11,000 instances) extracted from the British National Corpus, and comparing their quality in terms of classification accuracy, their interpretability in the context of linguistic research, and their actual classification of individual cases. Our main finding is that the classifications are very similar across the three approaches, also when employing lexical items instead of the higher-level features, because most of the alternation is determined by the verb and the length of the two objects (here: her and the apple).

2021 ◽  
pp. 107110072110581
Author(s):  
Wenye Song ◽  
Naohiro Shibuya ◽  
Daniel C. Jupiter

Background: Ankle fractures in patients with diabetes mellitus have long been recognized as a challenge to practicing clinicians. Ankle fracture patients with diabetes may experience prolonged healing, higher risk of hardware failure, an increased risk of wound dehiscence and infection, and higher pain scores pre- and postoperatively, compared to patients without diabetes. However, the duration of opioid use among this patient cohort has not been previously evaluated. The purpose of this study is to retrospectively compare the time span of opioid utilization between ankle fracture patients with and without diabetes mellitus. Methods: We conducted a retrospective cohort study using our institution’s TriNetX database. A total of 640 ankle fracture patients were included in the analysis, of whom 73 had diabetes. All dates of opioid use for each patient were extracted from the data set, including the first and last date of opioid prescription. Descriptive analysis and logistic regression models were employed to explore the differences in opioid use between patients with and without diabetes after ankle fracture repair. A 2-tailed P value of .05 was set as the threshold for statistical significance. Results: Logistic regression models revealed that patients with diabetes are less likely to stop using opioids within 90 days, or within 180 days, after repair compared to patients without diabetes. Female sex, neuropathy, and prefracture opioid use are also associated with prolonged opioid use after ankle fracture repair. Conclusion: In our study cohort, ankle fracture patients with diabetes were more likely to require prolonged opioid use after fracture repair. Level of Evidence: Level III, prognostic.


Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 530 ◽  
Author(s):  
Tuba Yilmaz

Open-ended coaxial probes can be used as tissue characterization devices. However, the technique suffers from a high error rate. To improve this technology, there is a need to decrease the measurement error which is reported to be more than 30% for an in vivo measurement setting. This work investigates the machine learning (ML) algorithms’ ability to decrease the measurement error of open-ended coaxial probe techniques to enable tissue characterization devices. To explore the potential of this technique as a tissue characterization device, performances of multiclass ML algorithms on collected in vivo rat hepatic tissue and phantom dielectric property data were evaluated. Phantoms were used for investigating the potential of proliferating the data set due to difficulty of in vivo data collection from tissues. The dielectric property measurements were collected from 16 rats with hepatic anomalies, 8 rats with healthy hepatic tissues, and in house phantoms. Three ML algorithms, k-nearest neighbors (kNN), logistic regression (LR), and random forests (RF) were used to classify the collected data. The best performance for the classification of hepatic tissues was obtained with 76% accuracy using the LR algorithm. The LR algorithm performed classification with over 98% accuracy within the phantom data and the model generalized to in vivo dielectric property data with 48% accuracy. These findings indicate first, linear models, such as logistic regression, perform better on dielectric property data sets. Second, ML models fitted to the data collected from phantom materials can partly generalize to in vivo dielectric property data due to the discrepancy between dielectric property variability.


1999 ◽  
Vol 62 (6) ◽  
pp. 601-609 ◽  
Author(s):  
LANCE F. BOLTON ◽  
JOSEPH F. FRANK

The objective of this study was to define combinations of pH, salt, and moisture that produce growth, stasis, or inactivation of Listeria monocytogenes in Mexican-style cheese. A soft, directly acidified, rennet-coagulated, fresh cheese similar to Mexican-style cheese was produced. The cheese was subsequently altered in composition as required by the experimental protocol. A factorial design with four moisture contents (42, 50, 55, and 60%), four salt concentrations (2.0, 4.0, 6.0, and 8.0% wt/wt), six pH levels (5.0, 5.25, 5.50, 5.75, 6.0, and 6.5), and three replications was used. Observations of growth, stasis, or death were obtained for each combination after 21 and 42 days of incubation at 10°C. Binary logistic regression was used to develop an equation to determine the probability of growth or no growth for any combination within the range of the data set. In addition, ordinal logistic regression was used to calculate proportional odds ratios for growth, stasis, and death for each treatment combination. Ordinal logistic regression was also used to develop equations to determine the probability of growth, stasis, and death for formulations within the range of the data set. Models were validated with independently produced data. Of 60 samples formulated to have a 5% probability of Listeria growth (pH, 5.0 to 6.0; brine concentration, 8.17 to 16.00%), none supported growth. Of 30 samples formulated to have 50% probability of growth using the binary model (pH, 5.50 to 6.50; brine concentration, 3.23 to 12.50%), 20 supported growth. Of 30 samples formulated to have a 50% probability of growth according to the ordinal model (pH, 5.50 to 6.50; brine concentration, 3.37 to 10.90%), 16 supported growth. These data indicate that the logistic regression models presented accurately predict the behavior of L. monocytogenes in Mexican-style cheese.
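
As a quick check on the validation figures quoted above, the observed fraction of samples that supported growth can be set against the probability each batch was formulated to have. This is only a minimal sketch: the counts come from the abstract, while the function name and the dictionary layout are illustrative assumptions, not part of the study.

```python
def observed_fraction(n_growth, n_samples):
    """Fraction of validation samples that supported growth."""
    return n_growth / n_samples

# (target probability, samples supporting growth, total samples), as quoted in the abstract
validation = {
    "5% target": (0.05, 0, 60),
    "50% target, binary model": (0.50, 20, 30),
    "50% target, ordinal model": (0.50, 16, 30),
}

for label, (target, grew, total) in validation.items():
    frac = observed_fraction(grew, total)
    print(f"{label}: target {target:.2f}, observed {frac:.2f}")
```

The observed fractions (0.00, 0.67, and 0.53) lie reasonably close to the formulated probabilities, which is the basis for the abstract's claim that the models predict well.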


2014 ◽  
Vol 104 (7) ◽  
pp. 702-714 ◽  
Author(s):  
D. A. Shah ◽  
E. D. De Wolf ◽  
P. A. Paul ◽  
L. V. Madden

Predicting major Fusarium head blight (FHB) epidemics allows for the judicious use of fungicides in suppressing disease development. Our objectives were to investigate the utility of boosted regression trees (BRTs) for predictive modeling of FHB epidemics in the United States, and to compare the predictive performances of the BRT models with those of logistic regression models we had developed previously. The data included 527 FHB observations from 15 states over 26 years. BRTs were fit to a training data set of 369 FHB observations, in which FHB epidemics were classified as either major (severity ≥ 10%) or non-major (severity < 10%), linked to a predictor matrix consisting of 350 weather-based variables and categorical variables for wheat type (spring or winter), presence or absence of corn residue, and cultivar resistance. Predictive performance was estimated on a test (holdout) data set consisting of the remaining 158 observations. BRTs had a misclassification rate of 0.23 on the test data, which was 31% lower than the average misclassification rate over 15 logistic regression models we had presented earlier. The strongest predictors were generally one of mean daily relative humidity, mean daily temperature, and the number of hours in which the temperature was between 9 and 30°C and relative humidity ≥ 90% simultaneously. Moreover, the predicted risk of major epidemics increased substantially when mean daily relative humidity rose above 70%, which is a lower threshold than previously modeled for most plant pathosystems. BRTs led to novel insights into the weather–epidemic relationship.
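
The arithmetic behind the reported figures can be sketched as follows. The holdout size and the BRT misclassification rate are taken from the abstract; the average logistic regression rate is not reported there, so the value computed here is a back-calculation that assumes "31% lower" means a relative reduction. The function name is hypothetical.

```python
def implied_baseline(rate, relative_reduction):
    """Back out the baseline r0 from rate = r0 * (1 - relative_reduction)."""
    return rate / (1.0 - relative_reduction)

brt_rate = 0.23        # BRT misclassification rate on the test data (from the abstract)
reduction = 0.31       # "31% lower" than the logistic regression average (assumed relative)
test_size = 527 - 369  # observations left for the holdout set

lr_average = implied_baseline(brt_rate, reduction)
print(f"holdout size: {test_size}")
print(f"implied average logistic regression misclassification: {lr_average:.2f}")
```

Under that assumption the logistic regression models would have averaged a misclassification rate of roughly one in three on the same test data.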


2015 ◽  
Vol 25 (9) ◽  
pp. 2727-2737 ◽  
Author(s):  
Nikolaos Dikaios ◽  
Jokha Alkalbani ◽  
Mohamed Abd-Alazeez ◽  
Harbir Singh Sidhu ◽  
Alex Kirkham ◽  
...  

Missing data cause major problems for quantitative analysis in large databases. Because of these problems, inference can produce biased results, data can be further damaged, the error rate can increase, and imputation becomes more difficult. Predicting disguised missing data in large data sets is another major problem in real-time operation. Machine learning (ML) techniques can be combined with classification to improve the accuracy of predicted values, and they overcome various challenges posed by data loss. Recent work on predicting misclassification uses a supervised ML approach to predict an output for an unseen input from a limited number of parameters in a data set; as the number of parameters increases, accuracy decreases. This article presents a new approach, COBACO, an effective supervised machine learning technique. Several strategies for classifying predictive techniques for missing-data analysis with supervised machine learning are described. The proposed technique, COBACO, generated more precise and accurate results than the other predictive approaches. Experimental results obtained using both real and synthetic data sets show that the proposed approach offers valuable and promising insight into the problem of predicting missing information.


2007 ◽  
Vol 28 (4) ◽  
pp. 382-388 ◽  
Author(s):  
Marisa Santos ◽  
José Ueleres Braga ◽  
Renato Vieira Gomes ◽  
Guilherme L. Werneck

Objective. To develop a predictive system for the occurrence of nosocomial pneumonia in patients who had cardiac surgery performed. Design. Retrospective cohort study. Setting. Two cardiologic tertiary care hospitals in Rio de Janeiro, Brazil. Patients. Between June 2000 and August 2002, there were 1,158 consecutive patients who had complex heart surgery performed. Patients older than 18 years who survived the first 48 postoperative hours were included in the study. The occurrence of pneumonia was diagnosed through active surveillance by an infectious diseases specialist according to the following criteria: the presence of new infiltrate on a radiograph in association with purulent sputum and either fever or leukocytosis until day 10 after cardiac surgery. Predictive models were built on the basis of logistic regression analysis and classification and regression tree (CART) analysis. The original data set was divided randomly into 2 parts, one used to construct the models (ie, “test sample”) and the other used for validation (ie, “validation sample”). Results. The area under the receiver–operating characteristic (ROC) curve was 69% for the logistic regression model and 76% for the CART model. Considering a probability greater than 7% to be predictive of pneumonia for both models, sensitivity was higher for the logistic regression models, compared with the CART models (64% vs 56%). However, the CART models had a higher specificity (92% vs 70%) and global accuracy (90% vs 70%) than the logistic regression models. Both models showed good performance, based on the 2-graph ROC, considering that 84.6% and 84.3% of the predictions obtained by regression and CART analyses were regarded as valid. Conclusion. Although our findings are preliminary, the predictive models we created showed fairly good specificity and fair sensitivity.
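
To make the thresholding step concrete, the sketch below shows how sensitivity, specificity, and global accuracy follow once predicted probabilities are cut at 7%. It is an illustration only: the function name is hypothetical and the eight probability/outcome pairs are invented toy values, not data from the study.

```python
def classify_metrics(probs, labels, threshold=0.07):
    """Sensitivity, specificity, and accuracy when prob > threshold predicts pneumonia."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = p > threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical toy data: predicted probabilities and true outcomes (1 = pneumonia)
toy_probs = [0.02, 0.05, 0.10, 0.30, 0.04, 0.08, 0.01, 0.50]
toy_labels = [0, 0, 0, 1, 1, 1, 0, 1]

sens, spec, acc = classify_metrics(toy_probs, toy_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Lowering the threshold generally trades specificity for sensitivity, which is why the two models can rank differently on the two measures at the same cut-off.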


1997 ◽  
Vol 87 (1) ◽  
pp. 83-87 ◽  
Author(s):  
E. D. De Wolf ◽  
L. J. Francl

Tan spot of wheat, caused by Pyrenophora tritici-repentis, provided a model system for testing disease forecasts based on an artificial neural network. Infection periods for P. tritici-repentis on susceptible wheat cultivars were identified from a bioassay system that correlated tan spot incidence with crop growth stage and 24-h summaries of environmental data, including temperature, relative humidity, wind speed, wind direction, solar radiation, precipitation, and flat-plate resistance-type wetness sensors. The resulting data set consisted of 97 discrete periods, of which 32 were reserved for validation analysis. Neural networks with zero to nine processing elements were evaluated 20 times each to identify the model that most accurately predicted an infection event. The 200 models averaged 74 to 77% accuracy, depending on the number of processing elements and random initialization of coefficients. The most accurate model had five processing elements and correctly predicted 87% of the infection periods in the validation set. In comparison, stepwise logistic regression correctly predicted 69% of the validation cases, and multivariate discriminant analysis distinguished 50% of the validation cases. When wetness-sensor inputs were withheld from the models, both the neural network and logistic regression models declined 6% in prediction accuracy. Thus, neural networks were more accurate than statistical procedures, both with and without wetness-sensor inputs. These results demonstrate the applicability of neural networks to plant disease forecasting.

