Early Prediction of University Dropouts – A Random Forest Approach

2020 ◽  
Vol 240 (6) ◽  
pp. 743-789 ◽  
Author(s):  
Andreas Behr ◽  
Marco Giese ◽  
Herve D. Teguim K ◽  
Katja Theune

We predict university dropout using random forests based on conditional inference trees and a broad German data set covering a wide range of aspects of student life and study courses. We model the dropout decision as a binary classification (graduate or dropout) and focus on very early prediction of student dropout by modeling, step by step, students’ transition from school (pre-study phase) through the study-decision phase (decision phase) to the first semesters at university (early study phase). We evaluate how predictive performance changes across the three models and observe a substantial increase in performance when variables from the first study experiences are included, resulting in an AUC (area under the ROC curve) of 0.86. Important predictors are the final grade at secondary school as well as determinants associated with student satisfaction and with students’ subjective academic self-concept and self-assessment. A direct outcome of this research is information for universities wishing to implement early warning systems and more personalized counseling services that support students at risk of dropping out at an early stage of study.
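The AUC of 0.86 reported above can be read as the probability that a randomly chosen dropout receives a higher predicted dropout score than a randomly chosen graduate. The sketch below computes AUC in exactly that pairwise (Mann-Whitney) form from any classifier's scores; it is not the authors' code, and the labels and scores are hypothetical toy values. Libraries such as scikit-learn expose the same quantity as `sklearn.metrics.roc_auc_score`.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 for dropout (positive class), 0 for graduate.
    scores: predicted dropout scores or probabilities from any classifier.
    Returns the share of (dropout, graduate) pairs in which the dropout is
    scored higher; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, not data from the study.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print(auc_score(labels, scores))  # 8 of 9 pairs ranked correctly, about 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value of 0.86 indicates strong, though imperfect, discrimination.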

2003 ◽  
Vol 9 (4) ◽  
pp. 300-307 ◽  
Author(s):  
Gyles Glover

Since the start of the National Health Service, data have been collected on admissions to psychiatric in-patient units, first as the Mental Health Enquiry, then as part of Hospital Episode Statistics. Some details have changed but many have stayed remarkably consistent. Published literature on the wide range of research and policy work undertaken using this data source is reviewed. Early work was central to the government's deinstitutionalisation policy in the early 1960s. Subsequent studies cover a wide range of epidemiological and health services research issues. A new statistical base, the Mental Health Minimum Data Set, covering individuals receiving all types of health care is currently being set up. This will supplement (but not replace) admission statistics.


2020 ◽  
Vol 8 (6) ◽  
pp. 1623-1630

As huge amounts of data accumulate, extracting the required information from them becomes a challenge. Machine learning contributes to many fields. The fast-growing population has been accompanied by a wide range of diseases, which in turn has created a need for machine learning models that use patients' datasets. Analyses of datasets from different sources indicate that cancer is the most hazardous disease and may cause the death of the patient. Surveys report that cancer can largely be cured in its initial stages but may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer, whose detection at an early stage depends heavily on past data. The proposed work uses a machine learning algorithm to group individuals' details into categories and predict, at an early stage, whether they are likely to develop cancer. A random forest algorithm is implemented and achieves an efficiency of 97%, higher than KNN and Naive Bayes. The KNN algorithm, by contrast, does not learn anything from the training data but uses the stored data directly for classification, and Naive Bayes yields less accurate predictions. The proposed system predicts the chance of lung cancer by displaying one of three levels: low, medium, or high. Thus, mortality rates can be reduced significantly.
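The abstract does not say how the random forest output is turned into the low, medium, and high levels. One plausible reading, sketched below purely as an assumption, is to take the share of trees voting for the positive class as a probability and bucket it with fixed cut-offs. The function names, the 1/3 and 2/3 cut-offs, and the vote vector are all hypothetical, not the paper's settings.

```python
def forest_probability(tree_votes):
    """Share of trees in a random forest that vote for the positive class.

    tree_votes: 0/1 predictions, one per tree (1 = likely to develop cancer).
    """
    if not tree_votes:
        raise ValueError("the ensemble must contain at least one tree")
    return sum(tree_votes) / len(tree_votes)


def risk_level(probability, low_cut=1 / 3, high_cut=2 / 3):
    """Map a predicted probability to a display level.

    The cut-offs are hypothetical placeholders, not values from the paper.
    """
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "medium"
    return "high"


votes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical votes from 10 trees
p = forest_probability(votes)
print(p, risk_level(p))  # 0.7 high
```

Reporting a graded level rather than only the majority-vote class keeps borderline cases (here, anything between the two cut-offs) visible to the user.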


2017 ◽  
Author(s):  
Yuanheng Li ◽  
Björn C. Rall ◽  
Gregor Kalinkat

Empirical feeding studies in which density-dependent consumption rates are fitted to functional response models are often used to parametrize interaction strengths in models of population or food-web dynamics. However, the relationship between functional response parameter estimates from short-term feeding studies and real-world, long-term trophic interaction strengths remains largely untested. As a critical first step toward addressing this gap, we tested for systematic effects of experimental duration and predator satiation on the estimation of functional response parameters, namely attack rate and handling time. Analyzing a large data set covering a wide range of predator taxonomies and body sizes, we show that attack rates decrease with increasing experimental duration and that handling times of starved predators are consistently shorter than those of satiated predators. Therefore, both the experimental duration and the predator satiation level have a strong and systematic impact on predictions of population dynamics and food-web stability. Our study highlights potential pitfalls at the intersection of empirical and theoretical applications of functional responses. We conclude with some practical suggestions on how these implications should be addressed in the future to improve predictive ability and realism in models of predator-prey interactions.
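To make the roles of attack rate and handling time concrete, the sketch below uses the Holling type II functional response, the classic model parametrized by exactly these two quantities. The abstract does not state which functional response form was fitted, so using type II here is an assumption, and the parameter values are hypothetical. The direction of the comparison (shorter handling time for starved predators) follows the abstract.

```python
def holling_type_ii(prey_density, attack_rate, handling_time):
    """Per-capita consumption rate of a Holling type II functional response.

    f(N) = a * N / (1 + a * h * N), where a is the attack rate and h the
    handling time. At low N the rate rises roughly as a * N; at high N it
    saturates towards the maximum feeding rate 1 / h.
    """
    return attack_rate * prey_density / (1 + attack_rate * handling_time * prey_density)


# Hypothetical parameters (arbitrary units), not estimates from the study.
for label, h in [("starved (shorter h)", 0.1), ("satiated (longer h)", 0.2)]:
    rates = [holling_type_ii(n, attack_rate=0.5, handling_time=h) for n in (1, 10, 100)]
    print(label, [round(r, 2) for r in rates], "max rate:", 1 / h)
```

Because the saturating rate is 1/h, a handling time estimated from starved predators implies a higher maximum feeding rate than one estimated from satiated predators, which propagates directly into simulated interaction strengths.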


Image processing of microstructures for the design, measurement, and control of metal processing has been emerging as a new area of research toward the development of the Industry 4.0 framework. However, accurate steel phase segmentation with a suitable image processing tool is the key challenge for phase identification and quantification in microstructures. In this article, we report the effectiveness of a region-based segmentation tool, Chan-Vese, for phase segmentation of a ferrite-pearlite steel microstructure captured in a scanning electron microscopy (SEM) image. The algorithm has been applied to microstructure images, and the results are discussed in light of the effectiveness of the Chan-Vese algorithm for microstructure image processing and phase segmentation. Experiments on a ferrite-pearlite microstructure data set covering a wide range of resolutions revealed that the Chan-Vese algorithm is efficient in segmenting phase regions and predicting grain boundaries.
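The full Chan-Vese model evolves a level-set function to minimize a region-fitting energy plus curve-length and area penalties. The sketch below keeps only the two region-fitting terms, which reduces the method to an alternating two-means update on pixel intensities; it is a simplified variant for illustration, not the authors' implementation, and it does not smooth boundaries. It also shows how a phase fraction can be quantified from the resulting mask. The image patch is hypothetical. For the full model, scikit-image provides `skimage.segmentation.chan_vese`.

```python
def two_phase_segment(image, max_iter=100):
    """Two-phase piecewise-constant segmentation using only the Chan-Vese fitting terms.

    Alternately (1) sets c1 and c2 to the mean intensity inside and outside the
    current region and (2) reassigns every pixel to whichever mean it is closer
    to, which lowers sum_in (I - c1)^2 + sum_out (I - c2)^2. The length and area
    penalties of the full Chan-Vese model are omitted, so there is no boundary smoothing.
    """
    flat = [v for row in image for v in row]
    c1 = c2 = sum(flat) / len(flat)
    mask = [[v > c1 for v in row] for row in image]  # start: threshold at the global mean
    for _ in range(max_iter):
        inside = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if m]
        outside = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if not m]
        if not inside or not outside:
            break  # degenerate partition, e.g. a uniform image
        c1, c2 = sum(inside) / len(inside), sum(outside) / len(outside)
        new_mask = [[(v - c1) ** 2 < (v - c2) ** 2 for v in row] for row in image]
        if new_mask == mask:
            break  # converged: no pixel changed region
        mask = new_mask
    return mask, c1, c2


def phase_fraction(mask):
    """Area fraction of the pixels labelled True (region 1)."""
    flat = [m for row in mask for m in row]
    return sum(flat) / len(flat)


# Hypothetical 3x4 grey-level patch with a dark phase (~50) and a bright phase (~200).
patch = [[50, 52, 200, 198],
         [49, 51, 205, 199],
         [48, 50, 201, 202]]
mask, c_bright, c_dark = two_phase_segment(patch)
print(phase_fraction(mask))  # 0.5
```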


2016 ◽  
Vol 4 (2) ◽  
pp. 94-115 ◽  
Author(s):  
Patrick Kampkötter ◽  
Jens Mohrenweiser ◽  
Dirk Sliwka ◽  
Susanne Steffes ◽  
Stefanie Wolter

Purpose – The purpose of this paper is to introduce a new data source available for researchers with interest in human resources management (HRM) and personnel economics, the Linked Personnel Panel (LPP). Design/methodology/approach – The LPP is a longitudinal and representative employer-employee data set covering establishments in Germany and a subset of their workforce and is designed for quantitative empirical human resource research. Findings – The LPP employee survey applies a number of established scales to measure job characteristics and job perceptions, personal characteristics, employee attitudes towards the organization and employee behaviour. This paper gives an overview of both the employer and employee survey and outlines the definitions, origins, and statistical properties of the scales used in the individual questionnaire. Practical implications – The paper describes how researchers can access the data. Originality/value – First, the data set combines employer and employee surveys that can be matched to each other. Second, it can also be linked to a number of additional administrative data sets. Third, the LPP covers a wide range of firms and workers from different backgrounds. Finally, because of its longitudinal dimension, the LPP should facilitate the study of causal effects of HRM practices.
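The key structural feature of the LPP described above is that employer and employee surveys can be matched to each other. A minimal sketch of such a linkage, assuming both surveys share an establishment identifier, is a keyed join that attaches establishment-level variables to each employee record. The field names (`estab_id`, `has_hr_dept`, `pid`, `job_satisfaction`) are hypothetical, not actual LPP variables; the paper and the data documentation describe the real identifiers and access procedure.

```python
def link_surveys(employer_records, employee_records, key="estab_id"):
    """Attach establishment-level (employer survey) data to employee records.

    key is a hypothetical shared establishment identifier. Employer fields are
    prefixed with "firm_" so they cannot overwrite employee fields. Employees
    whose establishment is absent from the employer survey are returned separately.
    """
    firms = {rec[key]: rec for rec in employer_records}
    linked, unmatched = [], []
    for person in employee_records:
        firm = firms.get(person[key])
        if firm is None:
            unmatched.append(person)
            continue
        row = {f"firm_{k}": v for k, v in firm.items() if k != key}
        row.update(person)
        linked.append(row)
    return linked, unmatched


# Hypothetical records and field names, not actual LPP variables.
employers = [{"estab_id": 1, "has_hr_dept": True}, {"estab_id": 2, "has_hr_dept": False}]
employees = [{"estab_id": 1, "pid": 10, "job_satisfaction": 8},
             {"estab_id": 3, "pid": 11, "job_satisfaction": 5}]
linked, unmatched = link_surveys(employers, employees)
print(linked)     # [{'firm_has_hr_dept': True, 'estab_id': 1, 'pid': 10, 'job_satisfaction': 8}]
print(unmatched)  # [{'estab_id': 3, 'pid': 11, 'job_satisfaction': 5}]
```

Keeping unmatched employees separate, rather than silently dropping them, makes it easy to check how complete the match is before relating HRM practices to employee outcomes.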


2019 ◽  
Vol 16 (7) ◽  
pp. 808-817 ◽  
Author(s):  
Laxmi Banjare ◽  
Sant Kumar Verma ◽  
Akhlesh Kumar Jain ◽  
Suresh Thareja

Background: Despite the availability of various treatment approaches, including surgery, radiotherapy, and hormonal therapy, steroidal aromatase inhibitors (SAIs) play a significant role as chemotherapeutic agents for the treatment of estrogen-dependent breast cancer, with the benefit of a reduced risk of recurrence. However, because of the greater toxicity and side effects associated with currently available anti-breast cancer agents, there is an urgent need to develop target-specific AIs with a safer anti-breast cancer profile. Methods: Designing target-specific and less toxic SAIs is a challenging task, even though molecular modeling tools such as molecular docking simulations and QSAR have been used for more than two decades for the fast and efficient design of novel, selective, potent, and safe molecules against various biological targets to fight a number of dreaded diseases and disorders. To design novel and selective SAIs, structure-guided, molecular docking-assisted, alignment-dependent 3D-QSAR studies were performed on a data set of 22 molecules bearing a steroidal scaffold with a wide range of aromatase inhibitory activity. Results: The 3D-QSAR model developed using the molecular weighted (MW) extent alignment approach showed good statistical quality and predictive ability compared with the model developed using the moments of inertia (MI) alignment approach. Conclusion: The explored binding interactions and the generated pharmacophoric features (steric and electrostatic) of the steroidal molecules could be exploited for the further design, direct synthesis, and development of new, potentially safer SAIs that can be effective in reducing the mortality and morbidity associated with breast cancer.
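The abstract does not name the statistics behind "good statistical quality and predictive ability". QSAR models are commonly judged by the fitted r² and the leave-one-out cross-validated q² = 1 − PRESS / SS_tot, so the sketch below assumes those two measures. To stay short and self-contained it uses a single hypothetical descriptor with ordinary least squares as a stand-in for the partial least squares regression on 3D field descriptors typical of 3D-QSAR; the descriptor and activity values are invented for illustration.

```python
def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; with leave-one-out predictions this is q^2 (1 - PRESS / SS_tot)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2: refit without each molecule, then predict it."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return r_squared(ys, preds)


# Hypothetical descriptor values and activities (e.g. pIC50) for five molecules.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [5.1, 5.9, 7.2, 7.8, 9.1]
slope, intercept = fit_line(descriptor, activity)
r2 = r_squared(activity, [slope * x + intercept for x in descriptor])
print(round(r2, 3), round(loo_q2(descriptor, activity), 3))
```

For least-squares models q² never exceeds r², because each left-out residual is at least as large as the corresponding fitted residual; a large gap between the two is a warning sign of overfitting, which is why q² is usually reported as the measure of predictive ability.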


Author(s):  
Varun Sapra ◽  
M.L Saini ◽  
Luxmi Verma

Background: Cardiovascular diseases are increasing at an alarming rate and have a very high mortality rate. Coronary artery disease, one type of cardiovascular disease, is not easily diagnosed in its early stage. Prevention of coronary artery disease is possible only if it is diagnosed at an early stage and appropriate medication is given. Objective: An effective diagnosis model is important not only for early diagnosis but also for assessing the severity of the disease. Method: In this paper, a hybrid approach integrates deep learning (a multilayer perceptron) with case-based reasoning (CBR) to design an analytical framework. The study has two phases: in the first, the patient is diagnosed for coronary artery disease; in the second, if the patient has the disease, CBR is employed to diagnose its severity. In the first phase, a multilayer perceptron is implemented on a reduced dataset and with time-based learning for stochastic gradient descent. Results: Classification accuracy increased by 4.18% with the reduced data set using the deep neural network with time-based learning. In the second phase, a patient diagnosed as positive for coronary artery disease triggers the CBR system, which retrieves the most similar case from the case base to predict the severity for that patient. The CBR model achieved 97.3% accuracy. Conclusion: The model can be very useful to medical practitioners as a decision support system, saving patients unnecessary expenses on costly tests and improving the quality and effectiveness of medical treatment.
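The second phase described above, retrieving the most similar stored case to predict severity, is at its core nearest-neighbour retrieval. The sketch below assumes Euclidean distance over min-max scaled features and gates retrieval on a phase-1 positive result. The similarity measure, feature names, ranges, and cases are hypothetical; the abstract does not specify how the authors measure similarity.

```python
import math


def scaled(value, bounds):
    """Min-max scale a value to [0, 1] given (min, max) bounds for its feature."""
    lo, hi = bounds
    return (value - lo) / (hi - lo) if hi > lo else 0.0


def retrieve_most_similar(query, case_base, bounds):
    """Nearest-neighbour retrieval: the stored case closest to the query.

    Distance is Euclidean over min-max scaled features, so features measured
    on large scales do not dominate the comparison.
    """
    def distance(case):
        return math.sqrt(sum((scaled(query[f], b) - scaled(case[f], b)) ** 2
                             for f, b in bounds.items()))
    return min(case_base, key=distance)


def assess(query, is_cad_positive, case_base, bounds):
    """Phase 2 runs only when the phase-1 classifier flags the patient as positive."""
    if not is_cad_positive:
        return None
    return retrieve_most_similar(query, case_base, bounds)["severity"]


# Hypothetical features, ranges, and cases; not from the paper's data set.
bounds = {"age": (30, 80), "cholesterol": (150, 350)}
case_base = [
    {"age": 45, "cholesterol": 200, "severity": "mild"},
    {"age": 70, "cholesterol": 300, "severity": "severe"},
]
print(assess({"age": 68, "cholesterol": 290}, True, case_base, bounds))  # severe
```

Scaling matters here: without it, cholesterol (range 200) would outweigh age (range 50) in the distance simply because of its units.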


Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.
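Between-study heterogeneity in measures is the first practical obstacle to pooling individual participant-level data. The sketch below illustrates the issue with a deliberately simple harmonization, z-standardizing each study's scores within the study before stacking them, while keeping a study identifier on every record. This is an illustrative assumption, not the method used in Project INTEGRATE: within-study standardization erases genuine between-study differences in means, which is one reason IDA work often turns to latent-variable linking of measures instead. The study labels and scores are hypothetical.

```python
from statistics import mean, stdev


def pool_standardised(studies):
    """Pool participant-level scores measured on different study-specific scales.

    studies maps a study id to that study's raw scores. Scores are z-standardised
    within each study, then stacked; every record keeps its study id so that
    study-level effects can still be modelled after pooling.
    """
    pooled = []
    for study_id, scores in studies.items():
        if len(scores) < 2:
            raise ValueError(f"study {study_id!r} needs at least two participants")
        m, s = mean(scores), stdev(scores)
        if s == 0:
            raise ValueError(f"study {study_id!r} has no variation to standardise")
        pooled.extend({"study": study_id, "z": (x - m) / s} for x in scores)
    return pooled


# Hypothetical measures: study A uses a 0-10 rating, study B counts drinks per week.
pooled = pool_standardised({"A": [1, 2, 3], "B": [10, 20, 30]})
print([round(r["z"], 2) for r in pooled])  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

The output shows the trade-off: both studies end up on the same scale, but the fact that study B's participants drank far more in absolute terms is no longer visible in the pooled scores.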


Micromachines ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 72 ◽  
Author(s):  
Da-Quan Yang ◽  
Bing Duan ◽  
Xiao Liu ◽  
Ai-Qiang Wang ◽  
Xiao-Gang Li ◽  
...  

The ability to detect nanoscale objects is particularly crucial for a wide range of applications, such as environmental protection, early-stage disease diagnosis, and drug discovery. Photonic crystal nanobeam cavity (PCNC) sensors have attracted great attention because of their high quality factors and small mode volumes (Q/V) and their good on-chip integrability with optical waveguides and circuits. In this review, we focus on nanoscale optical sensing based on PCNC sensors, including ultrahigh figure of merit (FOM) sensing, single nanoparticle trapping, label-free molecule detection, and integrated sensor arrays for multiplexed sensing. We believe that PCNC sensors, featuring an ultracompact footprint, high monolithic integration capability, fast response, ultrahigh sensing sensitivity, etc., will provide a promising platform for further developing lab-on-a-chip devices for biosensing and other functionalities.
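The quality factor and figure of merit mentioned above are commonly defined for resonant refractive-index sensors as Q = λ₀ / FWHM, sensitivity S = Δλ / Δn (nm/RIU), and FOM = S / FWHM. The sketch below applies these standard definitions; it assumes the review uses the same conventions, and the cavity numbers are hypothetical rather than values reported in the review.

```python
def quality_factor(resonance_nm, linewidth_nm):
    """Q = resonance wavelength / full width at half maximum (FWHM)."""
    return resonance_nm / linewidth_nm


def sensitivity(shift_nm, delta_n):
    """Refractive-index sensitivity S = resonance shift / index change (nm/RIU)."""
    return shift_nm / delta_n


def figure_of_merit(sens_nm_per_riu, linewidth_nm):
    """FOM = S / FWHM (1/RIU): a narrow line and a large shift both help."""
    return sens_nm_per_riu / linewidth_nm


# Hypothetical cavity: resonance at 1550 nm, FWHM 0.05 nm,
# shifted by 1.2 nm when the surrounding index rises by 0.01 RIU.
q = quality_factor(1550.0, 0.05)
s = sensitivity(1.2, 0.01)
print(round(q), round(s), round(figure_of_merit(s, 0.05)))  # 31000 120 2400
```

The FOM ties the two cavity properties together: a high Q narrows the resonance, so even a small shift becomes resolvable, which is why high-Q nanobeam cavities are attractive for detecting single nanoparticles.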

