Large-Scale Learning to Rank Using Boosted Decision Trees

Linear rankSVM is one of the most widely used methods for learning to rank. Although its performance may be inferior to that of nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance on large and sparse data. Many works have studied linear rankSVM, focusing on computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
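To make the role of preference pairs concrete, here is a minimal Python sketch of one common linear rankSVM objective: an L2 regularizer plus a hinge loss over all within-query pairs in which one item is more relevant than the other. The function names, the toy data, and the choice of the L1 (non-squared) hinge are assumptions made for illustration; the letter's own algorithm is not reproduced here.

```python
from itertools import combinations


def preference_pairs(labels, queries):
    """Return (i, j) index pairs from the same query where item i is more relevant than item j."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if queries[i] != queries[j]:
            continue  # only items retrieved for the same query are comparable
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def ranksvm_objective(w, X, labels, queries, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses over preference pairs (illustrative formulation)."""
    pairs = preference_pairs(labels, queries)
    hinge = sum(max(0.0, 1.0 - (dot(w, X[i]) - dot(w, X[j]))) for i, j in pairs)
    return 0.5 * dot(w, w) + C * hinge


# Hypothetical toy data: four items, two queries, graded relevance labels.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
labels = [2, 1, 1, 0]
queries = [1, 1, 2, 2]
w = [1.0, 0.0]

print(preference_pairs(labels, queries))
print(ranksvm_objective(w, X, labels, queries))
```

Enumerating pairs this way costs time quadratic in the number of items per query, which is why the efficiency concern raised in the abstract grows with the number of preference pairs.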

Undersampling Techniques to Re-balance Training Data for Large Scale Learning-to-Rank

Information Retrieval Technology - Lecture Notes in Computer Science, 2014, pp. 444-457

DOI: 10.1007/978-3-319-12844-3_38

Cited By ~ 5

Author(s): Muhammad Ibrahim, Mark Carman

Keyword(s): Large Scale, Learning To Rank, Balance Training, Training Data, Large Scale Learning

Measuring performance in health care: case-mix adjustment by boosted decision trees

Artificial Intelligence in Medicine, 2004, Vol 32 (2), pp. 97-113

DOI: 10.1016/j.artmed.2004.06.001

Cited By ~ 14

Author(s): Anke Neumann, Josiane Holstein, Jean-Roger Le Gall, Eric Lepage

Keyword(s): Health Care, Decision Trees, Case Mix, Case Mix Adjustment, Boosted Decision Trees

Boosted Decision Trees and Applications

EPJ Web of Conferences, 2013, Vol 55, pp. 02004

DOI: 10.1051/epjconf/20135502004

Cited By ~ 1

Author(s): Yann Coadou

Keyword(s): Decision Trees, Boosted Decision Trees

Trained Synthetic Features in Boosted Decision Trees with an Application to Polish Bankruptcy Data

Advances in Intelligent Systems and Computing - Advances in Information and Communication, 2020, pp. 295-309

DOI: 10.1007/978-3-030-39442-4_23

Author(s): Thomas R. Boucher, Tsitsi Msabaeka

Keyword(s): Decision Trees, Boosted Decision Trees

Robust Optic Disc Localization by Large Scale Learning

Ophthalmic Medical Image Analysis - Lecture Notes in Computer Science, 2019, pp. 95-103

DOI: 10.1007/978-3-030-32956-3_12

Cited By ~ 1

Author(s): Shilu Jiang, Zhiyuan Chen, Annan Li, Yunhong Wang

Keyword(s): Optic Disc, Large Scale, Large Scale Learning, Optic Disc Localization

Developing a Process for the Analysis of User Journeys and the Prediction of Dropout in Digital Health Interventions: Machine Learning Approach (Preprint)

2020

DOI: 10.2196/preprints.17738

Author(s): Vincent Bremer, Philip I Chow, Burkhardt Funk, Frances P Thorndike, Lee M Ritterband

Keyword(s): Machine Learning, Decision Trees, Behavioral Therapy, Digital Health, Area Under The Curve, Prediction Performance, Health Interventions, Drop Out, Support Vector, Boosted Decision Trees

BACKGROUND: User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data, both self-reported and system-generated, produced by the path (or journey) an individual takes to navigate through a digital health intervention.

OBJECTIVE: The purpose of this study is to provide a step-by-step process for the analysis of user journey data and eventually to predict dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. The completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core. Dropout is defined as not completing the seventh core.

METHODS: Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropouts were predicted based on data from 151 participants from a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data are reported that predict user dropout.

RESULTS: Accuracy of predicting dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance.

CONCLUSIONS: The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
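Since the abstract reports results as AUC values, the following Python sketch shows how AUC can be read as the fraction of (dropout, non-dropout) pairs that a model orders correctly. The function name, labels, and scores are invented for illustration and are not data from the study.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative: correct ordering
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = dropped out, 0 = completed; scores are predicted dropout risks.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(auc(y_true, scores))
```

Under this reading, an AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ordering, which helps place the reported range of 0.6 to 0.9.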
