Accuracy of the Patient Health Questionnaire-9 for screening to detect major depression: updated systematic review and individual participant data meta-analysis

Objective: To update a previous individual participant data meta-analysis and determine the accuracy of the Patient Health Questionnaire-9 (PHQ-9), the most commonly used depression screening tool in general practice, for detecting major depression overall and by study or participant subgroups. Design: Systematic review and individual participant data meta-analysis. Data sources: Medline, Medline In-Process and Other Non-Indexed Citations via Ovid, PsycINFO, and Web of Science, searched through 9 May 2018. Review methods: Eligible studies administered the PHQ-9 and classified current major depression status using a validated semistructured diagnostic interview (designed for clinician administration), fully structured interview (designed for lay administration), or the Mini International Neuropsychiatric Interview (MINI; a brief interview designed for lay administration). A bivariate random effects meta-analytic model was used to obtain point and interval estimates of pooled PHQ-9 sensitivity and specificity at cut-off values 5-15, separately, among studies that used semistructured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual), fully structured interviews (eg, Composite International Diagnostic Interview), and the MINI. Meta-regression was used to investigate whether PHQ-9 accuracy correlated with reference standard categories and participant characteristics. Results: Data from 44 503 total participants (27 146 additional from the update) were obtained from 100 of 127 eligible studies (42 additional studies; 79% of eligible studies; 86% of eligible participants). Among studies with a semistructured interview reference standard, pooled PHQ-9 sensitivity and specificity (95% confidence interval) at the standard cut-off value of ≥10, which maximised combined sensitivity and specificity, were 0.85 (0.79 to 0.89) and 0.85 (0.82 to 0.87), respectively. Specificity was similar across reference standards, but sensitivity in studies with semistructured interviews was 7-24% (median 21%) higher than with fully structured reference standards and 2-14% (median 11%) higher than with the MINI across cut-off values. Across reference standards and cut-off values, specificity was 0-10% (median 3%) higher for men and 0-12% (median 5%) higher for people aged 60 or older. Conclusions: Researchers and clinicians could use results to determine outcomes, such as total number of positive screens and false positive screens, at different PHQ-9 cut-off values for different clinical settings using the knowledge translation tool at www.depressionscreening100.com/phq. Study registration: PROSPERO CRD42014010673.
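The kind of outcome calculation mentioned in the conclusions (total positive screens and false positive screens) can be illustrated with a minimal sketch. It plugs in the pooled sensitivity and specificity of 0.85 reported above for a cut-off of ≥10. The cohort size of 1000 and the 10% prevalence are assumptions chosen only for illustration, and the function name is hypothetical. This is not the knowledge translation tool itself.

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected screening outcomes for a cohort of size n.

    prevalence, sensitivity and specificity are proportions in [0, 1].
    """
    cases = n * prevalence              # people with major depression
    non_cases = n - cases               # people without major depression
    true_pos = cases * sensitivity      # cases correctly screened positive
    false_neg = cases - true_pos        # cases missed by the screen
    true_neg = non_cases * specificity  # non-cases correctly screened negative
    false_pos = non_cases - true_neg    # non-cases screened positive
    total_pos = true_pos + false_pos
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "total_pos": total_pos,
        # Share of positive screens that are true cases
        "ppv": true_pos / total_pos if total_pos else float("nan"),
    }


# Assumed cohort: 1000 people, 10% prevalence (illustrative values only).
outcomes = screening_outcomes(1000, 0.10, 0.85, 0.85)
for name, value in outcomes.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed inputs, most positive screens are false positives even though sensitivity and specificity are both 0.85, which is why the outcome counts depend strongly on the setting's prevalence.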


Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis

BMJ, 2019, pp. l1476. DOI: 10.1136/bmj.l1476. Cited By ~ 96

Author(s): Brooke Levis, Andrea Benedetti, Brett D Thombs

Keyword(s): Major Depression, Sensitivity And Specificity, Meta Analysis, Patient Health Questionnaire, Individual Participant Data, Health Questionnaire, Structured Interviews, Diagnostic Interviews, Patient Health, Individual Participant

Objective: To determine the accuracy of the Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression. Design: Individual participant data meta-analysis. Data sources: Medline, Medline In-Process and Other Non-Indexed Citations, PsycINFO, and Web of Science (January 2000-February 2015). Inclusion criteria: Eligible studies compared PHQ-9 scores with major depression diagnoses from validated diagnostic interviews. Primary study data and study level data extracted from primary reports were synthesized. For PHQ-9 cut-off scores 5-15, bivariate random effects meta-analysis was used to estimate pooled sensitivity and specificity, separately, among studies that used semistructured diagnostic interviews, which are designed for administration by clinicians; fully structured interviews, which are designed for lay administration; and the Mini International Neuropsychiatric Interview (MINI), a brief fully structured interview. Sensitivity and specificity were examined among participant subgroups and, separately, using meta-regression, considering all subgroup variables in a single model. Results: Data were obtained for 58 of 72 eligible studies (total n=17 357; major depression cases n=2312). Combined sensitivity and specificity was maximized at a cut-off score of 10 or above among studies using a semistructured interview (29 studies, 6725 participants; sensitivity 0.88, 95% confidence interval 0.83 to 0.92; specificity 0.85, 0.82 to 0.88). Across cut-off scores 5-15, sensitivity with semistructured interviews was 5-22% higher than for fully structured interviews (MINI excluded; 14 studies, 7680 participants) and 2-15% higher than for the MINI (15 studies, 2952 participants). Specificity was similar across diagnostic interviews. The PHQ-9 seems to be similarly sensitive but may be less specific for younger patients than for older patients; a cut-off score of 10 or above can be used regardless of age. Conclusions: PHQ-9 sensitivity compared with semistructured diagnostic interviews was greater than in previous conventional meta-analyses that combined reference standards. A cut-off score of 10 or above maximized combined sensitivity and specificity overall and for subgroups. Registration: PROSPERO CRD42014010673.
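To make "maximized combined sensitivity and specificity" concrete, the following minimal sketch computes sensitivity and specificity at each cut-off from 5 to 15 for one dataset and picks the cut-off with the largest sum. It works on a single set of paired scores and diagnoses and does not implement the bivariate random effects pooling used in the study. The function names and the toy data are hypothetical.

```python
def accuracy_by_cutoff(scores, has_depression, cutoffs=range(5, 16)):
    """Return {cutoff: (sensitivity, specificity)} for 'score >= cutoff' screens.

    scores: questionnaire total scores, one per participant.
    has_depression: reference standard classification (True/False), same order.
    Assumes at least one case and one non-case are present.
    """
    pairs = list(zip(scores, has_depression))
    results = {}
    for c in cutoffs:
        tp = sum(1 for s, d in pairs if d and s >= c)          # cases screened positive
        fn = sum(1 for s, d in pairs if d and s < c)           # cases screened negative
        tn = sum(1 for s, d in pairs if not d and s < c)       # non-cases screened negative
        fp = sum(1 for s, d in pairs if not d and s >= c)      # non-cases screened positive
        results[c] = (tp / (tp + fn), tn / (tn + fp))
    return results


def best_cutoff(results):
    """Cut-off with the largest sensitivity + specificity (lowest cut-off wins ties)."""
    return max(results, key=lambda c: results[c][0] + results[c][1])


# Toy data, invented purely for illustration.
toy_scores = [2, 4, 6, 7, 9, 11, 12, 14, 16, 20]
toy_status = [False, False, False, False, False, True, False, True, True, True]

table = accuracy_by_cutoff(toy_scores, toy_status)
for cutoff, (sens, spec) in table.items():
    print(f"cut-off >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print("best cut-off:", best_cutoff(table))
```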


The Accuracy of the Patient Health Questionnaire-9 Algorithm for Screening to Detect Major Depression: An Individual Participant Data Meta-Analysis

Psychotherapy and Psychosomatics, 2019, Vol 89 (1), pp. 25-37. DOI: 10.1159/000502294. Cited By ~ 5

Author(s): Chen He, Brooke Levis, Kira E. Riehm, Nazanin Saadat, Alexander W. Levis, ...

Keyword(s): Major Depression, Meta Analysis, Patient Health Questionnaire, Individual Participant Data, Health Questionnaire, Patient Health, Individual Participant


Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis

BMJ, 2019, pp. l1781. DOI: 10.1136/bmj.l1781. Cited By ~ 1

Keyword(s): Major Depression, Meta Analysis, Patient Health Questionnaire, Individual Participant Data, Health Questionnaire, Patient Health, Individual Participant


Accuracy of the Hospital Anxiety and Depression Scale Depression subscale (HADS-D) to screen for major depression: systematic review and individual participant data meta-analysis

BMJ, 2021, pp. n972. DOI: 10.1136/bmj.n972

Author(s): Yin Wu, Brooke Levis, Ying Sun, Chen He, Ankur Krishnan, ...

Keyword(s): Major Depression, Sensitivity And Specificity, Meta Analysis, Depression Scale, Diagnostic Interview, Anxiety And Depression, Individual Participant Data, Structured Interviews, Depression Subscale, Individual Participant

Objective: To evaluate the accuracy of the depression subscale of the Hospital Anxiety and Depression Scale (HADS-D) to screen for major depression among people with physical health problems. Design: Systematic review and individual participant data meta-analysis. Data sources: Medline, Medline In-Process and Other Non-Indexed Citations, PsycInfo, and Web of Science (from inception to 25 October 2018). Review methods: Eligible datasets included HADS-D scores and major depression status based on a validated diagnostic interview. Primary study data and study level data extracted from primary reports were combined. For HADS-D cut-off thresholds of 5-15, a bivariate random effects meta-analysis was used to estimate pooled sensitivity and specificity, separately, in studies that used semi-structured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders), fully structured interviews (eg, Composite International Diagnostic Interview), and the Mini International Neuropsychiatric Interview. One stage meta-regression was used to examine whether accuracy was associated with reference standard categories and the characteristics of participants. Sensitivity analyses were done to assess whether including published results from studies that did not provide raw data influenced the results. Results: Individual participant data were obtained from 101 of 168 eligible studies (60%; 25 574 participants (72% of eligible participants), 2549 with major depression). Combined sensitivity and specificity was maximised at a cut-off value of seven or higher for semi-structured interviews, fully structured interviews, and the Mini International Neuropsychiatric Interview. Among studies with a semi-structured interview (57 studies, 10 664 participants, 1048 with major depression), sensitivity and specificity were 0.82 (95% confidence interval 0.76 to 0.87) and 0.78 (0.74 to 0.81) for a cut-off value of seven or higher, 0.74 (0.68 to 0.79) and 0.84 (0.81 to 0.87) for a cut-off value of eight or higher, and 0.44 (0.38 to 0.51) and 0.95 (0.93 to 0.96) for a cut-off value of 11 or higher. Accuracy was similar across reference standards and subgroups and when published results from studies that did not contribute data were included. Conclusions: When screening for major depression, a HADS-D cut-off value of seven or higher maximised combined sensitivity and specificity. A cut-off value of eight or higher generated similar combined sensitivity and specificity but was less sensitive and more specific. To identify medically ill patients with depression with the HADS-D, lower cut-off values could be used to avoid false negatives and higher cut-off values to reduce false positives and identify people with higher symptom levels. Trial registration: PROSPERO CRD42015016761.


Patient Health Questionnaire-9 scores do not accurately estimate depression prevalence: individual participant data meta-analysis

Journal of Clinical Epidemiology, 2020, Vol 122, pp. 115-128.e1. DOI: 10.1016/j.jclinepi.2020.02.002. Cited By ~ 18

Author(s): Brooke Levis, Andrea Benedetti, John P.A. Ioannidis, Ying Sun, Zelalem Negeri, ...

Keyword(s): Meta Analysis, Patient Health Questionnaire, Individual Participant Data, Health Questionnaire, Depression Prevalence, Patient Health, Individual Participant


Accuracy of the Edinburgh Postnatal Depression Scale (EPDS) for screening to detect major depression among pregnant and postpartum women: systematic review and meta-analysis of individual participant data

BMJ, 2020, pp. m4022. DOI: 10.1136/bmj.m4022

Author(s): Brooke Levis, Zelalem Negeri, Ying Sun, Andrea Benedetti, Brett D Thombs

Keyword(s): Major Depression, Sensitivity And Specificity, Postnatal Depression, Edinburgh Postnatal Depression Scale, Meta Analysis, Depression Scale, Postpartum Women, Individual Participant Data, Diagnostic Interviews, Individual Participant

Objective: To evaluate the Edinburgh Postnatal Depression Scale (EPDS) for screening to detect major depression in pregnant and postpartum women. Design: Individual participant data meta-analysis. Data sources: Medline, Medline In-Process and Other Non-Indexed Citations, PsycINFO, and Web of Science (from inception to 3 October 2018). Eligibility criteria for selecting studies: Eligible datasets included EPDS scores and major depression classification based on validated diagnostic interviews. Bivariate random effects meta-analysis was used to estimate EPDS sensitivity and specificity compared with semi-structured, fully structured (Mini International Neuropsychiatric Interview (MINI) excluded), and MINI diagnostic interviews separately using individual participant data. One stage meta-regression was used to examine accuracy by reference standard categories and participant characteristics. Results: Individual participant data were obtained from 58 of 83 eligible studies (70%; 15 557 of 22 788 eligible participants (68%), 2069 with major depression). Combined sensitivity and specificity was maximised at a cut-off value of 11 or higher across reference standards. Among studies with a semi-structured interview (36 studies, 9066 participants, 1330 with major depression), sensitivity and specificity were 0.85 (95% confidence interval 0.79 to 0.90) and 0.84 (0.79 to 0.88) for a cut-off value of 10 or higher, 0.81 (0.75 to 0.87) and 0.88 (0.85 to 0.91) for a cut-off value of 11 or higher, and 0.66 (0.58 to 0.74) and 0.95 (0.92 to 0.96) for a cut-off value of 13 or higher, respectively. Accuracy was similar across reference standards and subgroups, including for pregnant and postpartum women. Conclusions: An EPDS cut-off value of 11 or higher maximised combined sensitivity and specificity; a cut-off value of 13 or higher was less sensitive but more specific. To identify pregnant and postpartum women with higher symptom levels, a cut-off of 13 or higher could be used. Lower cut-off values could be used if the intention is to avoid false negatives and identify most patients who meet diagnostic criteria. Registration: PROSPERO (CRD42015024785).


Selective cutoff reporting in studies of the accuracy of the Patient Health Questionnaire‐9 and Edinburgh Postnatal Depression Scale: Comparison of results based on published cutoffs versus all cutoffs using individual participant data meta‐analysis

International Journal of Methods in Psychiatric Research, 2021. DOI: 10.1002/mpr.1873

Author(s): Dipika Neupane, Brooke Levis, Parash M. Bhandari, Brett D. Thombs, Andrea Benedetti, ...

Keyword(s): Postnatal Depression, Edinburgh Postnatal Depression Scale, Meta Analysis, Depression Scale, Patient Health Questionnaire, Individual Participant Data, Health Questionnaire, Comparison Of Results, Patient Health, Individual Participant


Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: a systematic review and individual participant data meta-analysis

Psychological Medicine, 2019, Vol 50 (8), pp. 1368-1380. DOI: 10.1017/s0033291719001314. Cited By ~ 8

Author(s): Yin Wu, Brooke Levis, Kira E. Riehm, Nazanin Saadat, Alexander W. Levis, ...

Keyword(s): Major Depression, Diagnostic Accuracy, Meta Analysis, Diagnostic Interview, Cutoff Score, Random Effects Models, Patient Health, Individual Participant, Self Harm, Sensitivity Specificity

Background: Item 9 of the Patient Health Questionnaire-9 (PHQ-9) queries about thoughts of death and self-harm, but not suicidality. Although it is sometimes used to assess suicide risk, most positive responses are not associated with suicidality. The PHQ-8, which omits Item 9, is thus increasingly used in research. We assessed equivalency of total score correlations and the diagnostic accuracy to detect major depression of the PHQ-8 and PHQ-9. Methods: We conducted an individual patient data meta-analysis. We fit bivariate random-effects models to assess diagnostic accuracy. Results: 16 742 participants (2097 major depression cases) from 54 studies were included. The correlation between PHQ-8 and PHQ-9 scores was 0.996 (95% confidence interval 0.996 to 0.996). The standard cutoff score of 10 for the PHQ-9 maximized sensitivity + specificity for the PHQ-8 among studies that used a semi-structured diagnostic interview reference standard (N = 27). At cutoff 10, the PHQ-8 was less sensitive by 0.02 (−0.06 to 0.00) and more specific by 0.01 (0.00 to 0.01) among those studies (N = 27), with similar results for studies that used other types of interviews (N = 27). For all 54 primary studies combined, across all cutoffs, the PHQ-8 was less sensitive than the PHQ-9 by 0.00 to 0.05 (0.03 at cutoff 10), and specificity was within 0.01 for all cutoffs (0.00 to 0.01). Conclusions: PHQ-8 and PHQ-9 total scores were similar. Sensitivity may be minimally reduced with the PHQ-8, but specificity is similar.
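The relation between the two totals can be sketched in a few lines. The sketch assumes the standard PHQ-9 scoring, in which each of the nine items is rated 0 to 3 and Item 9 is the last item, so the PHQ-8 total is the sum of the first eight items. The function names and the example responses are hypothetical.

```python
def phq_totals(items):
    """Return (phq8_total, phq9_total) from nine item responses scored 0-3.

    Assumes items are in questionnaire order, so items[8] is Item 9.
    """
    if len(items) != 9:
        raise ValueError("expected exactly 9 item responses")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    phq9 = sum(items)
    phq8 = sum(items[:8])  # omit Item 9 (thoughts of death and self-harm)
    return phq8, phq9


def screens_positive(total, cutoff=10):
    """Positive screen at the standard cutoff of 10 or above."""
    return total >= cutoff


# Invented responses: a positive answer on Item 9 tips only the PHQ-9 over the cutoff.
responses = [2, 1, 1, 1, 1, 1, 1, 1, 1]
phq8, phq9 = phq_totals(responses)
print("PHQ-8:", phq8, "positive:", screens_positive(phq8))
print("PHQ-9:", phq9, "positive:", screens_positive(phq9))
```

Because the PHQ-8 total can never exceed the PHQ-9 total, any disagreement at a shared cutoff runs in one direction: a participant can screen positive on the PHQ-9 and negative on the PHQ-8, never the reverse. That matches the small sensitivity loss and small specificity gain reported above.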


Overestimation of Postpartum Depression Prevalence Based on a 5-item Version of the EPDS: Systematic Review and Individual Participant Data Meta-analysis

The Canadian Journal of Psychiatry, 2020, Vol 65 (12), pp. 835-844. DOI: 10.1177/0706743720934959

Author(s): Brett D. Thombs, Brooke Levis, Anita Lyubenova, Dipika Neupane, Zelalem Negeri, ...

Keyword(s): Major Depression, Postpartum Depression, Maternal Mental Health, Meta Analysis, Depression Scale, Diagnostic Interview, Data Sets, Individual Participant Data, Depression Prevalence, Individual Participant

Objective: The Maternal Mental Health in Canada, 2018/2019, survey reported that 18% of 7,085 mothers who recently gave birth reported “feelings consistent with postpartum depression” based on scores ≥7 on a 5-item version of the Edinburgh Postnatal Depression Scale (EPDS-5). The EPDS-5 was designed as a screening questionnaire, not to classify disorders or estimate prevalence; the extent to which EPDS-5 results reflect depression prevalence is unknown. We investigated EPDS-5 ≥7 performance relative to major depression prevalence based on a validated diagnostic interview, the Structured Clinical Interview for DSM (SCID). Methods: We searched Medline, Medline In-Process & Other Non-Indexed Citations, PsycINFO, and the Web of Science Core Collection through June 2016 for studies with data sets with item response data to calculate EPDS-5 scores and that used the SCID to ascertain depression status. We conducted an individual participant data meta-analysis to estimate pooled percentage of EPDS-5 ≥7, pooled SCID major depression prevalence, and the pooled difference in prevalence. Results: A total of 3,958 participants from 19 primary studies were included. Pooled prevalence of SCID major depression was 9.2% (95% confidence interval [CI] 6.0% to 13.7%), pooled percentage of participants with EPDS-5 ≥7 was 16.2% (95% CI 10.7% to 23.8%), and pooled difference was 8.0% (95% CI 2.9% to 13.2%). In the 19 included studies, the mean and median ratios of EPDS-5 to SCID prevalence were 2.1 and 1.4, respectively. Conclusions: Prevalence estimated based on EPDS-5 ≥7 appears to be substantially higher than the prevalence of major depression. Validated diagnostic interviews should be used to establish prevalence.


Probability of Major Depression Classification Based on the SCID, CIDI, and MINI Diagnostic Interviews: A Synthesis of Three Individual Participant Data Meta-Analyses

Psychotherapy and Psychosomatics, 2020, Vol 90 (1), pp. 28-40. DOI: 10.1159/000509283. Cited By ~ 1

Author(s): Yin Wu, Brooke Levis, John P.A. Ioannidis, Andrea Benedetti, Brett D. Thombs, ...

Keyword(s): Major Depression, Symptom Severity, Meta Analysis, Composite International Diagnostic Interview, Diagnostic Interview, Depression Symptom, Individual Participant Data, Diagnostic Interviews, Individual Participant, Meta Analyses

Introduction: Three previous individual participant data meta-analyses (IPDMAs) reported that, compared to the Structured Clinical Interview for the DSM (SCID), alternative reference standards, primarily the Composite International Diagnostic Interview (CIDI) and the Mini International Neuropsychiatric Interview (MINI), tended to misclassify major depression status, when controlling for depression symptom severity. However, there was an important lack of precision in the results. Objective: To compare the odds of major depression classification based on the SCID, CIDI, and MINI. Methods: We included and standardized data from 3 IPDMA databases. For each IPDMA, separately, we fitted binomial generalized linear mixed models to compare the adjusted odds ratios (aORs) of major depression classification, controlling for symptom severity and characteristics of participants, and the interaction between interview and symptom severity. Next, we synthesized results using a DerSimonian-Laird random-effects meta-analysis. Results: In total, 69,405 participants (7,574 [11%] with major depression) from 212 studies were included. Controlling for symptom severity and participant characteristics, the MINI (74 studies; 25,749 participants) classified major depression more often than the SCID (108 studies; 21,953 participants; aOR 1.46; 95% confidence interval [CI] 1.11–1.92). Classification odds for the CIDI (30 studies; 21,703 participants) and the SCID did not differ overall (aOR 1.19; 95% CI 0.79–1.75); however, as screening scores increased, the aOR increased less for the CIDI than the SCID (interaction aOR 0.64; 95% CI 0.52–0.80). Conclusions: Compared to the SCID, the MINI classified major depression more often. The odds of depression classification with the CIDI increased less as symptom levels increased. Interpretation of research that uses diagnostic interviews to classify depression should consider the interview characteristics.
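The synthesis step named in the methods, a DerSimonian-Laird random-effects meta-analysis, can be sketched as follows. This is a generic textbook version of the estimator applied to log odds ratios, not the authors' code. The three input estimates and their variances are invented for illustration, and the function name is hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool effect estimates (e.g. log odds ratios) with the DerSimonian-Laird method.

    Returns (pooled_estimate, standard_error, tau_squared).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2


# Invented log adjusted odds ratios and variances from three hypothetical databases.
log_aors = [math.log(1.2), math.log(1.5), math.log(1.8)]
variances = [0.04, 0.03, 0.05]

pooled, se, tau2 = dersimonian_laird(log_aors, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled aOR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), tau^2 {tau2:.3f}")
```

Pooling is done on the log scale and exponentiated at the end, which keeps the confidence interval symmetric around the log odds ratio.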
