# Table 2 Definitions of and standards used to interpret measurement properties

| Property | Definition [39] | Standard for interpretation [40] |
| --- | --- | --- |
| Reliability | The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions: e.g. over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intra-rater) | + = ICC or weighted Kappa ≥0.70 |
| | | ? = Doubtful design or method (e.g. time interval not mentioned) |
| | | − = ICC or weighted Kappa <0.70, despite adequate design and method |
| Validity: Construct validity (hypotheses testing) | The degree to which the scores of a measurement are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments or differences between relevant groups), based on the assumption that the measurement validly measures the construct to be measured | + = Correlation with an instrument measuring the same construct ≥0.50 OR at least 75 % of the results are in accordance with the hypotheses AND correlation with related constructs is higher than with unrelated constructs |
| | | ? = Solely correlations determined with unrelated constructs |
| | | − = Correlation with an instrument measuring the same construct <0.50 OR <75 % of the results are in accordance with the hypotheses OR correlation with related constructs is lower than with unrelated constructs |
| Validity: Criterion validity | The degree to which the scores of a measurement are an adequate reflection of a “gold standard” | + = Convincing arguments that gold standard is “gold” AND correlation with gold standard ≥0.70 |
| | | ? = No convincing arguments that gold standard is “gold” OR doubtful design or method |
| | | − = Correlation with gold standard <0.70, despite adequate design and method |
| Responsiveness | The ability of a measurement to detect change over time in the construct to be measured | + = Correlation with an instrument measuring the same construct ≥0.50 OR at least 75 % of the results are in accordance with the hypotheses OR AUC ≥0.70 AND correlation with related constructs is higher than with unrelated constructs |
| | | ? = Solely correlations determined with unrelated constructs |
| | | − = Correlation with an instrument measuring the same construct <0.50 OR <75 % of the results are in accordance with the hypotheses OR AUC <0.70 OR correlation with related constructs is lower than with unrelated constructs |

1. ICC: intraclass correlation coefficient; AUC: area under the curve; +: positive rating; ?: indeterminate rating; −: negative rating
2. Doubtful design or method = lacks a clear description of the study design or methods; uses a sample size smaller than 50 participants; or has any important methodological flaw in study design or implementation
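
As an illustration of how the two threshold-based standards above (reliability and criterion validity) could be applied to a single study result, the following is a minimal Python sketch. It is not part of the cited standards: the function and parameter names are hypothetical, an ASCII hyphen stands in for the negative rating, and it assumes that a doubtful design or method is checked first and yields "?" regardless of the coefficient, which is one reading of "despite adequate design and method".

```python
# Hypothetical helper names; thresholds are taken from the table and footnote 2.
MIN_SAMPLE_SIZE = 50   # footnote 2: fewer than 50 participants counts as doubtful
THRESHOLD = 0.70       # cut-off for ICC, weighted Kappa, and gold-standard correlation


def has_doubtful_design(sample_size: int,
                        clearly_described: bool = True,
                        major_flaw: bool = False) -> bool:
    """Footnote 2: unclear description, small sample, or an important flaw."""
    return (not clearly_described) or sample_size < MIN_SAMPLE_SIZE or major_flaw


def rate_reliability(coefficient: float,
                     sample_size: int,
                     clearly_described: bool = True,
                     major_flaw: bool = False) -> str:
    """Rate an ICC or weighted Kappa as '+', '?', or '-'."""
    # Assumption: doubtful design takes precedence over the coefficient.
    if has_doubtful_design(sample_size, clearly_described, major_flaw):
        return "?"
    return "+" if coefficient >= THRESHOLD else "-"


def rate_criterion_validity(correlation: float,
                            gold_standard_justified: bool,
                            sample_size: int,
                            clearly_described: bool = True,
                            major_flaw: bool = False) -> str:
    """Rate a correlation with a gold standard as '+', '?', or '-'."""
    # '?' if the gold standard is not convincingly "gold" or the design is doubtful.
    if not gold_standard_justified or has_doubtful_design(
            sample_size, clearly_described, major_flaw):
        return "?"
    return "+" if correlation >= THRESHOLD else "-"


if __name__ == "__main__":
    print(rate_reliability(0.82, sample_size=60))
    print(rate_reliability(0.82, sample_size=30))
    print(rate_criterion_validity(0.60, True, sample_size=80))
```

The construct validity and responsiveness standards are left out of the sketch because they combine OR and AND conditions whose grouping the table does not spell out.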