This study was a secondary database analysis of a prospective cohort study in which data were collected from May 2011 to April 2014. The prospective cohort study was observational and assessed the concept of a “comparable sign”. Because the original design was observational and required no prospective assignment of human participants or groups of humans to one or more health-related interventions to evaluate the effects on health outcomes, clinical trials registration was not required. All patients enrolled in the study signed an informed consent statement approved by the Walsh University Human Ethics committee in North Canton, Ohio.
All data were gathered in one of eight outpatient physical therapy clinics in the United States. To be eligible for the primary study, patients were required to be 18 years of age or older and to have mechanically producible cervical or lumbar spine pain that occurred during clinical examination movements. All subjects also had to require care beyond a single visit and had to speak English. Clinicians were instructed to target consecutive patients with spinal pain for inclusion in the study.
Exclusion criteria were the presence of any red flag (tumor, metabolic disease, rheumatoid arthritis, osteoporosis, prolonged history of steroid use) or signs consistent with nerve root compression resulting in radiculopathy (i.e., a diminished muscle stretch reflex, or diminished or absent sensation to pinprick in any upper or lower extremity dermatome). Additional exclusion criteria were a history of neck- or low back-related surgery or current pregnancy.
The study included nine orthopedically oriented physiotherapists, all of whom had rigorous, extensive training in manual therapy principles, held orthopedic manual therapy certification, or were Fellows of the American Academy of Orthopaedic Manual Physical Therapists. Their clinical experience ranged from 12 to 30 years (mean = 20.3 years), and they practiced in either hospital-based or private outpatient facilities. All were familiar with research data collection and had collected and recorded data in two previous randomized controlled trials [11, 12].
The examination and interventional process
Prior to involvement, all physiotherapists participated in a standardized, mandatory 30-minute educational webinar that explained the primary purpose of the study, the data collection methods, and the requirements for participation. Physiotherapists were also made aware of the secondary purpose of the study, which was to evaluate their ability to predict the projected outcome.
All physiotherapists performed a patient response-based examination in which feedback was gathered with each targeted active or passive movement, and subsequent treatment was a by-product of what was identified during the examination. A standardized examination process was used for all patients; it involved analyzing movement patterns and pain during three examination phases: (1) active physiological movements, (2) passive physiological movements, and (3) passive accessory movements. All data captured during the initial visit were recorded immediately after the encounter was completed.
Week two and discharge
Throughout the episode of care, treatment interventions were delivered pragmatically to ensure ecological validity and consisted almost exclusively of manual therapy, strengthening, and patient-specific education. Because specific interventions were not the focus of the study, the components of each patient’s treatment were not collected. The physiotherapists collected disability and pain outcome data at week two, and at discharge collected these measures again along with the rate of recovery. Patients were discharged when the physiotherapist judged that they had reached maximal improvement, when they self-discharged, or when both parties agreed on discharge. Discharge was not delayed for the sake of the study; thus, in rare cases, the episode of care ended before the two-week follow-up.
At baseline, each physiotherapist recorded demographics (e.g., age, race, gender, and diagnosis); duration of symptoms, categorized as acute (< 6 weeks), sub-acute (6 to 12 weeks), or chronic (> 12 weeks); baseline outcome measures for pain (Numeric Pain Rating Scale, NPRS) and disability (Oswestry Disability Index [ODI] or Neck Disability Index [NDI]); previous history of a similar injury with the same symptoms (Yes or No); presence of a within-session change in pain or movement strategies (Yes for improvement, No for no change or worsening symptoms); and presence of baseline psychosocial concerns (Yes or No).
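The symptom-duration categories amount to a simple classification rule. The sketch below is illustrative Python, not the study's software (the analyses were run in SPSS); the function name `categorize_duration` is hypothetical, and it assumes that exactly 6 and exactly 12 weeks fall in the sub-acute band, which is one reading of the “6 weeks to 12 weeks” wording.

```python
def categorize_duration(weeks: float) -> str:
    """Classify symptom duration into acute, sub-acute, or chronic.

    Assumption: durations of exactly 6 and exactly 12 weeks are treated as
    sub-acute.
    """
    if weeks < 0:
        raise ValueError("symptom duration cannot be negative")
    if weeks < 6:
        return "acute"
    if weeks <= 12:
        return "sub-acute"
    return "chronic"


# Hypothetical durations in weeks.
print([categorize_duration(w) for w in (2, 6, 12, 20)])
# ['acute', 'sub-acute', 'sub-acute', 'chronic']
```

Making the boundary handling explicit in this way avoids ambiguity when a patient reports a duration that lands exactly on a cut-point.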
All variables used in the modeling for this study were selected based on their previously investigated relationships with prognosis for either neck or back pain. Age has been associated with poorer prognosis in subjects with neck pain [14, 15]. Longer duration of symptoms has been associated with poorer recovery, whereas higher baseline levels of pain and disability have been associated with delayed recovery in patients with neck pain and low back pain [15, 18]. A previous injury has been identified as the most prominent variable associated with recurrent low back pain, with first-time low back pain, and with poor outcomes in neck and back pain [15, 18, 20, 21], although, to our knowledge, the similarity (sameness) of the symptoms to the previous injury has not been formally investigated. A within-session change is an improvement in the patient’s pain or movement strategy that occurs during the initial visit. A within-session change in pain, movement, or both has been reported as a useful predictor of outcomes in previous studies [22–24].
Psychological factors have been associated with negative outcomes in subjects with neck and back pain. Presence of a baseline psychosocial concern was based on any single positive answer to seven questions addressing enjoyment of employment, presence of a relationship with a spouse or partner, depression, anxiety, social support systems, relationship with work colleagues, and use of medications for an otherwise unmentioned mental health condition. The tool was novel and was created to provide a comprehensive assessment of psychosocial problems without overly burdening the patient with multiple psychological scales. Its reliability and validity have not been evaluated. Any positive finding among the seven items was coded as “yes”, whereas negative findings on all seven were coded as “no”.
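The “any positive answer” rule can be sketched as below. The item names in `PSYCHOSOCIAL_ITEMS` are hypothetical labels for the seven questions (their exact wording is not reported), and each item is assumed to be recorded as `True` when the answer indicates a concern.

```python
from typing import Mapping

# Hypothetical labels for the seven questions; True means the answer
# indicates a concern (e.g., does not enjoy employment).
PSYCHOSOCIAL_ITEMS = (
    "does_not_enjoy_employment",
    "partner_relationship_concern",
    "depression",
    "anxiety",
    "poor_social_support",
    "poor_relationship_with_colleagues",
    "mental_health_medication",
)


def psychosocial_concern(answers: Mapping[str, bool]) -> str:
    """Return 'yes' if any of the seven items is positive, otherwise 'no'."""
    missing = [item for item in PSYCHOSOCIAL_ITEMS if item not in answers]
    if missing:
        raise KeyError(f"missing items: {missing}")
    return "yes" if any(answers[item] for item in PSYCHOSOCIAL_ITEMS) else "no"


answers = {item: False for item in PSYCHOSOCIAL_ITEMS}
print(psychosocial_concern(answers))  # no
answers["anxiety"] = True
print(psychosocial_concern(answers))  # yes
```

Requiring all seven items before coding guards against an incomplete questionnaire being silently coded as “no”.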
At the initial clinical encounter, each physiotherapist was asked to estimate each patient’s potential for a successful outcome based on their professional appraisal. Operationally, the physiotherapists were instructed to consider all components of their evaluation when predicting the patient’s prognosis. Similar to the method used by Dagfinrud and colleagues, physiotherapists scored each patient on a scale from 1 (a very poor projected outcome) to 10 (an excellent projected outcome). The score was given at the initial visit, after the complete encounter with the patient, including patient history, physical examination, treatment, and reassessment. After examining the distribution of the physiotherapists’ scores and a receiver operating characteristic (ROC) curve, the physiotherapist prediction of prognosis was dichotomised as a good projected prognosis (scores of 7 to 10) or a poor projected prognosis (scores of 1 to 6).
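The dichotomisation itself is a threshold at 7. The text does not say which ROC criterion informed that cut-point, so the sketch below uses Youden’s J (sensitivity + specificity − 1) purely as an illustrative, swapped-in criterion; the function names and the scores and outcomes are hypothetical.

```python
def dichotomize_prediction(score: int) -> str:
    """Map a 1-10 physiotherapist prediction to the two groups used in the study."""
    if not 1 <= score <= 10:
        raise ValueError("prediction scores range from 1 to 10")
    return "good" if score >= 7 else "poor"


def youden_cutpoint(scores, successes):
    """Return (cutpoint, J) maximizing Youden's J = sensitivity + specificity - 1.

    A score >= cutpoint counts as a predicted success. Youden's J is an
    assumed, illustrative criterion, not the study's documented rule.
    """
    positives = sum(1 for y in successes if y)
    negatives = len(successes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both successful and unsuccessful outcomes")
    best_cut, best_j = None, float("-inf")
    for cut in range(2, 11):  # candidate thresholds between adjacent scores
        tp = sum(1 for s, y in zip(scores, successes) if s >= cut and y)
        tn = sum(1 for s, y in zip(scores, successes) if s < cut and not y)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical predictions (1-10) and outcomes (1 = successful).
scores = [3, 5, 6, 7, 8, 9, 8, 6, 4, 6]
successes = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
cut, j = youden_cutpoint(scores, successes)
print(cut, round(j, 2))  # 7 0.8
print([dichotomize_prediction(s) for s in (6, 7)])  # ['poor', 'good']
```

That these made-up data also yield a cut-point of 7 is a property of the example only, not evidence about the study’s ROC analysis.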
The primary disability measure was the Oswestry Disability Index (ODI) or the Neck Disability Index (NDI), and the primary pain measure was the Numeric Pain Rating Scale (NPRS). At discharge, the self-reported Rate of Recovery (RoR) was also captured.
Oswestry disability index and the neck disability index
The ODI was used to measure disability in patients with back pain. The ODI consists of 10 questions, each scored from 0 to 5; higher scores indicate greater disability. We used percentage change to determine the change score for each patient, calculated as [(baseline ODI score–final ODI score)/(baseline ODI score)] × 100. The NDI was used for patients with neck pain, as it was designed to measure pain-related disability in this population. The NDI contains ten sections, seven of which focus on activities of daily living. Each item is scored on a 6-point scale (0 to 5), so the maximum total score is 50. The content validity, construct validity, and reliability of the NDI have been previously demonstrated in patients with neck pain. As with the ODI, we used percentage change to determine the change score for each patient, calculated as [(baseline NDI score–final NDI score)/(baseline NDI score)] × 100. A 50 % change from baseline has been used as a discriminative threshold for disability scores in previous studies. Thus, for analysis, the percent change in ODI/NDI was dichotomised as ≥ 50 % (successful outcome) or < 50 % (not successful).
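The percent-change formula and the 50 % threshold can be sketched as follows. The function names are hypothetical, and the guard against a zero baseline is an added assumption (the formula is undefined in that case; the text does not say how such cases were handled).

```python
def percent_change(baseline: float, final: float) -> float:
    """[(baseline - final) / baseline] x 100; positive values mean improvement."""
    if baseline == 0:
        raise ValueError("percent change is undefined when the baseline score is 0")
    return (baseline - final) / baseline * 100


def disability_success(baseline: float, final: float) -> bool:
    """Successful outcome if the ODI or NDI score improved by at least 50 %."""
    return percent_change(baseline, final) >= 50


# Hypothetical NDI scores (0-50 scale): 24 at baseline, 10 at discharge.
print(round(percent_change(24, 10), 1))  # 58.3
print(disability_success(24, 10))        # True
```

Because the same function serves the ODI and the NDI, the two disability outcomes can be pooled into one binary variable, as the text describes.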
Numeric pain rating scale
The NPRS was used to measure patients’ perceived pain intensity on a scale from 0 (“no pain”) to 10 (“worst pain imaginable”). The NPRS has been found to be reliable and responsive. As with the disability measures, we used percentage change as the outcome measure, calculated as [(baseline NPRS score–final NPRS score)/(baseline NPRS score)] × 100. An improvement of 50 % or more has been used by others in different populations as an acceptable level of change indicating a successful outcome. Thus, for analysis, we categorized the percent change in NPRS as ≥ 50 % (successful outcome) or < 50 % (not successful).
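A worked example of the NPRS rule, using hypothetical scores for two patients (not study data), shows how the 50 % threshold applies. Because the change is relative to baseline, the same 3-point drop counts as a success from a baseline of 6 but not from a baseline of 7.

```latex
\%\Delta_{\mathrm{NPRS}} = \frac{\mathrm{NPRS}_{\mathrm{baseline}} - \mathrm{NPRS}_{\mathrm{final}}}{\mathrm{NPRS}_{\mathrm{baseline}}} \times 100

\text{Patient A: } \frac{6 - 3}{6} \times 100 = 50\% \;\Rightarrow\; \text{successful } (\geq 50\%)

\text{Patient B: } \frac{7 - 4}{7} \times 100 \approx 42.9\% \;\Rightarrow\; \text{not successful } (< 50\%)
```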
Rate of recovery
Self-reported rate of recovery was scored from 0 % to 100 %. The physiotherapists asked patients whether they had recovered and by how much, and patients rated their recovery on a scale from 0 % (no recovery at all) to 100 % (totally recovered). This scoring procedure is a variant of the Single Assessment Numeric Evaluation and has been used previously with patients with shoulder pain [31, 33] and low back pain. Previous work has identified scores > 82.5 % as being associated with global improvement in outcome. Thus, for analysis, we categorized recovery as ≥ 82.5 % (successful outcome) or < 82.5 % (not successful).
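The recovery threshold can be sketched as below. Note that the prior work is summarized as “> 82.5 %” while the study’s coding rule is “≥ 82.5 %”; this sketch follows the coding rule as stated, so a score of exactly 82.5 % counts as successful. The names `ROR_THRESHOLD` and `ror_success` are hypothetical.

```python
ROR_THRESHOLD = 82.5  # percent, from the coding rule in the text


def ror_success(ror_percent: float) -> bool:
    """Successful outcome if self-reported recovery is at least 82.5 %."""
    if not 0 <= ror_percent <= 100:
        raise ValueError("rate of recovery is reported on a 0-100 % scale")
    return ror_percent >= ROR_THRESHOLD


# Hypothetical self-reported recovery values.
print([ror_success(r) for r in (60, 82.5, 90)])  # [False, True, True]
```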
Number of observations per variable
The number of observations per variable was determined using the recommendations of Hosmer and Lemeshow. For simple univariate multinomial or logistic regression analyses, a minimum observation-to-variable ratio of 10 is recommended, although a ratio this low will likely overfit a model. For this study, only eight variables were targeted.
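With eight predictors and a ratio of 10, the arithmetic implies at least 80 observations. The sketch below computes that minimum and also shows a common, more conservative variant that counts only events in the less frequent outcome category; that variant is an assumption added for comparison, not the study’s stated method, and the cohort split is hypothetical.

```python
import math


def min_observations(n_variables: int, ratio: float = 10) -> int:
    """Smallest sample size meeting the observation-to-variable ratio."""
    return math.ceil(n_variables * ratio)


def events_per_variable(outcomes, n_variables: int) -> float:
    """Events in the less frequent outcome category per predictor
    (assumed conservative variant, shown for comparison only)."""
    events = sum(1 for y in outcomes if y)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_variables


print(min_observations(8))  # 80
# Hypothetical cohort: 120 patients, 75 successful and 45 unsuccessful outcomes.
outcomes = [1] * 75 + [0] * 45
print(events_per_variable(outcomes, 8))  # 5.625
```

The example illustrates why the two readings can diverge: 120 patients satisfy the raw observation ratio, yet an unbalanced outcome leaves fewer than 10 events per predictor.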
All analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 21.0 (IBM Corp, Armonk, NY). Intention-to-treat analysis was used, and for missing data at any follow-up time point, the last observation was carried forward. Descriptive statistics were used to describe the full patient sample. The frequencies of prediction of prognosis scores were examined for each physiotherapist to determine variation among practitioners. Linearity of the effect of continuous variables was evaluated by plotting to identify potential curvilinear relationships. If curvilinear relationships were found, categories were created and entered as ordinal data with a set of indicator (dummy) variables. The individual estimates were then plotted to visualize linearity and checked for significant differences among them.
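Last observation carried forward (LOCF) replaces each missing follow-up value with the most recent observed value for that patient. The sketch below applies it to one patient’s series of scores (baseline, week two, discharge); the function name and the scores are hypothetical, and a missing baseline is deliberately left missing because there is nothing to carry forward.

```python
from typing import List, Optional, Sequence


def last_observation_carried_forward(
    values: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Fill each None with the most recent non-missing value before it."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical NPRS series: baseline, week two, discharge (discharge missing).
print(last_observation_carried_forward([7, 4, None]))  # [7, 4, 4]
```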
To assess multicollinearity and the relationships among the eight predictor variables, a correlation matrix was calculated for all independent variables. A correlation of r > 0.7 between independent variables was taken to indicate potential multicollinearity. Associations between continuous measures were assessed with the Pearson product-moment correlation, associations between dichotomous or categorical measures with Cramér’s V, and associations between continuous and dichotomous or categorical variables with a biserial correlation. Cohen characterized a correlation of 0.10 as a small relationship, 0.30 as a moderate relationship, and 0.50 as a large (strong) relationship. P values < 0.05 were considered significant.
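For the dichotomous predictors, Cramér’s V is computed from the chi-square statistic of their contingency table as V = √(χ² / (n · (k − 1))), where k is the smaller number of categories. The sketch below uses the uncorrected chi-square (no continuity correction, an assumption since the SPSS settings are not reported) and then flags any pair exceeding the 0.7 threshold. Variable names and data are hypothetical.

```python
import math
from collections import Counter


def cramers_v(x, y) -> float:
    """Cramér's V for two categorical variables, from the uncorrected chi-square."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    cell_counts = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        raise ValueError("each variable needs at least two observed categories")
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (cell_counts[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


def flag_multicollinearity(correlations, threshold=0.7):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in correlations.items() if abs(r) > threshold]


# Hypothetical yes/no predictors for eight patients.
prior_injury = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
within_session = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
v = cramers_v(prior_injury, within_session)
print(round(v, 2))  # 0.5
print(flag_multicollinearity({("prior_injury", "within_session"): v}))  # []
```

Taking the absolute value in the flag keeps strong negative correlations from slipping past the threshold.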
Separate hierarchical logistic regression analyses were performed for each dependent variable: patient-reported rate of recovery, percent change in ODI or NDI, and percent change in NPRS. Hierarchical models were used instead of stepwise modeling because automated stepwise procedures can lead to illogical conclusions and because the modeling in this study was exploratory. For each analysis, individual P values, odds ratios with 95 % confidence intervals, and Nagelkerke R² values were reported. The Nagelkerke R² is a pseudo-R² measure of the usefulness of the model and is similar in concept to the coefficient of determination (R²) in linear regression. Such R² statistics do not measure the goodness of fit of the model; rather, they indicate how useful the explanatory variables are in predicting the response variable and can be regarded as measures of effect size.
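The Nagelkerke R² rescales the Cox and Snell R², 1 − exp((2/n)(LL₀ − LL_model)), by its maximum attainable value, 1 − exp((2/n)LL₀), so that it can reach 1. The sketch below computes it from log-likelihoods; SPSS reports −2 log-likelihood, so LL = −(−2LL)/2. The function names, sample size, event count, and fitted log-likelihood are hypothetical.

```python
import math


def null_log_likelihood(n_events: int, n: int) -> float:
    """Log-likelihood of an intercept-only logistic model for a binary outcome."""
    p = n_events / n
    if not 0 < p < 1:
        raise ValueError("need both outcome categories")
    return n_events * math.log(p) + (n - n_events) * math.log(1 - p)


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Cox and Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1 - math.exp((2 / n) * (ll_null - ll_model))
    max_cox_snell = 1 - math.exp((2 / n) * ll_null)
    return cox_snell / max_cox_snell


# Hypothetical model: 100 patients, 40 successful outcomes, fitted LL = -55.0.
ll0 = null_log_likelihood(40, 100)
print(round(nagelkerke_r2(ll0, -55.0, 100), 3))
```

A model no better than the intercept gives 0, and a model that predicts perfectly (log-likelihood 0) gives 1.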