Clinical trials are constructed to test a hypothesis statistically. Once the study has been completed, the results are analysed against the original hypothesis (the primary outcome) and conclusions are drawn from that analysis.
By convention, a result is accepted as unlikely to be due to chance if the play of chance alone would produce a result at least as extreme no more than 1 time in 20. This is often expressed as a P value of 0.05 or less.
Where the analysis shows that the results are unlikely to have occurred by chance, the quoted P value will be 0.05 or less and the results are said to be statistically significant.
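To make the "1 time in 20" idea concrete, the sketch below simulates many hypothetical trials in which the treatment has no effect at all and counts how often a result reaches P < 0.05 anyway. It is a minimal illustration under stated assumptions: normally distributed outcomes with a known standard deviation of 1 in both arms, 20 patients per arm, and a simple two-sided z-test. The function names are hypothetical and not taken from any statistics package.

```python
import math
import random

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_null_trials(n_trials=5000, n_per_arm=20, seed=1):
    """Simulate trials in which the treatment has NO real effect and return
    the fraction that nevertheless reach P < 0.05.

    Hypothetical set-up: outcomes are normal with a known SD of 1 in both
    arms, so a z-test on the difference in means is appropriate."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same distribution: any difference is chance.
        treatment = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        diff = sum(treatment) / n_per_arm - sum(control) / n_per_arm
        se = math.sqrt(2 / n_per_arm)  # standard error of the difference (SD = 1)
        if two_sided_p(diff / se) < 0.05:
            significant += 1
    return significant / n_trials

print(f"Null trials reaching P < 0.05: {simulate_null_trials():.3f}")
```

Under these assumptions the printed fraction should be close to 0.05, meaning roughly 1 trial in 20 looks "significant" even though the treatment does nothing.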
However, statistical significance may not be the same thing as clinical significance. This is because statistical significance depends on the size of the population studied as well as on the size of the difference: a small difference can be statistically significant if the population is large enough, whereas a large difference is required if the population is small.
For example, a 2-point difference on a 60-point depression rating scale was found to be statistically significant when escitalopram (Cipralex) was compared with citalopram (Cipramil). In a clinical setting, a 2-point difference would be virtually impossible to detect.
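The dependence on population size can be shown with a short calculation. The sketch below holds a 2-point difference fixed and computes its two-sided P value at several sample sizes, together with the smallest difference that would reach the 0.05 threshold. The standard deviation of 8 points and the sample sizes are hypothetical values chosen for illustration, not figures from the escitalopram trial, and a z-test with a known SD is assumed for simplicity.

```python
import math

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_for_difference(diff, sd, n_per_arm):
    """Two-sided P value for a difference in means between two equal-sized
    arms, assuming a known common SD (a simplifying assumption)."""
    se = sd * math.sqrt(2 / n_per_arm)  # standard error of the difference
    return two_sided_p(diff / se)

def smallest_significant_difference(sd, n_per_arm, z_crit=1.96):
    """Smallest difference in means that reaches the two-sided 0.05 threshold."""
    return z_crit * sd * math.sqrt(2 / n_per_arm)

SD = 8.0  # hypothetical SD of scores on the rating scale
for n in (20, 100, 500):
    print(f"n={n:>3} per arm: P for a 2-point difference = "
          f"{p_for_difference(2.0, SD, n):.4f}; smallest significant "
          f"difference = {smallest_significant_difference(SD, n):.2f} points")
```

Under these assumptions the same 2-point difference is far from significant with 20 patients per arm but clearly significant with 500, while the difference needed to reach significance shrinks as the arms grow. Nothing in the calculation says whether the difference matters clinically.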
Focussing on secondary endpoints has already been covered, but what do trial investigators do if both the primary and secondary endpoints are unconvincing?
Data dredging may produce a result that looks statistically more convincing. Data dredging is the process of analysing all of the trial data in search of outcomes that are statistically significant; the outcomes it produces are known as post-hoc or tertiary endpoints.
These post-hoc endpoints must be treated with caution, as the study was not set up to collect, examine or answer questions about these additional outcomes. The accepted threshold for a result occurring by chance is 1 in 20, so if we conduct 20 post-hoc analyses and none of the effects examined is real, on average one of them would still be expected to be statistically significant purely by chance. If the analyses are independent, the probability that at least one reaches significance is about 64%.
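The arithmetic behind the 1-in-20 warning can be sketched in a few lines. Assuming every post-hoc analysis tests an effect that is not real and that the analyses are independent (real trial analyses are usually correlated, so this is a simplification), the expected number of chance "significant" findings among k analyses is k × 0.05 and the probability of at least one is 1 − 0.95^k. The function names are hypothetical.

```python
def expected_chance_findings(k, alpha=0.05):
    """Expected number of analyses reaching P < alpha when no real effect exists."""
    return k * alpha

def prob_at_least_one_chance_finding(k, alpha=0.05):
    """Probability that at least one of k independent analyses reaches
    P < alpha by chance alone (every null hypothesis true)."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(f"{k:>2} analyses: expected chance findings = "
          f"{expected_chance_findings(k):.2f}, "
          f"P(at least one) = {prob_at_least_one_chance_finding(k):.2f}")
```

For 20 analyses this gives an expected 1 chance finding and roughly a 64% chance of at least one, which is why unplanned findings should not be taken at face value.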
For example, a post-hoc analysis of the ISIS-2 study [1] found that aspirin therapy appeared to be harmful in patients born under the star signs Gemini or Libra. Clearly this is nonsense, and it shows how easily post-hoc analyses can lead to misleading conclusions.
A more recent study that made use of post-hoc analysis was ASCOT-BPLA [2]. As already discussed, the conclusions reached in this trial should be treated with caution.
References
- [1] ISIS-2 Collaborative Group. Randomized trial of IV streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction. Lancet 1988;2:349-360.
- [2] ASCOT Investigators. Prevention of cardiovascular events with an antihypertensive regimen of amlodipine adding perindopril as required versus atenolol adding bendroflumethiazide as required, in the Anglo-Scandinavian Cardiac Outcomes Trial-Blood Pressure Lowering Arm (ASCOT-BPLA): a multicentre randomised controlled study. Lancet 2005;366:895-906.
All clinical trials are established to answer a clinical question, and the outcome chosen to answer it is called the primary endpoint. It is to be hoped that the primary endpoint will be shown to be statistically significant at the end of the study, but this is not always the case.
When the primary endpoint is not significant, there is often a tendency to focus on the other outcomes investigated during the study, known as secondary endpoints. The study may not have been designed to answer clinical questions based on the secondary endpoints, so these results should be viewed more cautiously.
For example, in the PROactive study, recently covered here, the primary endpoint was not statistically significant, whereas the secondary endpoint of death, non-fatal myocardial infarction and non-fatal stroke was. However, the study was designed to assess the efficacy of pioglitazone in secondary prevention using a primary composite endpoint of death, non-fatal MI, stroke, acute coronary syndrome, leg amputation, coronary revascularisation and revascularisation of the leg. Conclusions based on the secondary endpoint alone may therefore not be valid.