



  • Title: Inappropriate use of statistical power.
    Author: Fraser RA.
    Journal: Bone Marrow Transplant; 2023 May; 58(5):474-477. PubMed ID: 36869191.
    Abstract:
    We are pleased to add this typescript, Inappropriate use of statistical power by Raphael Fraser, to the BONE MARROW TRANSPLANTATION Statistics Series. The author discusses how we sometimes misuse statistical analyses after a study is completed and analyzed to explain the results. The most egregious example is post hoc power calculations.

    When the conclusion of an observational study or clinical trial is negative, namely, the observed data (or more extreme data) fail to reject the null hypothesis, people often argue for calculating the observed statistical power. This is especially true of clinical trialists who believe in a new therapy and wished and hoped for a favorable outcome (rejecting the null hypothesis). One is reminded of the saying attributed to Benjamin Franklin: a man convinced against his will is of the same opinion still.

    As the author notes, when we face a negative conclusion of a clinical trial there are two possibilities: (1) there is no treatment effect; or (2) we made a mistake. By calculating the observed power after the study, people incorrectly believe that a high observed power lends strong support to the null hypothesis. The problem is usually the opposite: if the observed power is low, the null hypothesis was not rejected because there were too few subjects. This is usually couched in terms such as "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" or the like. Observed power should not be used to interpret the results of a negative study. Put more strongly, observed power should not be calculated after a study is completed and analyzed. The power of the study to reject or not reject the null hypothesis is already incorporated in the calculation of the p-value.

    The author uses interesting analogies to make important points about hypothesis testing. Testing the null hypothesis is like a jury trial: the jury can find the defendant guilty or not guilty; it cannot find him innocent.
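    The point that post hoc "observed power" adds nothing beyond the p-value can be made concrete with a small sketch. For a two-sided z-test (a simplifying assumption, not the setting of any particular trial discussed above), the observed power is a monotone function of the p-value alone, so reporting it after the analysis conveys no new information; note that a study whose p-value lands exactly at the 0.05 threshold has an observed power of exactly 0.5.

    ```python
    from statistics import NormalDist

    nd = NormalDist()

    def observed_power(p_two_sided, alpha=0.05):
        """Post hoc 'observed power' for a two-sided z-test, computed
        directly from the p-value (one-tail approximation). Because it
        depends only on p and alpha, it is a restatement of the p-value,
        not independent evidence about the null hypothesis."""
        z_obs = nd.inv_cdf(1 - p_two_sided / 2)   # |z| implied by the p-value
        z_crit = nd.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
        return 1 - nd.cdf(z_crit - z_obs)

    for p in [0.05, 0.20, 0.50, 0.80]:
        print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
    # At p = alpha = 0.05 the observed power is exactly 0.5; larger
    # p-values map deterministically to lower observed power.
    ```

    Running this shows observed power falling as the p-value rises, which is precisely why "low observed power" in a negative study is a restatement of non-significance, not an explanation for it.
    
    
    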
    It is always important to recall that failure to reject the null hypothesis does not mean the null hypothesis is true, only that there is insufficient evidence (data) to reject it. As the author notes: in a sense, hypothesis testing is like world championship boxing, where the null hypothesis is the champion until defeated by the challenger, the alternative hypothesis, which then becomes the new world champion.

    The author includes a discussion of what a p-value is, a topic we have discussed before in this series and elsewhere [1, 2]. Finally, there is a nice discussion of confidence intervals (frequentist) and credibility limits (Bayesian). A frequentist interpretation views probability as the limit of the relative frequency of an event after many trials. In contrast, a Bayesian interpretation views probability as a degree of belief in an event. This belief could be based on prior knowledge such as the results of previous trials, biological plausibility, or personal beliefs (my drug is better than your drug). The important point is the common misinterpretation of confidence intervals. For example, many researchers interpret a 95 percent confidence interval to mean there is a 95 percent chance the interval contains the parameter value. This is wrong. It means that if we repeated the identical study many times, 95 percent of the intervals would contain the true but unknown parameter in the population. This seems strange to many people because we are interested only in the study we are analyzing, not in repeating the same study design many times.

    We hope readers will enjoy this well-written summary of common statistical errors, especially post hoc calculations of observed power. Going forward we hope to ban statements like "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" from the Journal. Reviewers have been advised. Proceed at your own risk.
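    The repeated-sampling interpretation of a 95 percent confidence interval can be illustrated with a minimal simulation (the population mean, standard deviation, and sample size below are arbitrary illustrative choices, not data from any study): each individual interval either contains the true parameter or it does not, and "95 percent" describes only the long-run frequency across hypothetical repetitions of the same study.

    ```python
    import random
    from statistics import NormalDist, mean

    # Illustrative simulation: repeat the "same study" many times and count
    # how often a 95% z-interval for the mean covers the true value.
    random.seed(1)
    TRUE_MEAN, SIGMA, N = 10.0, 2.0, 30        # hypothetical population and study size
    Z = NormalDist().inv_cdf(0.975)            # about 1.96

    trials = 2000
    covered = 0
    for _ in range(trials):
        sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
        half = Z * SIGMA / N ** 0.5            # known-sigma interval half-width
        if mean(sample) - half <= TRUE_MEAN <= mean(sample) + half:
            covered += 1

    print(f"coverage over {trials} repeated studies: {covered / trials:.3f}")
    # Long-run coverage is close to 0.95, but any single interval simply
    # does or does not contain TRUE_MEAN; no probability attaches to it.
    ```
    
    
    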
Robert Peter Gale MD, PhD, DSc(hc), FACP, FRCP, FRCPI(hon), FRSM, Imperial College London, Mei-Jie Zhang PhD, Medical College of Wisconsin.