Study design matters
The mere mention of their article’s subject makes eyelids droop, the authors write in the journal Fertility & Sterility (F&S). True, their paper about statistics and the null hypothesis sounds like a goodnight lullaby. It isn’t. The article is from last year, but it has not aged a bit. Beyond the flaws in study design that the paper documents, a more recent article in the Journal of the American Medical Association (JAMA) highlights similar challenges. Problematic gaps between treatment recommendations and the evidence from clinical trials supporting those recommendations continue to cause trouble.
Faulty study design and statistical analysis, particularly in medicine, can waste research dollars, involve research subjects for naught, or even expose them to unnecessary risks, say the F&S paper’s authors: David Meldrum and Mary Samuel of Reproductive Partners Medical Group, and Kurt Barnhart of the University of Pennsylvania School of Medicine. Meldrum is the editor of Fertility & Sterility.
The team wrote this no-nonsense critique to ensure that “only quality manuscripts are published.” Examining studies in the journals Fertility & Sterility and Human Reproduction, they outline what to do (WTD) and what not to do (WNTD), and they offer papers in both the WTD and WNTD classes. The WNTD papers have been de-identified to protect the guilty.
Among the mistakes in the WNTD papers are defects in study design: no calculation at the outset to determine the size of the patient groups needed to achieve meaningful results, use of the wrong statistical tests, a lack of “blinding,” which biases the interpretation of results, “dredging” for outcomes that were not part of the study design, and opacity about the actual number of participants in a trial and their outcomes.
The authors suggest including statistics consultants when evaluating papers for publication, but also say that reviewers “need to be more aware of these common pitfalls and deficiencies.”
What are some of the not-to-do points? One study that the group criticizes is “extremely small,” with only around a 2 percent difference in one variable between the experimental and control groups. The problem lies not just with this small difference; a small difference can be a relevant result. But this study had only 15 patients in the experimental group and 35 in the control group. Statistical analysis in small groups runs into plenty of problems. For example, as in this case, the standard error, which describes the uncertainty around each group’s estimate, overlaps between the experimental and control groups. That overlap makes it unlikely that study results can reach statistical significance.
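To make the overlap problem concrete, here is a minimal Python sketch. The group sizes mirror the 15-versus-35 example above, but the data are invented for illustration, and `standard_error` and `se_intervals_overlap` are hypothetical helper names, not anything from the paper.

```python
import math
import random
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation
    divided by the square root of the number of observations."""
    return statistics.stdev(values) / math.sqrt(len(values))

def se_intervals_overlap(a, b):
    """Do the mean +/- one-standard-error intervals of two samples overlap?"""
    lo_a = statistics.mean(a) - standard_error(a)
    hi_a = statistics.mean(a) + standard_error(a)
    lo_b = statistics.mean(b) - standard_error(b)
    hi_b = statistics.mean(b) + standard_error(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Invented data: a small true difference buried in noisy measurements.
random.seed(1)
experimental = [random.gauss(52, 10) for _ in range(15)]  # 15 patients
control = [random.gauss(50, 10) for _ in range(35)]       # 35 patients

# With groups this small and noisy, the +/- SE intervals typically overlap,
# which makes reaching statistical significance unlikely.
print(se_intervals_overlap(experimental, control))
```

When the intervals overlap, the difference between the group means is comparable to the uncertainty in each mean, which is exactly the situation the F&S authors flag.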
The scientists kindly offer a little statistical education in their article as a kind of WTD reminder for readers. Standard error is the standard deviation divided by the square root of the number of observations. Performing a Student t test would have also raised a red flag. But, the study authors note, it is important to remember that the t test, which compares two sets of continuous data, requires observations that are independent of one another. In the case of the WNTD study, there were “several observations” for each patient, which would have required further analysis; the observations in this case are “clearly not independent from one another.”
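The independence problem can be sketched in a few lines of Python. The data below are invented, with repeated measurements per patient; `welch_t` is a hypothetical helper implementing the standard Welch two-sample t statistic, not the analysis from the paper. Pooling every measurement as if it were its own patient inflates the sample size and the t statistic; collapsing to one mean per patient restores independence.

```python
import math
import statistics

def welch_t(a, b):
    """Welch two-sample t statistic (assumes independent observations)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Invented data: 3 patients per arm, 4 repeated measurements per patient.
exp_patients = [[62, 61, 63, 62], [55, 54, 56, 55], [58, 57, 59, 58]]
ctl_patients = [[52, 51, 53, 52], [49, 48, 50, 49], [54, 53, 55, 54]]

# Wrong: pooling all 12 measurements per arm treats correlated repeats
# as independent observations, inflating n and the t statistic.
t_pooled = welch_t([x for p in exp_patients for x in p],
                   [x for p in ctl_patients for x in p])

# Better: collapse to one mean per patient so observations are independent.
t_per_patient = welch_t([statistics.mean(p) for p in exp_patients],
                        [statistics.mean(p) for p in ctl_patients])

print(t_pooled, t_per_patient)  # the pooled t is markedly larger
```

The per-patient test has the honest (much smaller) sample size, which is why ignoring the clustering makes results look more significant than they are.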
Another WNTD paper the authors do not like does not indicate how many participants were in the study. And in this case, the number of observations exceeds the number of study participants. A big no-no in this paper, as in the previous one, is that the results were not blinded. In other words, the scientists were running an “open-label study” and were aware of both treatments and outcomes, which can spell bias in the tracking and interpretation of findings.
For these types of open-label studies, the authors say, study size needs to be set from the beginning, with statistical analysis scheduled at a predetermined point or at the end. As the authors list all the flaws, they bemoan the lost opportunity, the “shame” of the investigation, which had the chance to enhance clinical practice with a better design.
Another flaw in two of the studies is the lack of registration. As the International Committee of Medical Journal Editors points out in an editorial in The New England Journal of Medicine (NEJM), registration is voluntary. Not registering a trial means not revealing its existence.
Registration is a way to gain public trust, and “enhanced public confidence in the research enterprise will compensate for the costs of full disclosure,” write the NEJM authors. Patients volunteering for clinical trials “deserve to know that their contribution to improving human health will be available to inform health care decisions.” And the knowledge made possible by their collective altruism “must be accessible to everyone.”
While the authors of the F&S paper do not name the researchers behind the papers they criticize, they do name those behind the WTD studies. They heap praise on these investigators, for example for registering their studies and for disclosing how the results were kept from the researchers in order to avoid bias.
The scientists like the care and dedication that characterizes the work that went into the studies they praise. One study had the writers so joyous, they wrote to the authors: “We salute you!”
Challenges in study design are not limited to the field of reproductive medicine. In the JAMA article, scientists explain that their analysis raises questions more generally about the ability of clinical trials to supply enough high-quality evidence for recommendations. Trials remain small, for example, with widespread variation in procedures such as randomization and blinding. While in some fields of medicine small-scale studies can yield useful results, the authors write that “substantial differences in the use of randomization and blinding across specialties persist after adjustment for phase, raising fundamental questions about the ability to draw reliable inferences from clinical research conducted in that arena.” Study design and the snoozy topic of statistics have powerful consequences.
Meldrum, David, et al. The null hypothesis: closing the gap between good intentions and good studies. Fertility and Sterility 2011;96(1).
Califf, Robert, et al. Characteristics of Clinical Trials Registered in ClinicalTrials.gov, 2007-2010. JAMA 2012;307(17):1838-1847. doi:10.1001/jama.2012.3424.