Evidence-based medicine

How is the quality of studies assessed?

How can you find out whether a study is any good? This question may sound odd at first because it is often assumed that every study provides new and usable findings. But unfortunately it isn’t that easy: A lot of studies do not provide reliable information.

So it is important to critically evaluate every study. This can be done in a systematic review that analyzes all the available studies on a specific medical issue.

In order to assess whether the results of a study are reliable, you first have to find out why the study was done in the first place and which questions it tried to answer. This may sound trivial, but it is crucial if you want to determine whether the study can actually provide any answers to the original research question. For instance, many studies compare a new medication with a placebo (a fake medication). But if there is already an effective treatment for the medical condition in question, the new medication is usually compared with that. After all, it is important for patients to know which treatment is most likely to work.

When assessing a study, the next step is to see whether the methods used are suitable for answering the research question, whether the study was carried out properly, and whether there were any systematic errors (bias) that could distort the results of the study.

Important questions to ask when assessing the quality of a study:

  • Is the study design suitable for answering the research question? For example, you can’t use a questionnaire to find out whether a new surgical procedure is better than a tried-and-tested approach. A randomized controlled trial (RCT) is needed instead.
  • How were the participants approached and selected? Who was included in the study and who was excluded from the study? For example, people who have several medical conditions at the same time are often excluded from studies. As a result, the study outcomes might not apply to patients in the “real world” who have more than one condition.
  • Was the researchers’ description of how they carried out the study detailed and understandable enough for others to be able to repeat the study and verify the results?
  • Were there enough participants in the study to be able to answer the research question? When treatments are compared, there are nearly always small differences between their outcomes. Scientists then work out the likelihood that these differences could be due to chance rather than being true differences. Here it is important to know exactly how different the outcomes were and how many people participated in the study: The smaller the difference, the more participants are needed in order to consider the difference to be “real.”
  • Are the endpoints that were used in the study suitable for demonstrating that patients benefit from the treatment? In a study on a diabetes medication, for instance, measuring blood sugar levels alone would not be enough. It would be more important to know whether the medication helps prevent long-term complications of diabetes, such as amputations. Laboratory values like blood sugar levels (also referred to as “surrogate parameters”) cannot provide conclusive answers on their own.
  • Did the study last long enough? To find out whether, for example, a certain weight loss diet is effective, the participants’ weight should be checked again six months or one year after the end of the study – perhaps even after a longer period of time.
  • How many participants dropped out of the study, and why? How many participants could no longer be monitored after the end of the main part of the study (“lost to follow-up”), and why not? Good studies should include these figures and say whether they influenced the outcomes. This may be the case, for instance, if a lot of people drop out of a study due to bad side effects.
  • Apart from receiving the different treatments that were being compared in the study, were the groups treated the same otherwise? Differences are especially likely if it wasn’t possible to “blind” the doctors or participants properly.
  • Was it really a fair comparison? It could be a problem, for instance, if a new medication was compared with a standard medication used at a lower dose than usual in daily practice.
  • Was the success of treatment measured in the same way in both groups? For example, if the results of a blood test were used in one group, but both a blood test and an x-ray were used in the other group, that could change the outcome.
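
The question about the number of participants can be illustrated with a rough calculation. The sketch below is only an approximation, not a replacement for proper study planning. It assumes that two groups are compared on a yes/no outcome (for example, whether a complication occurred), and it uses a common textbook approximation with the conventional settings of a 5% significance level and 80% power. The function name and the event rates are made up for illustration.

```python
import math
from statistics import NormalDist


def participants_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of participants needed in EACH group.

    Uses the normal approximation for comparing two proportions.
    p_control and p_treatment are the assumed (hypothetical) event
    rates in the two groups, given as fractions between 0 and 1.
    """
    if p_control == p_treatment:
        raise ValueError("The two rates must differ; otherwise there is nothing to detect.")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power

    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    difference = p_control - p_treatment

    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Hypothetical example: 30% of the control group have a complication.
# The smaller the assumed improvement, the more participants are needed.
for p_treatment in (0.20, 0.25, 0.28):
    n = participants_per_group(0.30, p_treatment)
    print(f"Control 30% vs. treatment {p_treatment:.0%}: about {n} per group")
```

Running the loop shows the pattern described above: as the assumed difference between the groups shrinks, the required number of participants rises steeply.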

In order to assess the quality of RCTs, the following information is also needed:

  • How were the groups randomized? Were the participants really randomly assigned to the different groups, or did something influence their selection?
  • Where possible, did the researchers make sure that neither the study participants nor the doctors knew who was in which treatment group (blinding)?
  • Did all of the participants stay in the group they were originally assigned to throughout the study? If not, it is no longer possible to make a fair comparison between the groups at the end of the study.
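
To make the idea of random assignment more concrete, here is a minimal sketch of how participants could be allocated to two equally sized groups purely by chance. It is an illustration under simple assumptions, not the procedure used in any particular trial: real studies usually rely on dedicated randomization systems and keep the allocation list concealed from the people who enroll participants. The function name, group labels and participant IDs are hypothetical.

```python
import random


def randomize(participant_ids, groups=("treatment", "control"), seed=None):
    """Assign each participant to a group by chance alone.

    A balanced list of group labels is created first and then shuffled,
    so the group sizes differ by at most one and nobody can choose who
    ends up in which group.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    labels = [groups[i % len(groups)] for i in range(len(participant_ids))]
    rng.shuffle(labels)
    return dict(zip(participant_ids, labels))


# Hypothetical participant IDs
participants = [f"P{number:03d}" for number in range(1, 11)]
allocation = randomize(participants, seed=2024)

for participant, group in allocation.items():
    print(participant, "->", group)
```

Because the assignment depends only on the shuffle, characteristics such as age or severity of illness cannot influence who receives which treatment. Recording the seed or the resulting list also allows others to check later that the allocation was not changed.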