
Systematic Reviews

LibGuide to Systematic Reviews

Rating Evidence

In rating evidence, the GRADE framework is useful for creating an evidence profile (EP) and a summary of findings (SoF) table. GRADE also offers a series of criteria for judging the quality of evidence.

GRADE offers a system for rating quality of evidence in systematic reviews and guidelines and grading strength of recommendations in guidelines. [Guyatt, 383] GRADE is an acronym meaning "Grades of Recommendation, Assessment, Development, and Evaluation."

GRADE is a transparent framework that allows a reviewer to examine and rate the quality of evidence demonstrated by studies, be they Randomized Controlled Trials (RCTs) or Observational Studies. "In the GRADE approach, randomized controlled trials (RCTs) start as high-quality evidence and observational studies as low-quality evidence supporting estimates of intervention effects. Five factors may lead to rating down the quality of evidence and three factors may lead to rating up. Ultimately, the quality of evidence for each outcome falls into one of four categories from high to very low." [Guyatt, 385]

GRADE does not rate a study as a whole. Instead, each outcome the study sought to measure, whether explicitly or incidentally, receives its own rating. A study that reports multiple outcomes can therefore carry several different ratings, one per outcome. What matters most is that the quality of the evidence for each outcome is examined and rated.

As discussed at the outset of this LibGuide, fundamental to a systematic review is the development of a clinical question that can (hopefully) be answered by the evidence across a number of studies. The basic form of the clinical question is achieved by using PICO, the patient/intervention/comparator/outcome framework. GRADE uses PICO to establish which outcomes in a given study are the most critical and which are less important overall. This point matters because GRADE seeks to rate the evidence for EACH outcome, not just for the study or systematic review as a whole. "Systematic review and guideline authors use [GRADE] to rate the quality of evidence for each outcome across studies (i.e., for a body of evidence). This does not mean rating each study as a single unit. Rather, GRADE is 'outcome centric': rating is made for each outcome, and quality may differ--indeed, is likely to differ--from one outcome to another within a single study and across a body of evidence." [Guyatt, 385] The ultimate goal of GRADE is to provide a summary of the evidence, a rating of the evidence for (or against) each outcome, and an estimate of the effect of whichever intervention was studied. To report this, GRADE uses an Evidence Profile (EP) and a Summary of Findings (SoF) table.

Evidence Profile (EP)

Also referred to as a GRADE EP, the Evidence Profile "includes a detailed quality assessment in addition to a SoFs. That is, the EP includes an explicit judgment of each factor that determines the quality of evidence for each outcome, in addition to a SoFs for each outcome. The SoF table includes an assessment of the quality of evidence for each outcome but not the detailed judgments." [Guyatt, 386]  According to the series of articles, a Summary of Findings is intended for a broader audience, including end users of systematic reviews; whereas an Evidence Profile is intended for those creating guidelines.

Fig. 2. Quality assessment criteria.

Above is the grid (Figure 2. Quality Assessment Criteria) created to show the process of arriving at a judgment on the quality of evidence for an outcome using GRADE. As the Guyatt passage quoted earlier states, Randomized Controlled Trials begin with the assumption that the evidence is High quality, whereas Observational Studies begin with the assumption that it is Low quality. Two columns in the grid then show how that rating can move. The quality of evidence can be rated down according to the five criteria mentioned before: Risk of Bias, Inconsistency, Indirectness, Imprecision, and Publication Bias. It can be rated up according to the three criteria in the final column: Large Effect, Dose Response, and All Plausible Confounding. The final rating for each outcome falls into one of four categories: High, Moderate, Low, or Very Low.
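The rating logic can be sketched in a few lines of Python. This is only an illustration, not part of GRADE itself: it follows the Guyatt quotation in starting RCTs at High and observational studies at Low, and it assumes (for the sake of the example) that each judgment is recorded as a whole number of levels to move. The names `Quality`, `rate_outcome`, and the factor keys are hypothetical.

```python
from enum import IntEnum


class Quality(IntEnum):
    """The four GRADE quality categories, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Starting points as stated in the Guyatt quotation.
START = {"rct": Quality.HIGH, "observational": Quality.LOW}

# Factor names from the guide; the snake_case keys are illustrative.
DOWN_FACTORS = {"risk_of_bias", "inconsistency", "indirectness",
                "imprecision", "publication_bias"}
UP_FACTORS = {"large_effect", "dose_response", "plausible_confounding"}


def rate_outcome(design, rated_down=None, rated_up=None):
    """Return the quality rating for ONE outcome.

    rated_down / rated_up map a factor name to the number of levels
    the reviewer decided to move (an assumption of this sketch).
    """
    rated_down = rated_down or {}
    rated_up = rated_up or {}
    unknown = (set(rated_down) - DOWN_FACTORS) | (set(rated_up) - UP_FACTORS)
    if unknown:
        raise ValueError(f"Unknown factors: {sorted(unknown)}")

    level = START[design] - sum(rated_down.values()) + sum(rated_up.values())
    # The rating cannot leave the four-category scale.
    level = max(Quality.VERY_LOW, min(Quality.HIGH, level))
    return Quality(level)


# An RCT outcome rated down one level for indirectness.
print(rate_outcome("rct", rated_down={"indirectness": 1}).name)
```

Because the function is called once per outcome, a single study can yield several different ratings, which mirrors the "outcome centric" design of GRADE. The reviewer's judgments remain the real work; the code only records their arithmetic.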

An Evidence Profile table is shown in the article, which can be viewed here. It relies on the same study mentioned at the outset of this LibGuide: Venekamp's study examining the effectiveness of antibiotic use versus no intervention in children with Acute Otitis Media.

Table 2 - Summary of Findings [Guyatt, 388] shows the Summary of Findings (SoF) with Outcomes as the first column, followed by the Control and Intervention Risk, Relative Risk (Confidence Interval), Number of Participants, Quality of Evidence (as rated by GRADE), and any comments. The Outcomes are, per the study: Pain at 24 hours; Pain at 2-7 days; Tympanometry at one month; Tympanometry at three months; and Vomiting, diarrhea, or rash. The two Tympanometry measures serve as surrogates for any actual pain measurement. Based on a GRADE assessment of the quality of the evidence, the first two outcomes were rated as High quality evidence; the remaining three were rated as Moderate.
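As a rough illustration of the table's layout, one SoF row can be modeled as a small record with one field per column. The class and field names below are hypothetical, and the numeric fields are left empty because this guide does not reproduce the figures from Table 2; only the outcome names and quality ratings described above are filled in.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SoFRow:
    """One row of a Summary of Findings table (illustrative field names)."""
    outcome: str
    quality: str                                   # GRADE rating for this outcome
    control_risk: Optional[float] = None
    intervention_risk: Optional[float] = None
    relative_risk: Optional[float] = None
    confidence_interval: Optional[Tuple[float, float]] = None
    participants: Optional[int] = None
    comments: str = ""


# Outcomes and ratings as described in this guide; numbers intentionally omitted.
rows = [
    SoFRow("Pain at 24 hours", "High"),
    SoFRow("Pain at 2-7 days", "High"),
    SoFRow("Tympanometry at one month", "Moderate"),
    SoFRow("Tympanometry at three months", "Moderate"),
    SoFRow("Vomiting, diarrhea, or rash", "Moderate"),
]


def by_quality(sof_rows):
    """Group outcome names by their quality rating."""
    grouped = {}
    for row in sof_rows:
        grouped.setdefault(row.quality, []).append(row.outcome)
    return grouped


for rating, outcomes in by_quality(rows).items():
    print(f"{rating}: {', '.join(outcomes)}")
```

Keeping the rating on the row, rather than on the study, reflects the point that each outcome is judged separately.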

Any comments or reasoning for the ratings are discussed alongside them, as in Table 3 of the Guyatt article (p. 390), which discusses the results of a different study. It is important to note that the SoF is another manner of presenting the information from the EP. The EP, again, is more concerned with the details of the studies and their outcomes than the SoF, which simply presents the findings. For instance, the EP for the Venekamp study discusses at length the outcome related to Vomiting, Diarrhea, and Rash: the study was concerned with antibiotic use but not necessarily with the type of antibiotic used, and some antibiotics are more prone to cause adverse effects than others. This concern caused the evidence for that outcome to be rated down.