Odds Ratio and Yule's Q
The odds ratio is an important option for testing and quantifying the
association between two raters making dichotomous ratings. It
should probably be used more often with agreement data than it currently is.
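The odds ratio can be understood with reference to a 2×2
crossclassification table:

Crossclassification frequencies for binary ratings by two raters

                       Rater 2
Rater 1      Positive   Negative   Total
Positive        a          b       a + b
Negative        c          d       c + d
Total         a + c      b + d     a + b + c + d
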
By definition, the odds ratio, OR, is
[a/(a+b)] / [b/(a+b)]
OR = -----------------------, (1)
[c/(c+d)] / [d/(c+d)]
but this reduces to
      a/b
OR = -----, (2)
      c/d
or, as OR is usually calculated,
      ad
OR = ----. (3)
      bc
The last equation shows that OR is equal to the simple crossproduct
ratio of a 2×2 table.
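
As a quick numerical check that Equations (1) to (3) agree, the following Python sketch computes OR all three ways. The cell counts and the function names are hypothetical, chosen only for illustration; a, b, c and d follow the cell labels used above.

```python
def odds_ratio_eq1(a, b, c, d):
    """Equation (1): ratio of two odds built from conditional proportions."""
    odds_given_pos = (a / (a + b)) / (b / (a + b))
    odds_given_neg = (c / (c + d)) / (d / (c + d))
    return odds_given_pos / odds_given_neg


def odds_ratio_eq2(a, b, c, d):
    """Equation (2): the row totals cancel, leaving (a/b) / (c/d)."""
    return (a / b) / (c / d)


def odds_ratio_eq3(a, b, c, d):
    """Equation (3): the crossproduct ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts (not from any real study).
a, b, c, d = 40, 10, 20, 30
or1 = odds_ratio_eq1(a, b, c, d)
or2 = odds_ratio_eq2(a, b, c, d)
or3 = odds_ratio_eq3(a, b, c, d)
print(or1, or2, or3)  # all three agree up to floating-point rounding
```

All three functions assume no cell count is zero; a zero in b or c would cause a division by zero.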
The concept of "odds" is familiar from gambling. For instance, one
might say the odds of a particular horse winning a race are "3 to 1";
this means the probability of the horse winning is 3 times the
probability of not winning.
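
The relation between odds and probability can be sketched in a few lines of Python. The function names are illustrative assumptions; the conversion itself follows directly from the definition odds = p / (1 - p).

```python
def odds_to_probability(odds):
    """Convert odds in favor of an event to the probability of the event."""
    return odds / (1.0 + odds)


def probability_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds in favor of the event."""
    return p / (1.0 - p)


# Odds of "3 to 1" correspond to a winning probability of 0.75,
# which is 3 times the probability of not winning (0.25).
p_win = odds_to_probability(3.0)
odds_back = probability_to_odds(p_win)
print(p_win, odds_back)
```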
In Equation (2), both the numerator and denominator are odds. The
numerator, a/b, gives the odds of a positive versus negative rating by
Rater 2 given that Rater 1's rating is positive. The denominator, c/d,
gives the odds of a positive versus negative rating by Rater 2 given
that Rater 1's rating is negative.
OR is the ratio of these two odds, hence its name, the odds
ratio. It indicates how much the odds of Rater 2 making a
positive rating increase for cases where Rater 1 makes a positive
rating.
This alone would make the odds ratio a potentially useful way to assess
association between the ratings of two raters. However, it has some
other appealing features as well. Note that:
      a/b     a/c     d/b     d/c     ad
OR = ----- = ----- = ----- = ----- = ----.
      c/d     b/d     c/a     b/a     bc
From this we see that the odds ratio can be interpreted in various ways.
Generally, it shows the relative increase in the odds of one rater
making a given rating, given that the other rater made the same
rating--the value is invariant regardless of whether one is
concerned with a positive or negative rating, or which rater is the
reference and which the comparison.
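
The invariance described above can be illustrated with a short Python sketch. The counts are hypothetical, and the variable names are introduced here only for the example.

```python
def odds_ratio(a, b, c, d):
    """Crossproduct ratio ad / bc for a 2x2 table with cells a, b, c, d."""
    return (a * d) / (b * c)


# Hypothetical counts (not from any real study).
a, b, c, d = 40, 10, 20, 30

original = odds_ratio(a, b, c, d)

# Swap the roles of the raters: the table is transposed, so b and c trade places.
raters_swapped = odds_ratio(a, c, b, d)

# Treat "negative" as the rating of interest for both raters:
# a and d trade places, and so do b and c.
ratings_swapped = odds_ratio(d, c, b, a)

print(original, raters_swapped, ratings_swapped)  # all equal

# Note: recoding only ONE rater's categories is a different operation;
# it swaps the columns (or rows) and yields the reciprocal, bc / ad.
one_rater_recoded = odds_ratio(b, a, d, c)
```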
The odds ratio can be interpreted as a measure of the magnitude of
association between the two raters. The concept of an odds ratio is
also familiar from other statistical methods (e.g., logistic
regression).
OR can be transformed to a -1 to 1 scale by converting it to Yule's Q
(or to a slightly different statistic, Yule's Y).
For example, Yule's Q is
     OR - 1
Q = --------.
     OR + 1
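
A minimal Python sketch of the transformation follows. The formula for Q is the one given above; the formula used for Yule's Y, (sqrt(OR) - 1) / (sqrt(OR) + 1), is the standard definition and is supplied here as an addition, since the text does not state it. Function names are illustrative.

```python
import math


def yules_q(odds_ratio):
    """Yule's Q: maps OR in (0, infinity) onto (-1, 1); OR = 1 gives Q = 0."""
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)


def yules_y(odds_ratio):
    """Yule's Y: same idea as Q, but applied to the square root of OR."""
    root = math.sqrt(odds_ratio)
    return (root - 1.0) / (root + 1.0)


q_pos = yules_q(6.0)        # positive association
q_none = yules_q(1.0)       # no association
q_neg = yules_q(1.0 / 6.0)  # reciprocal OR gives the same magnitude, opposite sign
print(q_pos, q_none, q_neg, yules_y(6.0))
```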
It is often more convenient to work with the log of the odds ratio than
with the odds ratio itself. The formula for the standard error of
log(OR) is very simple:
s_log(OR) = square-root(1/a + 1/b + 1/c + 1/d).
Knowing this standard error, one can easily test the significance of
log(OR) and/or construct confidence intervals. The former is
accomplished by calculating:
z = log(OR) / s_log(OR),
and referring to a table of the cumulative distribution of the standard
normal curve to determine the p-value associated with z.
Confidence limits are calculated as:
log(OR) ± z_L × s_log(OR),
where z_L is the z value defining the
appropriate confidence limits, e.g., z_L
= 1.645 or 1.96 for a two-sided 90% or 95% confidence interval,
respectively.
Confidence limits for OR may be calculated as:
exp[log(OR) ± z_L × s_log(OR)].
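
The significance test and confidence limits just described might be computed as in the Python sketch below. The counts are hypothetical, the function name and dictionary keys are illustrative, and a two-sided p-value is assumed. Instead of a printed normal table, the sketch uses the identity p = erfc(|z| / sqrt(2)) for the two-sided tail area of the standard normal distribution.

```python
import math


def log_odds_ratio_inference(a, b, c, d, z_l=1.96):
    """Large-sample test and confidence limits for log(OR) and OR.

    Assumes all four cell counts are positive; zero cells are not handled.
    z_l = 1.96 gives two-sided 95% limits, 1.645 gives 90% limits.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    log_limits = (log_or - z_l * se, log_or + z_l * se)
    or_limits = (math.exp(log_limits[0]), math.exp(log_limits[1]))
    return {
        "log_or": log_or,
        "se": se,
        "z": z,
        "p_value": p_value,
        "log_or_limits": log_limits,
        "or_limits": or_limits,
    }


# Hypothetical counts (not from any real study).
result = log_odds_ratio_inference(40, 10, 20, 30)
for key, value in result.items():
    print(key, value)
```

Because the limits are constructed on the log scale and then exponentiated, the interval for OR is not symmetric around the point estimate.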
Alternatives are to estimate confidence intervals by the nonparametric
bootstrap (for description, see the Raw agreement
indices page) or to construct exact confidence intervals by
considering all possible distributions of the cases in a 2×2
table.
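
One way the nonparametric bootstrap alternative could look is sketched below in Python. This is a simple percentile bootstrap under stated assumptions, not necessarily the procedure described on the Raw agreement indices page: cases are resampled with replacement, resamples containing an empty cell are discarded (a simplification), and the counts, function name and default settings are all hypothetical.

```python
import math
import random


def bootstrap_or_limits(a, b, c, d, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence limits for the odds ratio."""
    rng = random.Random(seed)
    # One entry per rated case, labeled by the cell (0=a, 1=b, 2=c, 3=d) it falls in.
    cases = [0] * a + [1] * b + [2] * c + [3] * d
    n = len(cases)
    log_ors = []
    for _ in range(n_boot):
        counts = [0, 0, 0, 0]
        for cell in rng.choices(cases, k=n):  # resample cases with replacement
            counts[cell] += 1
        if min(counts) == 0:
            continue  # log(OR) undefined; skipped in this sketch
        ra, rb, rc, rd = counts
        log_ors.append(math.log((ra * rd) / (rb * rc)))
    if not log_ors:
        raise ValueError("no usable bootstrap resamples")
    log_ors.sort()
    lower = log_ors[int((alpha / 2) * len(log_ors))]
    upper = log_ors[int((1 - alpha / 2) * len(log_ors)) - 1]
    return math.exp(lower), math.exp(upper)


# Hypothetical counts (not from any real study).
lower_limit, upper_limit = bootstrap_or_limits(40, 10, 20, 30, n_boot=1000, seed=1)
print(lower_limit, upper_limit)
```

Discarding resamples with an empty cell can bias the interval when cell counts are small; in that situation the exact approach mentioned above may be preferable.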
Once one has used log OR or OR to assess association between raters, one
may then also perform a test of marginal homogeneity, such as the McNemar test.
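
For reference, a minimal Python sketch of the McNemar test follows. It uses the usual large-sample statistic (b - c)^2 / (b + c) with one degree of freedom and no continuity correction; that choice, the function name, and the counts are assumptions of the example. Only the discordant cells b and c enter the test.

```python
import math


def mcnemar_test(b, c):
    """McNemar chi-square test of marginal homogeneity for a 2x2 table.

    b and c are the two discordant cell counts. Returns (chi_square, p_value).
    Requires b + c > 0.
    """
    chi_square = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals the two-sided normal tail
    # at sqrt(chi_square), i.e. erfc(sqrt(chi_square / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical discordant counts (not from any real study).
chi_square, p_value = mcnemar_test(10, 20)
print(chi_square, p_value)
```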
Pros and Cons: the Odds Ratio
Pros
- The odds ratio is very easily calculated.
- Software for its calculation is readily available, e.g., SAS
PROC FREQ and SPSS CROSSTABS.
- It is a natural, intuitively acceptable way to express magnitude
of association.
- The odds ratio is linked to other statistical methods.
Cons
- If the underlying trait is continuous, the value of OR depends on the
level of each rater's threshold for a positive rating.
That is not ideal, as it implies the
basic association between raters changes if their thresholds change.
Under certain distributional assumptions (so-called "constant
association" models), this problem can be eliminated, but the
assumptions introduce new complexity.
- While the odds ratio can be generalized to ordered category
data, this again introduces new assumptions and complexity. (See the
Loglinear, association, and quasi-symmetry models page.)
Extensions and alternatives
- More than two categories. In an N×N table (where N >
2), one might collapse the table into various 2×2 tables and calculate
log(OR) or OR for each. That is, for each rating category k = 1, ...,
N, one would construct the 2×2 table for the crossclassification of
Level k vs. all other levels for Raters 1 and 2, and calculate log OR or
OR. This assesses the association between raters with respect to the
Level k vs. not-Level k distinction.
This method is probably more appropriate for nominal ratings than
for ordered-category ratings. In either case, one might consider instead
using Loglinear, association, or quasi-symmetry models.
- Multiple raters. For more than two raters, a possibility is to
calculate log(OR) or OR for all pairs of raters. One might then report,
say, the average value and range of values across all rater pairs.
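
Both extensions can be sketched in Python. The N×N table, the binary ratings, and all function names below are hypothetical and serve only to illustrate the bookkeeping; every 2×2 table in the example has nonzero cells, and zero cells are not handled.

```python
from itertools import combinations


def collapse_to_2x2(table, k):
    """Collapse an NxN table to Level k vs. all other levels.

    table[i][j] is the number of cases rated i by Rater 1 and j by Rater 2.
    Returns the cells (a, b, c, d) of the resulting 2x2 table.
    """
    n = len(table)
    a = table[k][k]
    b = sum(table[k][j] for j in range(n) if j != k)
    c = sum(table[i][k] for i in range(n) if i != k)
    d = sum(table[i][j] for i in range(n) for j in range(n) if i != k and j != k)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def pairwise_odds_ratios(ratings):
    """OR for every pair of raters; ratings maps rater name -> list of 0/1."""
    results = {}
    for r1, r2 in combinations(sorted(ratings), 2):
        pairs = list(zip(ratings[r1], ratings[r2]))
        cells = [pairs.count(p) for p in ((1, 1), (1, 0), (0, 1), (0, 0))]
        results[(r1, r2)] = odds_ratio(*cells)
    return results


# Hypothetical 3x3 table for two raters.
table = [[30, 5, 5],
         [4, 25, 6],
         [6, 5, 14]]
level_ors = [odds_ratio(*collapse_to_2x2(table, k)) for k in range(3)]

# Hypothetical binary ratings of 12 cases by three raters.
ratings = {
    "R1": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "R2": [1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
    "R3": [1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
}
pair_ors = pairwise_odds_ratios(ratings)
mean_or = sum(pair_ors.values()) / len(pair_ors)
or_range = (min(pair_ors.values()), max(pair_ors.values()))
print(level_ors, pair_ors, mean_or, or_range)
```

Averaging log(OR) rather than OR across rater pairs is an equally defensible summary; the sketch averages OR only to keep the example short.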
Given data by two raters, the following alternatives to the odds ratio
may be considered.
- In a 2×2 table, there is a close relationship between the odds ratio
and loglinear modeling. The latter can be used
to assess both association and marginal homogeneity.
- Cook and Farewell (1995) presented a model that considers formal
decomposition of a 2×2 table into independent components which reflect
(1) the odds ratio and (2) marginal homogeneity.
- The tetrachoric and polychoric correlations are
alternatives when one may assume that ratings are based on a latent
continuous trait which is normally distributed. With more than two
rating categories, extensions of the polychoric correlation are
available with more flexible distributional assumptions.
- Association and quasi-symmetry models can be used
for N×N tables, where ratings are nominal or ordered-categorical.
These methods are related to the odds ratio.
- When there are more than two raters, latent trait and latent class
models can be used. A particular type of
latent trait model called the Rasch model is related to the odds ratio.
References
Either of the books by Agresti is an excellent starting point.
Agresti A. Categorical data analysis. New York: Wiley, 1990.
Agresti A. An introduction to categorical data analysis. New York:
Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate analysis:
theory and practice. Cambridge, Massachusetts: MIT Press, 1975.
Cook RJ, Farewell VT. Conditional inference for subject-specific
and marginal agreement: two families of agreement measures.
Canadian Journal of Statistics, 1995, 23, 333-344.
Fleiss JL. Statistical methods for rates and proportions, 2nd Ed. New
York: John Wiley, 1981.
Khamis H. Association, measures of. In Armitage P, Colton T (eds.),
The Encyclopedia of Biostatistics, Vol. 1, pp. 202-208. New York:
Somes GW, O'Brien KF. Odds ratio estimators. In Kotz S, Johnson NL
(eds.), Encyclopedia of statistical sciences, Vol. 6, pp. 407-410. New
York: Wiley, 1988.
Sprott DA, Vogel-Sprott MD. The use of the log-odds ratio to
assess the reliability of dichotomous questionnaire data.
Applied Psychological Measurement, 1987, 11, 307-316.
Last updated: 21 August 2006 (added counter)
John Uebersax PhD