**Basic Commands.** Use PROC FREQ with the AGREE option on the TABLES statement. For example:

```sas
proc freq data = ratings ;
   tables Rater1 * Rater2 / agree ;
run;
```

**Statistical significance.** To get p-values for kappa and weighted kappa, add the statement:

```sas
test kappa wtkap ;
```

*Important!* **Ordered-category data.** SAS calculates the weights for weighted kappa from the __unformatted values__ of the rating variables. If your ratings are numbers, like 1, 2, and 3, this works fine. But if your ratings are character values, like Lo, Med, and Hi, SAS will assign numerical weights based on alphabetical order:

Hi = 1

Lo = 2

Med = 3

If the alphabetical order differs from the true order of the categories, __weighted kappa will be calculated incorrectly__. To avoid this, either (1) recode the character values as numbers that reflect the true ordering of the categories, or (2) create a format and specify the order=formatted option on PROC FREQ (see Example 2).

**Nonsquare tables.** SAS calculates kappa only for square tables--ones in which both raters use the same categories. If one rater does not use all the categories that the other rater does, kappa will not be calculated. This is fixed by adding pseudo-observations, which supply the unused category(ies) but are given a negligibly small weight. SAS then processes the table as square and calculates kappa. See Example 1 and Example 2 below.
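To make the ordered-category problem above concrete: SAS's default agreement weights for weighted kappa are the Cicchetti-Allison weights, w(i,j) = 1 - |c_i - c_j| / (c_max - c_min), computed from the (unformatted) category values c. The following short Python sketch (an illustration only, outside the SAS workflow) compares the weights produced by the true ordering of Lo, Med, Hi with those produced by their alphabetical positions:

```python
# Cicchetti-Allison agreement weights: w(i,j) = 1 - |ci - cj| / (cmax - cmin)
def ca_weights(scores):
    rng = max(scores) - min(scores)
    return [[1 - abs(a - b) / rng for b in scores] for a in scores]

# True ordering Lo < Med < Hi, scored (1, 2, 3): adjacent disagreements
# get half credit, Lo-vs-Hi disagreements get none.
true_w = ca_weights([1, 2, 3])
# -> [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]

# Alphabetical ordering Hi < Lo < Med, so (Lo, Med, Hi) are scored (2, 3, 1):
# now Lo-vs-Hi wrongly gets half credit and Med-vs-Hi wrongly gets none.
alpha_w = ca_weights([2, 3, 1])
```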

**Nominal ratings.** For nominal (unordered categorical) ratings, disregard the value that SAS reports for weighted kappa (the unweighted kappa value, however, is correct). As described above, SAS calculates weights based on an alphabetical ordering of the categories, which has no meaning for nominal data.

The **SAS documentation** is excellent. See: Proc Freq, Tests and Measures of Agreement.


This example shows how:

- To input raw rating data;
- To use pseudo-observations to force square tables so that SAS will calculate kappa statistics;
- To calculate kappa, weighted kappa, their confidence ranges and standard errors, and their statistical significance.

Note: this is just an example. The N is too small to produce a realistic standard error estimate, confidence range, or p-value for kappa and weighted kappa.

The code for the example is as follows:

```sas
/***** Example 1: Calculate Kappa from Raw Data *****/

* input ratings by three raters ;
data raw ;
   infile datalines ;
   input rater1 rater2 rater3 ;
datalines;
1 2 1
1 2 1
1 2 1
1 2 2
1 3 2
2 2 2
2 2 1
2 2 2
2 2 2
2 2 2
2 2 2
2 2 1
2 2 2
3 3 2
3 2 2
3 3 2
3 3 1
3 3 2
3 3 2
;
run;

*------------------------------------------------------------*;
* The above would produce non-square tables because Rater 2  *;
* doesn't use category 1 and Rater 3 doesn't use category 3. *;
* The next 3 data steps fix this.                            *;
*------------------------------------------------------------*;

* step 1: give all current observations a weight of 1 ;
data raw ;
   set raw ;
   wgt = 1 ;
run;

* step 2: make pseudo-records ;
data pseudo ;
   infile datalines ;
   wgt = .0000000001 ;
   input rater1 rater2 rater3 ;
datalines;
1 1 1
2 2 2
3 3 3
;
run;

* step 3: concatenate the original data and pseudo-observations ;
data both ;
   set raw pseudo ;
run;

* calculate kappa and weighted kappa between all pairs of raters ;
title "Example 1: Raw Data";
proc freq data = both ;
   weight wgt ;
   tables rater1 * (rater2 rater3) / norow nocol agree ;
   tables rater2 * rater3 / norow nocol agree ;
   * include significance tests ;
   test kappa wtkap ;
run;
```

The following is part of the output produced by the code above:

```
Example 1: Raw Data

Table of rater1 by rater2

rater1     rater2

Frequency|
Percent  |       1|       2|       3|  Total
---------+--------+--------+--------+
       1 |   1E-10|       4|       1|      5
         |    0.00|   21.05|    5.26|  26.32
---------+--------+--------+--------+
       2 |       0|       8|       0|      8
         |    0.00|   42.11|    0.00|  42.11
---------+--------+--------+--------+
       3 |       0|       1|       5|      6
         |    0.00|    5.26|   26.32|  31.58
---------+--------+--------+--------+
Total       1E-10       13        6      19
             0.00    68.42    31.58  100.00


Simple Kappa Coefficient
--------------------------------
Kappa                     0.4842
ASE                       0.1380
95% Lower Conf Limit      0.2137
95% Upper Conf Limit      0.7547

Test of H0: Kappa = 0

ASE under H0              0.1484
Z                         3.2626
One-sided Pr >  Z         0.0006
Two-sided Pr > |Z|        0.0011


Weighted Kappa Coefficient
--------------------------------
Weighted Kappa            0.4701
ASE                       0.1457
95% Lower Conf Limit      0.1845
95% Upper Conf Limit      0.7558

Test of H0: Weighted Kappa = 0

ASE under H0              0.1426
Z                         3.2971
One-sided Pr >  Z         0.0005
Two-sided Pr > |Z|        0.0010
```
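The kappa statistics in the output above can be reproduced by hand from the rater1-by-rater2 frequencies. Here is a quick cross-check in Python (an illustration only, not part of the SAS workflow; it assumes SAS's default Cicchetti-Allison weights and the Fleiss-Cohen-Everitt null-variance formula, and ignores the negligible 1E-10 pseudo-frequencies):

```python
import math

# rater1 x rater2 frequencies from the table above (pseudo-counts treated as 0)
f = [[0, 4, 1],   # rater1 = 1
     [0, 8, 0],   # rater1 = 2
     [0, 1, 5]]   # rater1 = 3
n   = sum(map(sum, f))                      # 19 observations
row = [sum(r) for r in f]                   # row marginals
col = [sum(c) for c in zip(*f)]             # column marginals

# Cicchetti-Allison weights, w(i,j) = 1 - |i - j| / 2 (category scores 1, 2, 3)
w = [[1 - abs(i - j) / 2 for j in range(3)] for i in range(3)]

po = sum(f[i][i] for i in range(3)) / n             # observed agreement
pe = sum(row[i] * col[i] for i in range(3)) / n**2  # chance-expected agreement
kappa = (po - pe) / (1 - pe)

po_w = sum(w[i][j] * f[i][j] for i in range(3) for j in range(3)) / n
po_e = sum(w[i][j] * row[i] * col[j] for i in range(3) for j in range(3)) / n**2
wkappa = (po_w - po_e) / (1 - po_e)

# Significance test of H0: kappa = 0 (ASE under H0)
p_r = [r / n for r in row]
p_c = [c / n for c in col]
s = sum(p_r[i] * p_c[i] * (p_r[i] + p_c[i]) for i in range(3))
ase0 = math.sqrt((pe + pe**2 - s) / (n * (1 - pe)**2))
z = kappa / ase0

print(round(kappa, 4), round(wkappa, 4))    # 0.4842 0.4701 -- matches SAS
print(round(ase0, 4), round(z, 4))          # 0.1484 3.2626 -- matches SAS
```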


This example shows how:

- To input rating data in the form of a crossclassification table;
- To use pseudo-frequencies to force a square table;
- To create and apply category formats so that SAS calculates weighted kappa correctly.

The SAS code to input the data and make pseudo-frequencies is as follows:

```sas
/***** Example 2: Calculate Kappa from Frequency Data *****/

* input crossclassification frequencies (including 0 frequencies) ;
data rate ;
   length rater1 $3 rater2 $3 ;
   infile datalines ;
   input rater1 rater2 f ;
datalines;
Lo  Lo   0
Lo  Med  0
Lo  Hi   0
Med Lo   5
Med Med 16
Med Hi   3
Hi  Lo   8
Hi  Med 12
Hi  Hi  28
;
run;

*----------------------------------------------*;
* If all frequencies of any row or any column  *;
* of the crossclassification table are 0, SAS  *;
* will not calculate kappa. In this case, add  *;
* the next data step.                          *;
*----------------------------------------------*;

* change the 0 frequencies to a negligible non-zero value ;
data rate ;
   set rate ;
   if f = 0 then f = .0000000001 ;
run;
```

For comparison, we first see what SAS reports if we don't apply category formats:

```sas
* see what happens by default ;
title  "Example 2a: Frequency Input" ;
title2 "Default: Rows/Columns Ordered by Category Values" ;
title3 "Correct Kappa but Incorrect Weighted Kappa!" ;
proc freq data = rate ;
   weight f ;
   tables rater1 * rater2 / agree norow nocol ;
run;
```

Here is the output produced by the commands above:

```
Example 2a: Frequency Input
Default: Rows/Columns Ordered by Category Values
Correct Kappa but Incorrect Weighted Kappa!

rater1     rater2

Frequency|Hi      |Lo      |Med     |  Total
---------+--------+--------+--------+
Hi       |     28 |      8 |     12 |     48
---------+--------+--------+--------+
Lo       |  1E-10 |  1E-10 |  1E-10 |  3E-10
---------+--------+--------+--------+
Med      |      3 |      5 |     16 |     24
---------+--------+--------+--------+
Total          31       13       28       72


Kappa Statistics

Statistic          Value      ASE     95% Confidence Limits
-----------------------------------------------------------
Simple Kappa      0.3333   0.0814      0.1738      0.4929
Weighted Kappa    0.3944   0.0917      0.2146      0.5741
```

Now let's do things the right way. First we create a format that maps the character categories to numbers in their true order. Then we refer to the format in PROC FREQ:

```sas
* define the true category order using a format ;
proc format ;
   value $rate 'Lo' = 1 'Med' = 2 'Hi' = 3 ;
run;

* calculate kappa and weighted kappa using formatted values ;
title  "Example 2b: Frequency Input" ;
title2 "Order Rows/Columns by Formatted Values" ;
proc freq data = rate order=formatted ;
   format rater1 rater2 $rate. ;
   weight f ;
   tables rater1 * rater2 / agree norow nocol ;
run;
```

Here is the output produced by the above. Note that the value of kappa is the same, but the value of weighted kappa is now correct:

```
Example 2b: Frequency Input
Order Rows/Columns by Formatted Values

Table of rater1 by rater2

rater1     rater2

Frequency|1       |2       |3       |  Total
---------+--------+--------+--------+
1        |  1E-10 |  1E-10 |  1E-10 |  3E-10
---------+--------+--------+--------+
2        |      5 |     16 |      3 |     24
---------+--------+--------+--------+
3        |      8 |     12 |     28 |     48
---------+--------+--------+--------+
Total          13       28       31       72


Kappa Statistics

Statistic          Value      ASE     95% Confidence Limits
-----------------------------------------------------------
Simple Kappa      0.3333   0.0814      0.1738      0.4929
Weighted Kappa    0.2895   0.0756      0.1414      0.4376
```
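The same hand calculation confirms both values of weighted kappa. In this Python cross-check (an illustration outside SAS, assuming the default Cicchetti-Allison weights), scoring the categories by alphabetical position reproduces the incorrect 0.3944, while scoring them in their true order reproduces the correct 0.2895; simple kappa is unaffected by the ordering:

```python
# rater1 x rater2 frequencies in true category order (Lo, Med, Hi),
# with the negligible 1E-10 pseudo-frequencies treated as 0
f = [[0,  0,  0],   # rater1 = Lo
     [5, 16,  3],   # rater1 = Med
     [8, 12, 28]]   # rater1 = Hi
n   = sum(map(sum, f))              # 72
row = [sum(r) for r in f]
col = [sum(c) for c in zip(*f)]

def weighted_kappa(scores):
    """Weighted kappa with Cicchetti-Allison weights for the given category scores."""
    rng = max(scores) - min(scores)
    w = [[1 - abs(a - b) / rng for b in scores] for a in scores]
    po = sum(w[i][j] * f[i][j] for i in range(3) for j in range(3)) / n
    pe = sum(w[i][j] * row[i] * col[j] for i in range(3) for j in range(3)) / n**2
    return (po - pe) / (1 - pe)

wrong = weighted_kappa([2, 3, 1])   # alphabetical scores: Hi=1, Lo=2, Med=3
right = weighted_kappa([1, 2, 3])   # true order:          Lo=1, Med=2, Hi=3
print(round(wrong, 4), round(right, 4))   # 0.3944 0.2895
```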


- The magree.sas macro calculates multi-rater kappa.
- Another macro, by Liu and Hays, handles nonsquare or irregular tables and permits user-supplied weights for kappa between two raters. Their macro is described here.
- Another way to work around the nonsquare-table problem is described in this short paper.


(c) 2000-2009 John Uebersax PhD

*Last revised: 20 July 2002*