The advanced version of POLYCORR has several features that the basic version of POLYCORR does not include. The basic version, with only a few simple commands, is mainly intended as a teaching tool. Added features of the advanced version include: ability to have different numbers of levels for the row variable and column variable, ability to combine data cells, and options to control the accuracy of estimation. Appendix A gives a complete list of the advanced features.
Data are supplied as a table of observed frequencies. Output includes the polychoric correlation and its standard error, estimated thresholds and, possibly, their standard errors, and model fit statistics.
The user can choose joint maximum likelihood (ML) or two-step estimation of parameters (Drasgow, 1988).
XPC
The program can also be run from the Windows File Manager, or from the Windows "Run program" prompt. If you are using a pre-Pentium machine without a math coprocessor, this version of XPC will not run; contact the author to obtain a suitable version.
The program will first prompt for input and output filenames. In response to each of these prompts, supply a valid DOS file name, including, if appropriate, a path, for example:
c:\datasets\laruche.xpc
Simply pressing the Return key will cause the default file name to be used. The default input and output filenames are input.txt and output.txt, respectively.
Numbers will then scroll past as the program runs. These are values of the likelihood-ratio chi-squared statistic calculated at each iteration. These numbers should generally decrease.
With two-step estimation, fewer than 50 iterations may be needed; with joint ML estimation, 1,000 or more may be required for a large table. If the program doesn't converge in the allotted number of iterations, enter a "1" on Command Line 3 of the input file and rerun the program. POLYCORR will then resume estimation where it left off.
The 14 command lines of the input file are as follows:
Line 1. A run title of up to 80 characters.
Line 2. Maximum number of iterations. One can usually leave this set at 5000. (It is unlikely that that many iterations will be needed.)
Line 3. Use previous start values? The default value of 0 causes POLYCORR to begin estimation with default start values that it calculates. A value of 1 means the user will instead supply start values (see User-supplied Start Values).
Line 4. Levels for Item/Rater 1. This is the number of ordered categories associated with the first Item or Rater (the number of rows of the input table). The current maximum is 18.
Line 5. Levels for Item/Rater 2. This is the number of ordered categories associated with the second Item or Rater (the number of columns of the input table). The current maximum is 18.
Line 6. Estimation method. The default value of 0 means joint ML estimation will be used. A value of 1 specifies two-step estimation.
Line 7. Criterion. The default value of 0 means POLYCORR will minimize the likelihood-ratio chi-squared (G-squared) statistic. G-squared is equal to -2 log L plus a data-dependent constant. Therefore minimizing G-squared is equivalent to maximizing log L; that is, it produces maximum likelihood (ML) estimates. A value of 1 specifies minimization of the Pearson chi-squared (X-squared) statistic. Estimated parameter standard errors are not calculated with minimum-X-squared estimation.
Line 8. This option is reserved for future use. Specify a value of 0 or leave this line blank.
Line 9. Suppress standard errors. The default value of 0 means standard errors will be estimated. A value of 1 suppresses standard error calculation.
The following lines are more technical. Many users can leave these set to the default values of 0. 
Line 10. Algorithm used to calculate normal cdf. The default value of 0 means ALNORM (Applied Statistics algorithm AS 66) is used to calculate values for the normal cumulative distribution function (cdf). This should be adequate for most applications. If this value is 1, POLYCORR will use a more accurate cdf routine (NORMP). If the value is 2, an alternative accurate routine (NPROB) is used.
Line 11. Latent trait range. This defines the range of the latent trait over which integration is performed in the calculation of expected frequencies. The default value is the range (relative to a standard normal curve) of +/- 5. To extend the range, supply a (positive) value of up to 10.0. The latent trait range will be set to minus/plus this value; for example, if 10.0 is specified, the range will be from -10 to 10. The format is F4.0. (If you include a decimal point, it will override the F4.0 format, but the value must be in Columns 1-4.)
Line 12. Number of quadrature points for integration. Integration is performed by dividing the latent trait range into a finite number of equally spaced points. A value of 0 in this field results in the default number of 51 points being used. It is recommended that this value not be changed unless there is a reason. For more accuracy, a larger number of up to 81 can be specified. For technical reasons it is probably better to specify an odd number. A number less than 51 will increase program speed, but this should probably not be done without a good reason (in any case, the number should never be less than 21).
Line 13. Output format. This controls the number of decimal places for printing of expected frequencies, as follows:
Value        Number of decimal places printed
0 (default)  2
1 to 7       1 to 7, respectively
8            18
9 or more    0
Line 14. Number of metacells. A metacell is the combination of two or more cells in the original data table. When cells are combined, their observed and expected frequencies are pooled for purposes of parameter estimation. Up to 20 metacells can be defined. For Command Lines 2-14 (except Line 11), values are supplied in I4 format; that is, the integer value must (a) be in Columns 1-4 and (b) be right-justified. Leaving Columns 1-4 blank is the same as supplying a value of 0.
Comments can be supplied on Command Lines 2-14 anywhere after Column 4. It is recommended that comments be used to identify the option associated with each line.
The file input.txt supplied with POLYCORR shows proper construction of an input file. 
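As a side note on the Line 7 criterion choice, the two fit statistics can be sketched as follows (Python is used here purely for illustration; this is not part of POLYCORR itself):

```python
import numpy as np

def fit_statistics(observed, expected):
    """Likelihood-ratio (G-squared) and Pearson (X-squared) chi-squared
    statistics for tables of observed and expected frequencies."""
    o = np.asarray(observed, dtype=float).ravel()
    e = np.asarray(expected, dtype=float).ravel()
    nz = o > 0                          # a 0 * log(0) term contributes 0
    g2 = 2.0 * np.sum(o[nz] * np.log(o[nz] / e[nz]))
    x2 = np.sum((o - e) ** 2 / e)
    return g2, x2
```

When the expected frequencies equal the observed frequencies exactly, both statistics are zero; otherwise both are positive, and minimizing G-squared is equivalent to maximizing log L.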
The elements of the metacell pattern matrix correspond one-for-one with the cells of the observed frequency table. Supply a "0" in the pattern matrix to show that a cell is not to be combined. Supply a positive integer from 1 to 20 to indicate metacell membership; all data cells with the same nonzero pattern value comprise the corresponding metacell. For example, all cells with a "1" in the pattern matrix define Metacell 1, all cells with a "2" define Metacell 2, etc.
The following example metacell pattern matrix:

0 0 0 1 1
0 0 0 0 1
0 0 0 0 0
2 0 0 0 0
2 2 0 0 0

specifies that cells (4, 1), (5, 1) and (5, 2) of the data table are to be combined, and cells (1, 4), (1, 5) and (2, 5) of the data table are to be combined, for purposes of estimation.
Format for the pattern matrix is free-field. One or more blank lines can separate the observed frequency table and the pattern matrix.
The use of metacells is experimental at this point. The idea is to improve parameter estimation by reducing data sparseness. Definitely do not use metacells to combine entire rows or columns of the data table; doing so will make the solution unidentified. Instead, collapse the rows or columns before running POLYCORR. Nonidentifiability can possibly result in other situations as well. There should probably be at least one cell in each row and each column that is not combined with other cells.
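The pooling rule can be made concrete with a small sketch (Python, illustrative only; the function name is mine, not POLYCORR's):

```python
import numpy as np

def pool_metacells(freq, pattern):
    """Pool observed cell frequencies according to a metacell pattern matrix.
    Cells with pattern value 0 are left as-is; all cells sharing the same
    positive pattern value are summed into one metacell total."""
    freq = np.asarray(freq, dtype=float)
    pattern = np.asarray(pattern, dtype=int)
    singles = freq[pattern == 0].tolist()          # cells kept individually
    meta = {int(v): float(freq[pattern == v].sum())
            for v in np.unique(pattern) if v > 0}  # pooled metacell totals
    return singles, meta
```

Applied to the example pattern matrix shown earlier with a 5 x 5 table of all-1 frequencies, this leaves 19 individual cells and produces two metacell totals of 3 each.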
Chi-squared df are calculated as (R × C) - 1 - k, where R is the number of row levels, C is the number of column levels, and k is the number of estimated parameters.
If metacells are defined, df are adjusted (reduced) accordingly.
If the G-squared and/or X-squared statistics show significant lack of model fit (e.g., p < .10), the user may consider the following options:

This section also reports whether the program converged or not.
Next it reports a test of a zero polychoric correlation. This is simply a chisquared test of statistical independence for the data, to which the polychoric model reduces when rho = 0. A nonsignificant result means that a model that assumes a zero polychoric correlation fits the data; this can be interpreted as evidence that the null hypothesis H0: rho = 0 is tenable. At present, POLYCORR does not consider metacells when performing this test.
Next the Pearson correlation between the two manifest variables is reported (i.e., the correlation obtained treating the variables as interval data).
Following this, the threshold estimates are reported. Standard errors of threshold estimates are not calculated if two-step estimation of the polychoric correlation is used.
If metacells have been defined, metacell memberships are shown. The observed and expected metacell frequencies are also printed.
First derivatives are printed twice, once to 4 decimal places, and once
in scientific notation.
Let X1 and X2 denote the observed levels of the row and column variables, respectively, for a given case. Let Y1 and Y2 denote values of the pre-discretized continuous variables associated with X1 and X2.
The measurement model is:
Y1 = bT + e1, Y2 = bT + e2. 
In the above equations, T is a latent trait (analogous to a common factor) which Y1 and Y2 have in common and which accounts for their correlation; b is a regression coefficient, and e1 and e2 represent random errors.
The standard model assumes that the latent trait T is normally distributed. As scaling is arbitrary, we specify that T ~ N(0, 1). Error is similarly assumed to be normally distributed (and independent both between raters and across cases). A consequence of these assumptions is that Y1 and Y2 must also be normally distributed. To fix the scale, we specify that var(Y1) = var(Y2) = 1. It follows that b is the correlation of both Y1 and Y2 with the latent trait, and that b^{2} is the correlation of Y1 and Y2 (it is also the polychoric correlation of X1 and X2, the correlation of the two variables we would observe if both variables were measured continuously).
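The measurement model can be illustrated by simulation (a sketch under the stated assumptions; the sample size, the loading b = 0.8, and the thresholds are arbitrary choices of mine, not values from POLYCORR):

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 200_000, 0.8                       # sample size and loading (illustrative)

T = rng.standard_normal(n)                # latent trait, T ~ N(0, 1)
s = np.sqrt(1 - b**2)                     # error sd chosen so var(Y) = 1
Y1 = b * T + s * rng.standard_normal(n)   # pre-discretized continuous variables
Y2 = b * T + s * rng.standard_normal(n)

tau = [-1.0, 0.0, 1.0]                    # illustrative thresholds
X1 = np.searchsorted(tau, Y1)             # observed ordinal levels 0..3
X2 = np.searchsorted(tau, Y2)

r = np.corrcoef(Y1, Y2)[0, 1]             # close to b**2 = 0.64
```

The sample correlation of Y1 and Y2 comes out near b^2 = 0.64, as the model implies; the polychoric correlation of X1 and X2 estimates this same quantity from the ordinal data alone.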
The assumptions of the polychoric correlation coefficient may be summarized as follows:
Assumption 1 is essentially true by definition. The existence of a latent trait is implied by the existence of a nonzero polychoric correlation and vice versa. Just as with a common factor in factor analysis, the latent trait is "what the variables have in common." It may correspond to a more-or-less real but unobserved variable, such as intelligence or disease severity. Or it may simply be a shared component of variation.
Assumptions 2, 3 and 4 can be alternatively expressed as the assumption that Y1 and Y2 follow a bivariate normal distribution.
Assumption 5 is essentially true by definition, since any consistent association between the two variables is accounted for by the latent trait. Assumption 6, a standard assumption for statistical methods, is usually considered met with random sampling.
Assumptions 2, 3 and 4, then, are the main assumptions tested with model fit statistics. Assumption 2 can be relaxed by considering other distributional forms for the latent trait, or modeling a nonparametric latent trait distribution. Methods for relaxing Assumption 4 are described by Hutchinson (2000); a version of POLYCORR that permits relaxation of this assumption is currently being tested (users may contact the author to obtain a preliminary version).
Concerning calculations, expected frequencies are calculated by numerical integration over the range of the latent trait, T. The method is described in Uebersax (1993). Bivariate integration is not necessary. At each level of T, the product of two normal cumulative distribution function values (calculated via an accurate polynomial approximation), one associated with Y1 and one associated with Y2, is calculated.
Accuracy depends on the following:
Based both on experience and reference to earlier literature (e.g., Bock and Aitkin, 1981), a latent trait range of +/- 5 (relative to a standard normal curve) is taken as the default.
POLYCORR uses the most elementary integration method, literally "integration by rectangles." Greater efficiency could be obtained by using Simpson's rule or Gauss-Hermite quadrature. However, with 51 quadrature points over the range +/- 5, this simpler method is sufficient. (Doubling the number of quadrature points, for example, has little effect on results.)
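A sketch of this calculation (Python; the function and argument names are mine) under the default settings of 51 points over +/- 5:

```python
import numpy as np
from math import erf, sqrt, pi

def norm_cdf(z):
    """Standard normal cdf via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_probs(rho, tau_row, tau_col, n_points=51, trait_range=5.0):
    """Cell probabilities under the polychoric model, by 'integration by
    rectangles' over the latent trait T. rho is the polychoric correlation
    (assumed in [0, 1) in this sketch, so the loading is b = sqrt(rho));
    tau_row and tau_col are the ascending threshold lists."""
    b, s = sqrt(rho), sqrt(1.0 - rho)       # loading; conditional sd of Y given T
    t = np.linspace(-trait_range, trait_range, n_points)
    w = (t[1] - t[0]) * np.exp(-t**2 / 2.0) / sqrt(2.0 * pi)  # phi(t) * dt weights
    er = [-float("inf")] + list(tau_row) + [float("inf")]
    ec = [-float("inf")] + list(tau_col) + [float("inf")]
    P = np.zeros((len(er) - 1, len(ec) - 1))
    for tk, wk in zip(t, w):
        # conditional category probabilities for each variable at this T level
        pr = [norm_cdf((er[i + 1] - b * tk) / s) - norm_cdf((er[i] - b * tk) / s)
              for i in range(len(er) - 1)]
        pc = [norm_cdf((ec[j + 1] - b * tk) / s) - norm_cdf((ec[j] - b * tk) / s)
              for j in range(len(ec) - 1)]
        P += wk * np.outer(pr, pc)
    return P / P.sum()                      # renormalize away truncation error
```

With rho = 0 the cell probabilities reduce to products of the marginal proportions, and doubling n_points changes the results only negligibly, consistent with the remarks above.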
Parameter estimates are obtained by iteratively adjusting parameter values to find those that best fit the observed data by the criterion of maximum likelihood (or, if specified, minimum X-squared). The iterative adjustments are handled by STEPIT, a general algorithm for multivariate minimization/maximization (Chandler, 1969).
With joint ML estimation, all parameters (the polychoric correlation and thresholds) are estimated by this means. With two-step estimation, thresholds are estimated directly from cumulative marginal proportions, and only rho is estimated iteratively.
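The two-step threshold computation can be sketched directly (Python; the function name is mine; row thresholds are shown, and columns work the same way on the other margin):

```python
from statistics import NormalDist
import numpy as np

def marginal_thresholds(freq):
    """Two-step threshold estimates for the row variable: the inverse normal
    cdf applied to the cumulative marginal proportions of the observed table.
    (For the column variable, do the same with freq.sum(axis=0).)"""
    margin = np.asarray(freq, dtype=float).sum(axis=1)   # row totals
    cum = np.cumsum(margin)[:-1] / margin.sum()          # cumulative proportions
    return [NormalDist().inv_cdf(p) for p in cum]
```

For example, row margins of 25, 50 and 25 give cumulative proportions 0.25 and 0.75, hence thresholds of about -0.674 and +0.674.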
Standard errors are calculated by inverting the observed information matrix (the matrix of second derivatives of log L with respect to the model parameters). The observed information matrix is calculated by finite differences. For two-step estimation, when estimating the standard error of rho, the thresholds are viewed as fixed parameters. This appears consistent with Drasgow (1988) and others. It is debatable, however, as the thresholds are still subject to sampling variability even if calculated from the marginals. At present, the question of standard errors for two-step estimation is left open.
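A generic sketch of this standard-error computation (Python; any smooth log-likelihood function can be supplied, so this is not tied to POLYCORR's internals):

```python
import numpy as np

def standard_errors(loglik, theta, h=1e-5):
    """Standard errors from the observed information matrix: the negative
    Hessian of log L, built by central finite differences, is inverted and
    the square roots of its diagonal are returned."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            di = np.zeros(k); di[i] = h
            dj = np.zeros(k); dj[j] = h
            H[i, j] = (loglik(theta + di + dj) - loglik(theta + di - dj)
                       - loglik(theta - di + dj) + loglik(theta - di - dj)) / (4.0 * h * h)
    return np.sqrt(np.diag(np.linalg.inv(-H)))   # invert the information matrix
```

As a check, for the log-likelihood of n standard-normal observations centered at mu, which is -0.5 * n * mu**2 up to a constant, the information is n and the standard error of mu is 1/sqrt(n).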
POLYCORR has been benchmarked against: PRELIS Version 1.0 (Joreskog & Sorbom, 1993) for two-step estimation; against SAS PROC FREQ PLCORR and the calculations of Tallis (1962) and Drasgow (1988) for joint ML estimation; and against Applied Statistics algorithm AS 116 (Brown, 1977) for the tetrachoric correlation. In each case POLYCORR appears at least as accurate as the benchmark source.
For joint ML estimation, k = R + C - 1, where R is the number of row levels and C is the number of column levels. The first line gives the start value for rho. Next, on successive lines, are the start values for thresholds 2, 3, ..., R for the first item/rater (row variable), followed by start values for thresholds 2, 3, ..., C for the second item/rater (column variable). With respect to each variable, it is important that threshold start values be in ascending order; that is, within the row variable and within the column variable, higher-numbered thresholds must be greater than lower-numbered thresholds. In general, one can use successive integers, e.g., -2., -1., 0., 1., 2., as start values for each rater's thresholds.
For two-step estimation, k = 1. There is only one line, containing the start value for rho.
Values must include a decimal point and be one per line, with no blank lines. Other than that the format is unimportant. To see an example, run POLYCORR and examine the START.XPC file it creates.
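For illustration, a hypothetical start-values file for joint ML estimation with a 4 × 4 table (k = 4 + 4 - 1 = 7 values: rho, then thresholds 2-4 for each variable) might look like:

```
.30
-1.
0.
1.
-1.
0.
1.
```

The particular values here are arbitrary; what matters is one value per line, each with a decimal point, and ascending thresholds within each variable.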
This will not likely affect the user. The default start value for rho is the Pearson r calculated for the data. If the Pearson r is positive, a positive rho will be estimated; if the Pearson r is negative, a negative rho is estimated.
It is unlikely that rho would have a sign opposite of the Pearson r. Still, should this be the case, the user has an option. Suppose that the Pearson r is positive, and that POLYCORR attempts to estimate a positive rho. If the true rho is negative, one of two things will happen: (1) rho will be reported as 0; or (2) the program will terminate with an error message.
In either case the user should rerun the program using usersupplied start values. A negative value should be specified for the rho start value. This will cause POLYCORR to estimate a negativevalued rho.
Similarly, a usersupplied positivevalued rho will cause POLYCORR to
estimate a positivevalued rho.
With POLYCORR, any computational problem that might occur is usually obvious. Signs that something is wrong include a negative G-squared value or a program crash. If these occur, first try two-step estimation to see if that eliminates the problem. If that doesn't work, please send me email (including the input file) and I will try to correct the problem.
For added assurance that POLYCORR has worked correctly, examine the
first derivatives in the printed output. If these are all nearzero, it
is likely that the estimates are correct.
The STEPIT subroutine writes a small amount of output to the file STEPIT.OUT. Most users need not be concerned with this file. The most potentially useful information is the matrix of second derivatives of the objective function (in this case G-squared) with respect to the estimated model parameters, which is produced if standard errors are estimated.
POLYCORR is copyrighted (all rights reserved). It may be downloaded from this site, and the user may retain multiple copies of the downloaded version for his or her personal use. But it may not be transmitted to other users. It may not be translated to other programming languages without the express permission and consent of the author. You may not decompile, disassemble, modify, decrypt, or otherwise exploit this program. 
The POLYCORR program can be downloaded at:
http://www.johnuebersax.com/bin/xpc.zip.
This user guide is available at:
http://www.johnuebersax.com/stat/xpc.htm.
I hope you find the POLYCORR program helpful. Please notify me if the program does not work correctly, or to suggest additions or changes that might make it more useful.
John Uebersax PhD
This program is distributed as-is. It has not undergone extensive testing. The author does not guarantee accuracy and assumes no responsibility for unintended consequences of its use.
Brown MB. Algorithm AS 116: the tetrachoric correlation and its standard error. Applied Statistics, 1977, 26, 343-351.
Chandler JP. STEPIT: Finds local minima of a smooth function of several parameters. Computer program abstract. Behavioral Science, 1969, 14, 81-82.
Hutchinson TP. Kappa muddles together two sources of disagreement: tetrachoric correlation is better. Research in Nursing and Health, 1993, 16, 313-315.
Hutchinson TP. Assessing the health of plants: Simulation helps us understand observer disagreements. Environmetrics, 2000, 11, 305-314.
Olsson U. Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 1979, 44, 443-460.
Uebersax JS. Statistical modeling of expert ratings on medical treatment appropriateness. Journal of the American Statistical Association, 1993, 88, 421-427.
Uebersax JS. The tetrachoric and polychoric correlation coefficients. (http://www.johnuebersax.com/stat/tetra.htm). July, 2000.
xpc.htm  User guide for the POLYCORR program (advanced version); HTML format. 
xpc.exe  Executable version of POLYCORR (advanced version) 
input.txt  Sample input file 
output.txt  Sample output file 
BENCHMARK\  Folder containing benchmark input and output files 
Uebersax JS. User Guide for POLYCORR 1.1. Statistical Methods for Rater Agreement web site. 2007. Available at: http://johnuebersax.com/stat/xpc.htm . Accessed mmmm dd, yyyy.
Last updated: 5 November 2010 (corrected links; xpc.exe now compatible with 64-bit Windows 7)