README file for program Homo, test version 0.2, 22 October 2002





The program and this README file were written by Harald Göring. My current
email address is hgoring@darwin.sfbr.org.





Program Homo performs a test of heterogeneity using an admixture model
[Smith CAB (1961) Homogeneity test for linkage data. Proc Sec Int Congr
Hum Genet 1:212-213]. The program tests for linkage assuming homogeneity,
for heterogeneity given linkage (which makes sense only if linkage
assuming homogeneity has been demonstrated), and for joint linkage and
heterogeneity. The program takes its input from an input file and writes
its output to an output file. The file names are fixed.

Overall, the program is quite similar to Jurg Ott's Pascal program HOMOG
[Ott J (1983) Linkage analysis and family classification under
heterogeneity. Ann Hum Genet 47:311-320], with differences in input and
output formats. This program is not intended as a replacement for HOMOG.
Rather, it was convenient to have my own program, written in ANSI C, with
different input and output formats.





Table of contents:
 1.	Program changes
 2.	Legal notice
 3.	Obtaining the program
 4.	Installation
 5.	Compilation, including programming constants
 6.	Invoking the program
 7.	General program description
 8.	Description of input file
 9.	Example input file
10.	Description of output file
11.	Example output file
12.	Correspondence
13.	References





1. Program changes

Changes with version 0.1: 

This is the first test version of the program. There are thus no changes.


Changes with version 0.2: 

The program no longer aborts when computing the p-value of a negative
chi-squared statistic, instead treating the statistic as being equal to 0.
A negative value of a chi-squared statistic may be obtained due to
numerical problems (such as "rounding") or due to the existence of
nuisance parameters (see below).

In addition, minor changes to the output format were made, and the
README file was edited slightly.





2. Legal notice

The program is free of charge. 

You may modify the program for use by yourself or in your own lab.

You may distribute the unmodified program. If you do, please include this
file. However, please do not distribute a modified version of the program
without my permission. If you do so nonetheless, please include this file
and in addition a description of your changes.

I do not take any responsibility for the accuracy of the results nor for
the use to which they are put.

Since this program is rather trivial, there is no need to cite its use. If
you feel like it nonetheless, simply state the program name, version, my
name and email address.





3. Obtaining the program

Currently, the only way to obtain the program is by contacting me. My
current email address is hgoring@darwin.sfbr.org. At some point, I might
distribute this program via the web.





4. Installation

Assuming that you obtained file homo.v0.2.tar.gz:

Uncompress file homo.v0.2.tar.gz with the following or equivalent command:
	gzip -drv homo.v0.2.tar.gz
This creates file homo.v0.2.tar.

Un-archive file homo.v0.2.tar with the following or equivalent command:
	tar -xvf homo.v0.2.tar
This creates a sub-directory homo/v0.2, which contains the source code,
a Makefile, a README file (this file) and example input and output files
shown below.

Initial compilation is necessary and described below.

After compilation, if you want to run the program from directories other
than where it is installed, you might want to put the executable file in
your path.




5. Compilation, including programming constants

The program is written in ANSI C. Compilation should therefore be no
problem. A Makefile is provided for this purpose, such that the command
"make" should do the trick and produce the executable file "homo". It may
be necessary, however, to modify the Makefile according to your situation.


A few constants can be changed if necessary, although I assume this will
rarely, if ever, be required. These constants are defined in the header
file "homo.h". The constants of interest are:

#define MAXCHLINE      2000     /*max. + 1 no. of characters in a line*/
#define MAXCHWORD       100     /*max. + 1 no. of characters in a word*/
#define ALPHAINCREMENT    0.001 /*stepsize in homogeneity parameter ...*/

MAXCHLINE needs to be increased if any line in the comma-delimited input
file contains more than MAXCHLINE characters. This does not matter if the
input file is space-delimited (the program supports both options; see
below).

MAXCHWORD needs to be increased if the identifier of any pedigree contains
more than MAXCHWORD characters.

ALPHAINCREMENT needs to be decreased if you want to evaluate the
likelihood for a finer grid of values for the homogeneity parameter. This
value must be greater than 0.





6. Invoking the program

If the program is in your path, invoke it with one of the following
commands:
	
	homo
	homo -c

The first command assumes a space-delimited input file, while the second
command assumes a comma-delimited input file (see below).





7. General program description:

Program homo performs a heterogeneity test using an admixture model [Smith
CAB (1961) Homogeneity test for linkage data. Proc Sec Int Congr Hum Genet
1:212-213].

The program performs tests of linkage assuming homogeneity, of
heterogeneity given linkage, and of joint linkage and heterogeneity. The
test statistic of linkage under homogeneity, -2 ln [L(no linkage) /
L(linkage, homogeneity)] (which is equal to 2 ln(10) times the "standard"
lod score of linkage), is assumed to be asymptotically distributed as 0.5
(0) + 0.5 chi^2(1), where (0) denotes a point mass at 0 and chi^2(k) a
chi-squared distribution with k degrees of freedom. The test statistic of
heterogeneity given linkage, -2 ln [L(linkage, homogeneity) / L(linkage,
heterogeneity)], is assumed to be asymptotically distributed as 0.5 (0) +
0.5 chi^2(1). The joint test statistic of linkage and heterogeneity, -2 ln
[L(no linkage) / L(linkage, heterogeneity)], is tentatively assumed to be
asymptotically distributed as 0.25 (0) + 0.5 chi^2(1) + 0.25 chi^2(2).
This may not be correct, because under the null hypothesis of no linkage
only one of the two parameters (the linkage parameter or the homogeneity
parameter) is defined, not both. See below for additional reasons why this
may not hold.

Note that the test of heterogeneity given linkage only makes sense if
linkage (under homogeneity) has been demonstrated, i.e. when the test of
linkage under homogeneity is significant. Similarly, the probabilities
that a given pedigree is of the linked type only make sense if both
linkage and heterogeneity have been demonstrated.

Note that the test of heterogeneity given linkage is not subject to
multiple testing, at least when this test is used at a single position
after linkage has been demonstrated, e.g. at the position of a
statistically significant lod score peak. Standard levels of significance
such as 0.05 may thus be appropriate, in contrast to the test of linkage
under homogeneity and the joint test of linkage and heterogeneity.

The program takes its input from an input file (file "homo.in", which is
either space- or comma-delimited) and writes its output to an output file
(file "homo.out"). If the output file already exists, it will be
overwritten without warning.





8. Description of input file:

The input file is named homo.in and is assumed to be error-free. If there
are errors, the program may crash or give incorrect results. The fields on
each line are delimited either by one or more spaces or by a single comma.
The file format is as follows:

Each line contains the data for a single pedigree. No empty lines are
allowed anywhere in the file, including at the end (because the number of 
lines in the file is used to determine the number of pedigrees).

The 1st column contains the pedigree identifier (treated as a character
array).

The 2nd column contains the ln likelihood in the absence of linkage or a
locus effect. (In most forms of linkage analysis, this model is simply
termed the null model of absence of linkage. In variance-components-based
linkage analysis, such as in the computer package SOLAR [Almasy L,
Blangero J (1998) Multipoint quantitative-trait linkage analysis in
general pedigrees. Am J Hum Genet 62:1198-1211], this is sometimes termed
the polygenic model.)

The remaining columns contain the ln likelihood in the presence of linkage
or a locus effect for different considered values of the linkage parameter
of interest, such as the recombination fraction (e.g. in single marker
analysis under a penetrance model), map position (e.g. in multiple marker
analysis under a penetrance model or in various "model-free" methods such
as affected sib pair analysis) or locus effect (such as the additive
heritability attributable to a quantitative trait locus in variance
components-based linkage analysis). The actual values which were
considered are of no concern to the program, except for output purposes.
The 3rd column is unusual in that it contains the ln likelihood for the
value of the parameter of interest equal to that under the null
hypothesis. Generally, the ln likelihoods in the 2nd and 3rd columns
should be the same. However, they may differ if there are additional
parameters, such as nuisance parameters, over which the likelihood is
maximized independently under the null and alternative hypotheses. The
reason is that the estimates of these nuisance parameters must be
constrained to be equal for all considered values of the parameter of
interest under the alternative hypothesis. If the estimates are not
constrained in this way, the distribution of the test statistic can be
anti-conservative (as there are additional degrees of freedom). If they
are constrained, the theoretical distribution still does not hold, but at
least the test statistic is distributed conservatively, with the degree of
conservativeness depending on the situation. This complication only arises
when a program such as this one or Jurg Ott's HOMOG is applied to ln
likelihoods computed under homogeneity, an approach that is clearly
inferior to joint maximization of the likelihood over all parameters
including the homogeneity parameter. However, to make such stand-alone
heterogeneity analysis possible at all for data with nuisance parameters,
this program requires both columns 2 and 3.

The first line is a header line containing the column descriptions. The
descriptions of the first 2 columns are not read by the program. The
descriptions of the remaining columns are real-valued numbers
corresponding to the considered values of the parameter of interest under
the alternative hypothesis. The actual values are not of interest to the
program except for output purposes (but the value in column 3 should be
the value corresponding to the null hypothesis).
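
The two delimiter conventions described above can be illustrated with a
small field splitter. This is only a sketch: the function name, its
arguments, and the decision to accept tabs as well as spaces in the
space-delimited case are assumptions, and the parsing code in the program
itself may differ.

```c
#include <string.h>

/* Hypothetical helper: split `line` in place into at most `max` fields
   and return the number of fields found.
   comma == 0: fields are separated by runs of spaces (or tabs).
   comma != 0: fields are separated by single commas. */
int split_fields(char *line, int comma, char **fields, int max)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';    /* drop a trailing newline */

    if (!comma) {
        char *tok = strtok(line, " \t");   /* skips repeated delimiters */
        while (tok != NULL && n < max) {
            fields[n++] = tok;
            tok = strtok(NULL, " \t");
        }
        return n;
    }
    while (n < max) {
        char *sep = strchr(p, ',');        /* next single comma, if any */
        fields[n++] = p;
        if (sep == NULL)
            break;
        *sep = '\0';
        p = sep + 1;
    }
    return n;
}
```

For a data line, the first field is the pedigree identifier and the
remaining fields are the ln likelihoods, which could then be converted
with a standard function such as strtod().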





9. Example input file:

Here is an example input file (file "homo.in.example"):

================================= top of file=================================
pedigree H0(0.0)    0.0        0.25       0.5        0.547824
1	 -14.571021 -14.571021 -13.976540 -13.947142 -14.000704
2        -36.053713 -36.053713 -33.507101 -32.278364 -32.089103
3        -42.543684 -42.543684 -42.524543 -42.949210 -43.048406
4        -22.371781 -22.371781 -23.711081 -25.843235 -26.382603
5        -14.784605 -14.784605 -14.243736 -15.026048 -15.301794
6        -16.589288 -16.589288 -15.366200 -14.552932 -14.425285
7        -17.053364 -17.053364 -16.106695 -15.716775 -15.708037
8        -12.334806 -12.334806 -12.652468 -14.017889 -14.410777
9         -4.040945  -4.040945  -4.334180  -5.033988  -5.216918
10       -17.274505 -17.274505 -17.686661 -18.658107 -18.923429
ped11    -20.655283 -20.655283 -18.970379 -18.255366 -18.165237
ped12     -6.396721  -6.396721  -5.641636  -5.516511  -5.547768
ped13    -14.046281 -14.046281 -14.955922 -16.200899 -16.502320
ped14    -31.876677 -31.876677 -34.292162 -37.092688 -37.706655
ped15    -21.004423 -21.004423 -20.779725 -20.600293 -20.575726
ped16    -11.438942 -11.438942 -10.596445  -9.926502  -9.820651
ped17    -16.119489 -16.119489 -15.816796 -15.763794 -15.770420
ped18     -8.545013  -8.545013  -8.201118  -8.266424  -8.315961
ped19    -15.906046 -15.906046 -16.058611 -16.553366 -16.701089
ped20    -16.704263 -16.704263 -15.889676 -15.244580 -15.113708
================================ bottom of file===============================

In a real analysis, it would be preferable to have the ln likelihoods for
many more values of the linkage parameter of interest.





10. Description of output file:

The program writes its output to file homo.out. If this file already
exists, its contents will be overwritten without warning. The output
should be self-explanatory.





11. Example output file:

Here is an example output file (file "homo.out.example"):

================================= top of file=================================
================================================================================
program homo, test version 0.2, 22 October 2002
by Harald Göring
See README file for documentation.
For bug reports, comments or questions, send email to hgoring@darwin.sfbr.org.
================================================================================
Tue Oct 22 16:49:43 2002


description of hypotheses:

abbr.        description
----- -------------------------
  H0  no linkage
  H1     linkage, homogeneity
  H2     linkage, heterogeneity


description of tests:

  abbr.           description            lod score     chi^2 statistic   theoretical asymptotic distribution
--------- --------------------------- --------------- ----------------- ------------------------------------
H0 vs. H1   linkage under homogeneity lg[L(H1)/L(H0)] -2ln[L(H0)/L(H1)]  .5 (0) + .5 chi^2(1)
H1 vs. H2 heterogeneity given linkage lg[L(H2)/L(H1)] -2ln[L(H1)/L(H2)]  .5 (0) + .5 chi^2(1)
H0 vs. H2   linkage and heterogeneity lg[L(H2)/L(H0)] -2ln[L(H0)/L(H2)] .25 (0) + .5 chi^2(1) + .25 chi^2(2) (?)


test results:

   test   lod score chi^2 statistic    p-value
--------- --------- --------------- ------------
H0 vs. H1  2.171114        9.998350     0.000787
H1 vs. H2  0.844853        3.890692     0.024285 (TEST MAKES NO SENSE AS H0 VS. H1 IS NOT SIGNIFICANT!)
H0 vs. H2  3.015967       13.889042     0.000339


maximum likelihoods and maximum likelihood estimates of parameters:

hypothesis ln likelihood linkage par.  homog. par.
---------- ------------- ------------ ------------
    H0       -360.310850   (0.000000)  (undefined)
    H1       -355.311675    0.250000   (1.000000)
    H2       -353.366329    0.547824    0.502000


pedigree-specific results:

                      ln likelihood                       lod score           prob. that
         -------------------------------------- -----------------------------   pedigree
pedigree      H0           H1           H2      H0 vs. H1 H1 vs. H2 H0 vs. H2  is linked
-------- ------------ ------------ ------------ --------- --------- --------- ----------
       1   -14.571021   -13.976540   -14.244634  0.258180 -0.116432  0.141748    0.641
       2   -36.053713   -33.507101   -32.759609  1.105980  0.324632  1.430611    0.982
       3   -42.543684   -42.524543   -42.765523  0.008313 -0.104656 -0.096344    0.378
       4   -22.371781   -23.711081   -23.050837 -0.581651  0.286740 -0.294910    0.018
       5   -14.784605   -14.243736   -15.011143  0.234896 -0.333280 -0.098384    0.375
       6   -16.589288   -15.366200   -15.006529  0.531180  0.156203  0.687384    0.898
       7   -17.053364   -16.106695   -16.167368  0.411133 -0.026350  0.384783    0.795
       8   -12.334806   -12.652468   -12.912897 -0.137959 -0.113103 -0.251062    0.112
       9    -4.040945    -4.334180    -4.467313 -0.127350 -0.057819 -0.185169    0.237
      10   -17.274505   -17.686661   -17.794518 -0.178997 -0.046842 -0.225839    0.162
   ped11   -20.655283   -18.970379   -18.775354  0.731745  0.084698  0.816443    0.924
   ped12    -6.396721    -5.641636    -5.883135  0.327929 -0.104882  0.223048    0.702
   ped13   -14.046281   -14.955922   -14.660509 -0.395052  0.128296 -0.266756    0.080
   ped14   -31.876677   -34.292162   -32.570875 -1.049032  0.747546 -0.301486    0.003
   ped15   -21.004423   -20.779725   -20.766432  0.097585  0.005773  0.103358    0.607
   ped16   -11.438942   -10.596445   -10.330274  0.365892  0.115597  0.481488    0.836
   ped17   -16.119489   -15.816796   -15.929109  0.131458 -0.048777  0.082681    0.588
   ped18    -8.545013    -8.201118    -8.423487  0.149352 -0.096574  0.052778    0.559
   ped19   -15.906046   -16.058611   -16.228066 -0.066258 -0.073593 -0.139851    0.313
   ped20   -16.704263   -15.889676   -15.618720  0.353771  0.117675  0.471446    0.832
   total  -360.310850  -355.311675  -353.366329  2.171114  0.844853  3.015967
================================ bottom of file===============================

Note that in this case the test of heterogeneity given linkage does not
make sense, because the test of linkage under homogeneity is not
significant. The program flags this with a message next to the test
result. The program nonetheless gives the probabilities that any pedigree
is of the linked type without any warning, even though these probabilities
also do not make sense here.





12. Correspondence

If you detect a bug in the program please send me e-mail at 

	hgoring@darwin.sfbr.org

I would also appreciate suggestions for or criticism of the program. 





13. References

Here are a few references for the admixture test of linkage heterogeneity:

Smith CAB (1961) Homogeneity test for linkage data. Proc Sec Int Congr Hum
Genet 1:212-213

Ott J (1983) Linkage analysis and family classification under
heterogeneity. Ann Hum Genet 47:311-320

Ott J (1985) Analysis of human genetic linkage. The Johns Hopkins
University Press (which is the reference for the various HOMOG programs by
J Ott)
