[ Home | Download | News | Bugs | Sample graphics | Sample data | Tutorial | Book | Manual | Citation ]
If you have questions, suggestions, corrections, etc., please email Karl Broman (broman at wisc.edu).
First, take a look at the help file for the
      read.cross function.  Next, look at some of the sample data files.
      
If you are still having trouble, send an email to Karl Broman (broman at wisc.edu), attaching a copy of your data. He's had little trouble, up to now, providing assistance with such problems, and will keep your data confidential.
Code hemizygous male genotypes as if they were homozygous.
      For example, for a backcross, you could code females as A and H and males as
      A and B.  Or, you could code females as AA and AB and males as
      AA and BB, in which case this needs to be indicated through the
      genotypes argument in read.cross.
      
Be sure to include a "phenotype" column indicating the sex of the individuals. Also, in an intercross, include another "phenotype" column that indicates the cross direction; this should be named "pgm" (for "paternal grandmother").
R/qtl is currently in maintenance mode. We will continue to fix problems, but we don't expect to add any new features. We are focusing our development efforts on R/qtl2.
We apologize that some warnings and error messages are not very easy to understand. For the same reason, the problems behind them are often hard to diagnose without more information.
Send an email to Karl Broman (broman at wisc.edu), including the code that led to the problem, and ideally also the primary data. It will also be useful to include information on your operating system and the versions of R and R/qtl that you are using. Your versions of R and R/qtl may be determined by typing the following.
version
qtlversion()
  
In Windows, by default you get 1 Gb of memory (or the amount of
      RAM you have on your computer, if that is less than 1 Gb).  If
      you have 2 Gb RAM, you need to use the command-line flag
      --max-mem-size to have access to the additional
      memory.
      
Right-click on the R icon that you use to start R and select
      "Properties".  Then select the tab "Shortcut" and modify the "Target"
      to include something like --max-mem-size=2G.
      
Alternatively, you can change the memory limit within R using the
      memory.limit function, giving a new limit in Mb.  (For
      example, type memory.limit(2048) to change the memory
      limit to 2 Gb.)
      
See also the R for Windows
      FAQ and, within R, type ?Memory and
      ?memory.size.
  
Of course, one is limited by the memory available on one's computer, and so there are not many options.
First, clean up your workspace, removing objects that aren't
      important to you.  You can save objects to disk with the
      save command.
      
The multiple imputation method, as implemented, uses a
      particularly large amount of memory.  Consider using a small number
      of imputations (n.draws) or a coarser grid
      (step) in sim.geno,
      or focusing on a subset of the chromosomes.
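A minimal sketch of both suggestions, assuming the R/qtl package and its bundled hyper data set; the grid steps and numbers of draws are arbitrary illustrative values. It compares the memory used by sim.geno output on a fine grid with many imputations against a coarse grid with few, then saves the larger object to disk and removes it from the workspace.

```r
library(qtl)
data(hyper)

# Fine grid with many imputations: larger memory footprint
big <- sim.geno(hyper, step = 2, n.draws = 32, error.prob = 0.001)
# Coarser grid with fewer imputations: smaller memory footprint
small <- sim.geno(hyper, step = 10, n.draws = 8, error.prob = 0.001)

size_big <- as.numeric(object.size(big))
size_small <- as.numeric(object.size(small))
print(round(c(big_Mb = size_big, small_Mb = size_small) / 2^20, 1))

# Save the large object to disk, then remove it from the workspace
f <- file.path(tempdir(), "big_draws.RData")
save(big, file = f)
rm(big)
invisible(gc())  # run garbage collection so the freed memory can be reused

# Later, load(f) restores the object named 'big'
```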
  
We recommend purchasing a computer with as much memory (RAM) as possible: preferably at least 2 Gb. And of course, the faster the processor, the better.
R currently can deal with just one processor at a time. However, if you have a computer with multiple processors, you can speed up permutation tests and simulations by spawning multiple instances of R at once. We routinely make use of the multiple processors on a Linux cluster for more rapid permutation tests.
If a permutation test is to be split across multiple processors,
      it is important to ensure that the random number seeds are set to be
      different for the different jobs, using the function
      set.seed.  Otherwise, the multiple jobs may give
      precisely the same results.
      
In version 1.12, we added the ability to have scanone and
      scantwo permutations run in parallel, if the snow
      package is installed.  The argument n.cluster
      indicates the number of parallel nodes to use.
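A sketch of splitting a permutation test into two jobs with different seeds and then combining them, assuming R/qtl's bundled hyper data. In practice each job would run in its own R session (on its own processor) and save its result with save(); here both run in one session for illustration. The seeds, grid step, and 50 replicates per job are arbitrary, and c() is used to combine the scanone permutation results.

```r
library(qtl)
data(hyper)
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

# Job 1: its own seed
set.seed(101)
operm1 <- scanone(hyper, method = "hk", n.perm = 50, verbose = FALSE)

# Job 2: a different seed, so it does not repeat job 1's permutations
set.seed(202)
operm2 <- scanone(hyper, method = "hk", n.perm = 50, verbose = FALSE)

# Combine the replicates from the separate jobs
operm <- c(operm1, operm2)
summary(operm, alpha = c(0.05, 0.10))

# With R/qtl >= 1.12 and the snow package installed, one session
# can instead spread the permutations over several nodes:
# operm <- scanone(hyper, method = "hk", n.perm = 1000, n.cluster = 4)
```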
  
Within R, use the functions getwd to determine
      the current working directory,  setwd to change the
      current working directory, and dir to list the
      files in the current working directory.
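A small base-R sketch of these three functions; the temporary directory and the file example.csv are hypothetical stand-ins for your own data directory and data file.

```r
old_wd <- getwd()      # current working directory
tmp <- tempdir()

setwd(tmp)             # change the working directory
writeLines("id,bp", "example.csv")  # hypothetical file, for illustration
print(getwd())
print(dir())           # files in the (new) working directory

setwd(old_wd)          # return to the original directory
```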
      
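To change R's default working directory in Windows, create a shortcut to the R GUI (there may already be one on your desktop), right-click on it and select "Properties", and then, on the "Shortcut" tab, change the "Start in" field to the desired directory.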
To change R's default working directory on a Mac, start R and then select (on the menu bar) R -> Preferences -> Startup, and then change the "Initial working directory".
It is possible, but it is not yet documented. And we can't handle heterozygote genotypes, so those must be treated as missing.
Read in your data as if it were a backcross, and then type
      one of the following, according to whether your RIL were
      generated by selfing or sibling mating (I assume that your data
      is in the object myx.)
      
myx <- convert2riself(myx)   # RIL generated by selfing
myx <- convert2risib(myx)    # RIL generated by sibling mating
      
The data are treated essentially like a backcross, but the map is expanded before calculating QTL genotype probabilities and so forth. Note that we currently can deal only with strain averages as phenotypes.
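A sketch of the conversion, assuming R/qtl; since no RIL data come with this FAQ, a simulated backcross (from sim.map and sim.cross, with arbitrary map length and marker counts) stands in for data read in with read.cross as if it were a backcross.

```r
library(qtl)
set.seed(3)

# Simulated backcross standing in for RIL data read in as a backcross
map <- sim.map(len = rep(100, 3), n.mar = 11, include.x = FALSE)
myx <- sim.cross(map, n.ind = 100, type = "bc")

# RIL by selfing; for sibling mating, use convert2risib(myx) instead
ril <- convert2riself(myx)
class(ril)

# Subsequent steps use the converted object
ril <- calc.genoprob(ril, step = 1)
```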
Generally, no.  R/qtl does include facilities for analysis of
      a phase-known four-way cross, generally derived from a cross
      between four inbred strains, with all progeny from a cross of
      the form (A × B) × (C × D), with females
      listed first.  But you must first infer phase, and R/qtl offers
      no facilities for this. But see the help file for the
      read.cross function for details about the coding of
      the genotype data, if you wish to proceed.
No.
R/qtl has no special facilities for dealing with advanced intercross lines. One might analyze such data as if they were from an intercross, though with an expanded genetic map, but it is important to take account of the relationships among individuals (for example, the sibships in the final generation), and R/qtl is not currently able to do that.
No.  In the analysis of intercross data, we always consider the full model
      (allowing the three genotypes to have different phenotype averages).
      One may inspect the results of effectplot
      to assess whether a locus appears to be dominant or additive.
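The same comparison can be made numerically from the genotype-class means at a marker. This sketch assumes R/qtl's bundled listeria intercross; the position (26 cM on chromosome 13) is a hypothetical choice, and the additive effect a and dominance deviation d use the standard quantitative-genetics definitions rather than anything specific to R/qtl. A value of d near 0 suggests additivity; d near ±a suggests dominance.

```r
library(qtl)
data(listeria)

# Marker nearest a (hypothetical) position of interest
mar <- find.marker(listeria, chr = 13, pos = 26)
g <- pull.geno(listeria)[, mar]   # 1 = AA, 2 = AB, 3 = BB
y <- listeria$pheno[, 1]

keep <- g %in% 1:3 & !is.na(y)
geno <- factor(g[keep], levels = 1:3, labels = c("AA", "AB", "BB"))
means <- tapply(y[keep], geno, mean)

a <- unname((means["BB"] - means["AA"]) / 2)                # additive effect
d <- unname(means["AB"] - (means["AA"] + means["BB"]) / 2)  # dominance deviation
print(means)
print(c(additive = a, dominance = d))

# The corresponding plot: effectplot(listeria, mname1 = mar)
```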
  
No, though one may inspect the results of effectplot
      which may suggest such an effect.  We see little value in a formal
      significance test.
  
One may use fitqtl to fit a
      multiple-QTL model and estimate the percent phenotypic variance
      explained by each QTL.
      
In the context of a single-QTL model, the heritability due to a QTL
      may be estimated by 1 − 10^(−2 LOD/n), where n is the
      sample size and LOD is the LOD score (from scanone).
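Why this formula works: for a single locus fitted by regression, LOD = (n/2) log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the QTL, so 1 − 10^(−2 LOD/n) = 1 − RSS1/RSS0, the usual R². The base-R check below uses simulated data (the effect size and sample size are hypothetical) and no R/qtl functions.

```r
# Proportion of phenotypic variance explained, from a single-QTL LOD score
h2_from_lod <- function(lod, n) 1 - 10^(-2 * lod / n)

set.seed(4)
n <- 200
g <- rbinom(n, 1, 0.5)        # simulated two-genotype locus
y <- 0.5 * g + rnorm(n)       # simulated phenotype

fit0 <- lm(y ~ 1)             # null model
fit1 <- lm(y ~ g)             # single-QTL model
rss0 <- sum(resid(fit0)^2)
rss1 <- sum(resid(fit1)^2)
lod <- (n / 2) * log10(rss0 / rss1)

h2 <- h2_from_lod(lod, n)
c(h2 = h2, r.squared = summary(fit1)$r.squared)   # the two agree
```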
  
We generally use 1000 permutation replicates, though we may use 10,000 or 100,000, if we want more precise results.
In general, we view the permutation test as a method for estimating a p-value. Suppose that the true p-value (if one performed all possible permutations) is p, that we use n permutation replicates, and that x is the number of replicates giving a LOD score greater than or equal to that observed. Then x follows a binomial(n, p) distribution. Our estimate of the p-value is x/n, and this has standard error (SE) = √[p(1 − p)/n].
If one wishes the SE of the estimated p-value to be ∼0.001 in the case that p ≈ 0.05, one would need 0.05 × 0.95 / 0.001² = 47,500 permutation replicates.
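The arithmetic above in base R, as a sketch: perm_se and n_needed are hypothetical helper names, and the permutation LOD scores at the end are random stand-ins (not real genome-scan results) used only to show the x/n estimate.

```r
# SE of a permutation-based p-value estimate, and replicates needed for a target SE
perm_se <- function(p, n) sqrt(p * (1 - p) / n)
n_needed <- function(p, se) p * (1 - p) / se^2

n_needed(0.05, 0.001)   # 47500
perm_se(0.05, 1000)     # SE with 1000 replicates

# Estimating the p-value as x/n from permutation LOD scores
set.seed(5)
perm_lod <- rexp(1000)  # hypothetical stand-in for genome-wide maximum LODs
lod_obs <- 3            # hypothetical observed LOD
x <- sum(perm_lod >= lod_obs)
p_hat <- x / length(perm_lod)
c(p_hat = p_hat, se = perm_se(p_hat, length(perm_lod)))
```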
No.
Yes.  Use model="binary" in scanone or
      scantwo.
      Alternatively, create a dummy marker with the genotypes encoding the
      phenotypes, and use est.rf
      to calculate LOD scores for linkage between each typed marker
      and the phenotype.
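A sketch of the model="binary" route, assuming R/qtl's bundled hyper data. Since hyper has no binary trait, a hypothetical 0/1 phenotype is made by splitting blood pressure at its median; the grid step and LOD threshold are illustrative.

```r
library(qtl)
data(hyper)
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

# Hypothetical binary trait: blood pressure above the median (coded 0/1)
hyper$pheno$high_bp <- as.numeric(hyper$pheno$bp > median(hyper$pheno$bp))

out.bin <- scanone(hyper, pheno.col = "high_bp", model = "binary", method = "em")
summary(out.bin, threshold = 3)
```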
  
Currently, the analysis of a binary phenotype in R/qtl requires
      genotype data on both affected and unaffected individuals.  In the
      case that genotype data are available only on affected individuals,
      one may use geno.table
      to identify loci that exhibit segregation distortion and so are
      indicated to be potentially linked to a disease susceptibility locus.
      Such evidence should be confirmed by further genotyping unaffected
      individuals.
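A sketch of screening for segregation distortion with geno.table, assuming R/qtl. The bundled listeria intercross is not an affected-only sample; it is used here only to show the call and the P.value column, and the 10^-3 cutoff is a hypothetical choice.

```r
library(qtl)
data(listeria)

gt <- geno.table(listeria)   # genotype counts and a segregation test per marker
head(gt)

# Markers with the strongest evidence of segregation distortion
gt_sorted <- gt[order(gt$P.value), ]
head(gt_sorted)

# A hypothetical significance threshold
gt[which(gt$P.value < 1e-3), ]
```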
  
scanone?
      It is best not to rely on the results of scanone
      to infer the presence of multiple linked QTL.  Instead, one
      should consider the results of a two-dimensional, two-QTL scan (with scantwo) or
      multiple QTL analysis (with fitqtl and/or scanqtl).
      
Nevertheless, if there are a couple of peaks on a chromosome, and one
      wishes to identify the location of the second peak, one can subset the
      results from scanone
      to find the location of the second peak.  For example, if out
      contains the output from scanone,
      and one wishes to find the location for the peak on chromosome 1 that
      is distal to 50 cM on the genetic map, one may use code like the
      following.
      
max(out[out$chr==1 & out$pos > 50,])
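A self-contained version of that subsetting, assuming R/qtl's bundled hyper data and Haley-Knott regression on a 1 cM grid (both arbitrary choices); it reports the overall chromosome 1 peak and then the peak distal to 50 cM.

```r
library(qtl)
data(hyper)
hyper <- calc.genoprob(hyper, step = 1, error.prob = 0.001)
out <- scanone(hyper, method = "hk")

max(out, chr = 1)                                # highest peak on chromosome 1
pk2 <- max(out[out$chr == 1 & out$pos > 50, ])  # peak distal to 50 cM
pk2
```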
  
Use the function strip.partials.
  
The simplest approach is to consider a marker (preferably one with complete genotype data) near the position of interest, and perform a genome scan with that marker as first an additive and then an interactive covariate. The difference between the two sets of LOD scores concerns evidence for interaction with the marker position.
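A sketch of this marker-covariate approach, assuming R/qtl's bundled hyper data and the same locus (18 cM on chromosome 15) as the makeqtl example. The marker's few missing genotypes are filled by fill.geno (random imputation, hence the seed); note that the interactive covariate is also included as an additive covariate.

```r
library(qtl)
data(hyper)
set.seed(11)
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

# Marker nearest 18 cM on chromosome 15; fill in its missing genotypes
mar <- find.marker(hyper, chr = 15, pos = 18)
g <- pull.geno(fill.geno(hyper))[, mar]
g <- cbind(g)          # one-column numeric covariate (backcross codes 1 and 2)

out.a <- scanone(hyper, method = "hk", addcovar = g)               # additive
out.i <- scanone(hyper, method = "hk", addcovar = g, intcovar = g) # plus interaction
diff.lod <- out.i - out.a     # LOD attributable to the interaction
plot(diff.lod)
```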
Alternatively, one can use makeqtl and then
      addqtl, using either Haley-Knott regression or
      multiple imputation.  See the following code for interactions
      with the locus at 18 cM on chromosome 15 in the
      hyper data.
data(hyper)
hyper <- calc.genoprob(hyper, step=2.5, err=0.001)
qtl <- makeqtl(hyper, chr=15, pos=18, what="prob")
out.i <- addqtl(hyper, qtl=qtl, formula=y~Q1*Q2, method="hk")
out.a <- addqtl(hyper, qtl=qtl, formula=y~Q1+Q2, method="hk")
plot(out.i - out.a)
      
The code above uses Haley-Knott regression; to use multiple imputation, do the following.
data(hyper)
hyper <- sim.geno(hyper, step=2.5, n.draws=256, err=0.001)
qtl <- makeqtl(hyper, chr=15, pos=18, what="draws")
out.i <- addqtl(hyper, qtl=qtl, formula=y~Q1*Q2, method="imp")
out.a <- addqtl(hyper, qtl=qtl, formula=y~Q1+Q2, method="imp")
plot(out.i - out.a)
  
scantwo
      restricted to an interval?
      No, but one may use scanqtl
      to perform a two-dimensional, two-QTL scan in a given interval.
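A sketch of such a restricted scan, assuming R/qtl's bundled hyper data. Giving an element of pos as a pair of numbers asks scanqtl to scan that interval; the two intervals on chromosome 1 (10-40 cM and 50-80 cM) are hypothetical choices.

```r
library(qtl)
data(hyper)
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

# Two QTL on chromosome 1: the first scanned over 10-40 cM, the second over 50-80 cM
out2 <- scanqtl(hyper, chr = c(1, 1),
                pos = list(c(10, 40), c(50, 80)),
                formula = y ~ Q1 + Q2, method = "hk", verbose = FALSE)
dim(out2)
max(unclass(out2))   # largest LOD over the grid of position pairs
```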
  
One may use the function c.cross to combine multiple
      backcrosses and/or intercrosses, provided that they have the same
      genetic maps.  This should be done after running calc.genoprob
      or sim.geno.
      The combined analysis of multiple crosses requires care and is beyond
      the scope of this book.
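A sketch of combining two crosses with c.cross, assuming R/qtl. Two simulated backcrosses built from one simulated map stand in for real data; the sample sizes and grid step are arbitrary.

```r
library(qtl)
set.seed(6)

# Two simulated backcrosses sharing the same genetic map (illustration only)
map <- sim.map(len = rep(100, 2), n.mar = 11, include.x = FALSE)
bc1 <- sim.cross(map, n.ind = 60, type = "bc")
bc2 <- sim.cross(map, n.ind = 40, type = "bc")

bc1 <- calc.genoprob(bc1, step = 2)
bc2 <- calc.genoprob(bc2, step = 2)

both <- c(bc1, bc2)
nind(both)
summary(both)
```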
  
In the context of a single phenotype, one cannot fruitfully apply the false discovery rate idea to QTL mapping. If one takes the null hypotheses to be that individual loci are not linked to any QTL, one really has just one null hypothesis per chromosome (all loci on a chromosome are unlinked to any QTL exactly when that chromosome carries no QTL), and so a total of 20 null hypotheses for the mouse genome.
No.
The results of QTL analysis depend critically on the order of the genetic markers, and so knowledge of the physical locations of markers will be useful. However, calculations of conditional QTL genotype probabilities, given the available marker data, must rely on estimates of the recombination fractions between markers, which may only be obtained from a genetic map. Physical distances between markers are not a good substitute for genetic distances.
In general, one should use a map function that best reflects the level of crossover interference. However, QTL mapping calculations still generally rely on an assumption of no crossover interference; a map function is used only to convert genetic distances into recombination fractions.
The choice of map function seldom has much effect on the QTL mapping results, particularly in the case that the genetic markers are relatively dense and the genotype data are relatively complete. If one uses, for the analysis, a genetic map that was estimated from the same data, we recommend use of the same map function for both the estimation of the genetic map and the QTL mapping analysis; the choice of map function will have little impact on the results.
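To see how little the conversion from genetic distance to recombination fraction depends on the map function at short distances, here is a base-R sketch of the Haldane map function (no interference) and the Kosambi map function, with d in cM. These are the textbook formulas; R/qtl has its own map-function utilities, which are not used here.

```r
# Map functions: genetic distance d (cM) -> recombination fraction r
haldane <- function(d) 0.5 * (1 - exp(-2 * d / 100))
kosambi <- function(d) 0.5 * tanh(2 * d / 100)

d <- c(1, 5, 10, 20, 50)
data.frame(d_cM = d, haldane = round(haldane(d), 4), kosambi = round(kosambi(d), 4))
```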
QTL analyses are generally conditional on the observed marker genotype
      data, and so results are little affected by the presence of
      segregation distortion.  The reconstruction of genotypes at putative
      QTL relies on an assumption of no segregation distortion, but with
      reasonably dense markers and reasonably complete genotype data, this
      will not be a concern.  Segregation distortion may result in reduced
      power to identify QTL, but it should not lead to spurious evidence for
      QTL.  And so, while one should investigate the possibility of
      segregation distortion (for example, with geno.table),
      as it may indicate genotyping problems, one need not be
      concerned about the influence of true segregation distortion on
      the QTL mapping results.
  
There are several facilities for constructing genetic maps de novo in R/qtl.
First, import the data as if all markers are on one chromosome.
Use est.rf
      to estimate the pairwise marker recombination fractions and then
      formLinkageGroups
      to partition the markers into linkage groups.
      
Use orderMarkers
      to get initial marker orders for each linkage group, and then ripple
      to study alternate orders of markers within each
      linkage group.
      
The other tools you'll be wanting are:
est.map
replace.map
switch.order
movemarker
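A sketch of the workflow described above, assuming R/qtl. A simulated backcross (three 100 cM chromosomes, 10 markers each) stands in for imported data, and the max.rf, min.lod, and window values are hypothetical settings to be tuned for real data.

```r
library(qtl)
set.seed(7)

# Simulated data standing in for markers imported without a map
map <- sim.map(len = rep(100, 3), n.mar = 10, include.x = FALSE)
x <- sim.cross(map, n.ind = 200, type = "bc")

# Pairwise recombination fractions, then linkage groups
x <- est.rf(x)
lg <- formLinkageGroups(x, max.rf = 0.35, min.lod = 6)
table(lg$LG)
x <- formLinkageGroups(x, max.rf = 0.35, min.lod = 6, reorgMarkers = TRUE)

# Initial order for linkage group 1, then study alternate orders
x <- orderMarkers(x, chr = 1, window = 5)
rip <- ripple(x, chr = 1, window = 5)
summary(rip)

# Re-estimate the map and insert it into the cross
newmap <- est.map(x, error.prob = 0.001)
x <- replace.map(x, newmap)
```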
      
scanone (or scantwo),
  sometimes I get the warning message:
X'X matrix is singular
Should I worry about this?
That warning message is saying that one of the many linear
regression fits was over-specified. That may happen if one of the
possible genotypes is missing, particularly in the iterative methods
(such as method="em" and method="ehk").
You can generally ignore this. It usually happens in regions with little evidence for a QTL. Sometimes you'll get spuriously large LOD scores (> 100) in these situations, in which case the warning may help to explain such artifacts.
cim(), I get
  slightly different results.
Prior to running forward selection at markers to identify
  a set of marker covariates, cim() will use a single
  random imputation to fill in any missing genotype data at
  markers. This can lead to some randomness in the selection of marker
  covariates and so in the cim() results.
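Because the randomness comes from R's random number generator, setting the same seed before each call should reproduce the results. A sketch, assuming R/qtl's bundled hyper data; the seed, n.marcovar, window, and method values are illustrative.

```r
library(qtl)
data(hyper)
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

set.seed(8)
out1 <- cim(hyper, n.marcovar = 3, window = 10, method = "hk")
set.seed(8)
out2 <- cim(hyper, n.marcovar = 3, window = 10, method = "hk")
```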