fullsibQTL is an R package to perform QTL mapping in outbred/outcrossing species using the model develop by Gazaffi et al. 2014. We consider as a mapping population a full-sib progeny (or F1 population) derived by a bi-parental cross between two non-homozygous parents, with a genetic map obtained from markers possibly showing different segregation patterns. We assumed that the phenotypic traits are continuous and have a normal distribution. Also, the genetic map must be previously made using the R package onemap.

If you are not familiar with onemap and its approach to build a map, we strongly encourage you to read onemap vignettes (Margarido et al., 2007).

In this tutorial, to illustrate the usage of fullsibQTL we analyze the dataset present by Gazaffi et al. (2014) when proposing the method. The purpose is to help users dealing with this package and understanding the outputs. It is not our intention to teach the basis doing a QTL analysis, so for a better understand about the method see Gazaffi et al. (2014).

Users of fullsibQTL are supposed to have some experience with R, since the analysis is done using the command line. Previous knowledge of onemap is also desirably, once the genetic map is obtained with this software. But by no means one needs to be on expert on any of them to use the package.

fullsibQTL is available under the GNU General Public License, it is open-source and the code can be changed freely. It comes with no warranty. It is implemented as a package to be used under the freely distributed R software, which is a language and environment for statistical computing

If you are not familiar with R, we recommend reading the vignette Introduction to R. If you are not familiar with onemap, we recommend the reading of vignette How to build a linkage map for outcrossing populations.

Moreover, there are three additional tutorials:

Citation

To cite the model for QTL mapping in a full-sib progeny:

Gazaffi, R., Margarido, G. R., Pastina, M. M., Mollinari, M., Garcia, A. A. F. (2014). A model for quantitative trait loci mapping, linkage phase, and segregation pattern estimation for a full-sib progeny. Tree Genetics & Genomes, 10(4), 791-801. (https://link.springer.com/article/10.1007/s11295-013-0664-2)

To cite this R package:

Article in submission

Installation

Within R:

## Installing and loading devtools
install.packages( "devtools" )
library( "devtools" )

## Installing  fullsibQTL from github 
install_github( "augusto-garcia/fullsibQTL" )

Functions

The following table presents the fullsibQTL functions which can be directly accessed by users:

Type	Name	Description
input	`read_outcross_pheno`	Reads the data file containing markers and phenotypic values
	`create_fullsib`	Creates the object to perform QTL
interval mapping	`im_scan`	Scan the genome using IM approach
	`im_char`	Provides the genetic effects, LOD Score and the p-values of complementary testes
cofactors	`cof_selection`	Selects cofactors using multiple linear regression
	`cof_definition`	Defines locations on the genome to be used as cofactors
composite interval mapping	`cim_scan`	Scan the genome using CIM approach
	`cim_char`	Provides the genetic effects, LOD Score and the p-values of complementary testes
summaries (S3 Methods)	`print`	class `fullsib`: Prints a summary for the object of class fullsib
	`print`	class `fullsib_scan`: Prints the result of im_scan or cim_scan
	`summary`	class `fullsib_scan`: Summarizes the QTL search for im_scan or cim_scan
	`plot`	class `fullsib_scan`: Plot the QTL profile for the mapped groups
	`summary`	class `fullsib_perm`: Provides the threshold values for permutations test
	`plot`	class `fullsib_perm`: Plot the empiral distribution for permutations test
	`draw_phase`	Returns the linkage phase between QTL and markers
	`get_segr`	Returns the QTL segregation
	`r2_ls`	Provides the phenotypic proportion of each QTL mapped and altogether (least square estimation)

Required data

The input file of fullsibQTL is the same as the one required by onemap package. If you have VCF or MAPMAKER format files, onemap provides functions to read (onemap_read_vcfR and read_mapmaker) and export (write_onemap_raw) them as the onemap raw file. There are details about it at onemap vignette.

Below there is an example of such a file. The header line indicates the data type, the following line with 10 5 0 0 2 indicates that the data has, respectively, 10 individuals, 5 markers, no chromosome information for these markers (indicated by 0), no physical position information for the markers (0), and two phenotypic traits (0). This file is somehow similar to a MAPMAKER/EXP file (used a long time ago), but has additional information about the crosstype. It uses the symbol - for missing data.

data type outcross
10 5 0 0 2
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
*M1 B3.7 ab ab - ab b ab ab - ab b
*M2 D2.18 o - a a - o a - o o
*M3 D1.13 o a a o o - a o a o
*M4 A.4 ab b - ab a b ab b - a
*M5 D2.18 a a o - o o a o o o
*pheno1 4.8 2.1 10.6 3.7 -3.7 -7.5 - 3.9 3.6 -5.8
*pheno2 5 -5.8 - 14.8 -2.4 -1.9 -1.1 8.1 -11.1 4.8

If your onemap raw data do not have the phenotypic information, you can also provide it separately in a matrix with individuals in rows and traits in columns. See example:

pheno <- read.table(system.file("extdata/pheno_example.txt", package = "fullsibQTL"), 
                    header = T)

head(pheno)
#>    ind Trait1 Trait2
#> 1 IND1  4.985  5.985
#> 2 IND2  4.616  5.616
#> 3 IND3 -0.133  0.867
#> 4 IND4 -5.002 -4.002
#> 5 IND5  6.815  7.815
#> 6 IND6  2.678  3.678

If one wants to see an example of the phenotypic data file format, type the following, that will show the location of a file that is distributed with the package:

system.file( "extdata", "pheno_example.txt", package = "fullsibQTL" )

The individuals in the rows must be the same as the ones used to build the linkage map, they are specified in the third row of onemap raw file. If you do not have phenotypic information for all of them, complete their rows with NA.

Creating the working object

Once the file with raw data was created, it is necessary to import it and provide more details required for QTL mapping.

First you need to read the raw data with onemap::read_onemap function. After, you need to combine it with another object with information of linkages groups, created with function create_fullsib.

This is illustrated in the following example, where we import the raw data and a list with four linkages groups, with distances measure with
kosambi mapping function (Kosambi, 1944) (argument map.function). Note that we also specified argument step with value 1. This option is used for the computation of the conditional multipoint probability of QTL genotypes, which is used for QTL mapping (see Gazaffi et al. 2014 for details). If step=0 is used, the probabilities will be computed only in the position of each marker (allowing the analysis even with missing data).

library(fullsibQTL)

# Raw data
fs_data <- read_onemap( dir = "C:/workingdirectory", inputfile = "filename.raw")

## Information of linkage groups, step and mapping function
fsib <- create_fullsib( fs_data,
                        map.list = list( LG1_final, LG2_final, LG3_final, LG4_final),
                        step = 1, map.function = "kosambi")

# Or
## Provide the phenotypic information separately
fsib <- create_fullsib( fs_data,
                        map.list = list( LG1_final, LG2_final, LG3_final, LG4_final),
                        step = 1, map.function = "kosambi", pheno=pheno)

After loading the data, the user can look at the onemap object’s summary:

fs_data

For fullsibQTL object’s:

fsib

If one wants to see an example of the data file format (raw data), type the following, that will show the location of a file that is distributed with the package:

system.file( "extdata", "example_QTLfullsib.raw", package = "fullsibQTL" )

#> Loading required package: onemap
#> Warning in rgl.init(initValue, onlyNULL): RGL: unable to open X11 display
#> Warning: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'.

In this tutorial we will use as an example an R object distributed with the package. It has: a) raw data in the onemap format (example_QTLfullsib); b) four ordered linkage groups (LG1_final, LG2_final, LG3_final, and LG4_final); two objects with permutations, that will be explained later (named im_perm and cim_perm). They are the same ones described at Gazaffi et al. (2014).

To load the example data:

## Loading the example data
data("example_QTLfullsib") 

## Checking if it worked
ls() 
#> [1] "cim_perm"           "example_QTLfullsib" "im_perm"           
#> [4] "LG1_final"          "LG2_final"          "LG3_final"         
#> [7] "LG4_final"

## Printing some of the loaded objects
example_QTLfullsib
LG1_final # If you found an error here, don't worry. It is because of onemap version incompatibility, we automatically fix it in the next steps

Again, the map was built with onemap package. Now, we have to combine the raw data with the linkage groups (hereafter named LG) into a single object. This is done with function create_fullsib:

fsib <- create_fullsib( example_QTLfullsib,
                        map.list = list( LG1_final, LG2_final, LG3_final, LG4_final ),
                        step = 1, map.function = "kosambi" )

Checking it:

fsib
#>   This is an object of class 'fullsib'
#> 
#>   The linkage map has 4 groups, with 560 cM and 60 markers
#>   No. individuals genotyped:  300 
#>   Group 1 : 140 cM,  15 markers (A, B1, B3, C, D1, D2) 
#>   Group 2 : 140 cM,  15 markers (A, B1, B2, B3, C, D1, D2) 
#>   Group 3 : 140 cM,  15 markers (A, B2, B3, C, D1, D2) 
#>   Group 4 : 140 cM,  15 markers (A, B1, B2, C, D1, D2) 
#>   And 5 unlinked markers
#> 
#> 
#>   2 phenotypes are avaliable for QTL mapping
#>   Multipoint probability for QTL genotype was obtained for each 1 cM

This object contains a genetic map composed by four LGs, with their length and segregation patterns indicated in the output. There are also five unlinked makers. This might be helpful for further QTL analysis for species where the map does not cover the whole genome, using the single marker analysis. The object also has information for two phenotypes, with conditional probabilities calculated every 1 cM (using Kosambi function).

Interval mapping

To perform interval mapping, one should use two functions, im_scan and im_char. im_scan scans all the genome testing for statistical associations between genotype and phenotype, i.e., trying to detect QTL. im_char shows the genetic effects and their significance tests, as well as some additional tests to infer QTL segregation pattern.

IM Genome scan

To perform a genome scan:

im <- im_scan( fullsib = fsib, lg = "all", pheno.col = 1, LOD = TRUE )

Argument fullsib refers to the object created using function create_fullsib; lg indicates which linkage groups must be scanned (another way of representing it is lg=1:4). The function default is lg=all, to scan every LG. The argument pheno.col indicates which phenotype will be considered (it defaults to 1, the first trait), using an integer. The argument LOD=TRUE indicates that the mapping result must be shown using LOD scores, if FALSE, the scale will be shown as \(-log_{10}(p_{value})\).

The mapping result can be viewed by typing the object name:

im
#>        lg pos.cM        LOD model
#> M1      1      0 4.91459840     0
#> loc1    1      1 4.97917274     0
#> loc2    1      2 5.02580977     0
#> loc3    1      3 5.05401343     0
#> loc4    1      4 5.06380070     0
#> loc5    1      5 5.05567153     0
#> loc6    1      6 5.03060917     0
#> loc7    1      7 4.99010241     0
#> loc8    1      8 4.93589188     0
#> M2      1     10 4.79637977     0
#> loc10   1     10 4.79637977     0
#> loc11   1     11 4.78808670     0
#>  [ reached getOption("max.print") -- omitted 548 rows ]

Another way to see the results is using print and specifying the LG:

print( im, lg = 1 )
#>        lg pos.cM       LOD model
#> M1      1      0 4.9145984     0
#> loc1    1      1 4.9791727     0
#> loc2    1      2 5.0258098     0
#> loc3    1      3 5.0540134     0
#> loc4    1      4 5.0638007     0
#> loc5    1      5 5.0556715     0
#> loc6    1      6 5.0306092     0
#> loc7    1      7 4.9901024     0
#> loc8    1      8 4.9358919     0
#> M2      1     10 4.7963798     0
#> loc10   1     10 4.7963798     0
#> loc11   1     11 4.7880867     0
#>  [ reached getOption("max.print") -- omitted 128 rows ]

The resulting object of im_scan is a matrix with five columns and k lines, where k is the number of genome positions in which the conditional probabilities have been computed based on the step argument of the earlier create_fullsib function. In this object, the row names are the identification of each position (it can be the marker name itself or a pseudo-marker, with an automatically created name in the format loc + number (corresponding to a step between markers). The second column is the linkage group, and the third is the position (in centiMorgan). The fourth column has the LOD Score value (or the \(-log_{10}(p_{value})\)). The fifth column reefers to the model utilized for the computations (that will be detailed later).

For helping visualization of the results of QTL mapping, function summary(im) can be used to print just the highest LOD score value (or the \(-log_{10}(p_{value})\)) for every LG. The user can set a threshold (thr argument) to see only the scores above this. When there is more than one QTL in the same LG, the user has to manually look for it using some graphical visualization.

summary(im)
#>       lg pos.cM      LOD model
#> loc4   1      4 5.063801     0
#> M27    2    110 8.381168     0
#> loc64  3     64 6.343023     0
#> loc56  4     56 6.733203     0

QTL mapping plot options

For graphical visualizations, there are two functions: plot and plot_fullsibQTL. Function plot is simpler and it is a good option for quick observations. Function plot_fullsibQTL is recommended for more detailed analysis; it has also an option for interactive analysis, allowing tools such as mouse zoom and hover.

To use function plot:

plot(im)

Users can customize the output. For example, it is possible to plot in the same panel up to five mapping profiles. Moreover, the user can indicate the molecular marker’s position, change the labels, among others features. You can find more information about available options typing ?plot.fullsib_scan or on its vignette, typing

vignette("Graphical_options_with_plot" ,package="fullsibQTL")

Function plot_fullsibQTL is more sophisticated and interactive. Try it with

plot_fullsibQTL(fullsib = fsib, fullsib.scan = im)

To plot interactive graphics:

plot_fullsibQTL( fullsib = fsib, fullsib.scan = im, interact = TRUE )

It will generate this interactive html graphic.

You can find more information about plot_fullsibQTL in the vignette:

vignette("Graphical_options_with_plot_fullsibQTL" ,package="fullsibQTL")

Covariates

To improve the mapping statistical power, it is possible to include covariates (as fixed effects) in the QTL model, in order to control the phenotypic variability caused by known sources other than genetic (e.g. some known plants did not receive the correctly irrigation). To set a full-rank matrix which controls the phenotypic variance, first you need to define an incidence vector, and, then, set it as an argument of the function im_scan. An example follows:

covar <- matrix( rep( c( 1 ,-1 ), each = 150 ), ncol = 1 )
im_covar <- im_scan( fsib, pheno.col = 1, addcovar = covar )

IM Permutation test

The threshold for QTL detection can be set using permutation tests (Churchill and Doerge, 1994) adding the argument n.perm in the im_scan function. As the permutation test is a sampling process, it is recommended that users set a seed before the permutations, if they want to repeat the procedure. The argument write.perm allows the user to save a text file with information about the permutations.

In this tutorial, the permutation test object im_perm with the seed 1234 is already created and is part of the example_QTLfullsib object (previously loaded), so there is no need to run the following lines. Anyway, an example of how to perform the permutation is the following:

## Perfoming the permutation
set.seed(1234)
im_perm <- im_scan( fsib, pheno.col = 1, n.perm = 1000,
                    write.perm = "im_permutations.txt" )

If one needs to safe keep the object with the results of permutations, for further analysis, the object can be saved as an Rdata object. To save and load it, use:

save( im_perm, file = "im_permutations.Rdata" )
load( "im_permutations.Rdata" )

The permutation test demands some time. At the end of the process, im_perm receives a matrix with p lines, where p is the number of the realized permutations, and two columns with the highest and second highest peak values. The definition of the second peak is the same as the one defined by Chen and Storey (2006), i.e, the highest LOD score (or \(-log_{10}(p_{value})\)) of the genome disregarding the LG where the highest peak was achieved. The threshold values can be visualized with function summary applied to the object with the permutation tests. In this case, the user can set an \(\alpha\) significance threshold. The result will be the quantile \((1-\alpha) \times 100\) of the samples. The function default is to show \(\alpha\) values equal to \(0.10\) and \(0.05\).

## Summary of the permutation test
summary( im_perm )
#>  Threshold considering 1000 permutations
#>  First column indicates 'classical' threshold showed by Churchill and Doerge, 1994
#>  Second column means threshold values suggested by Chen and Storey, 2006
#>        peak.1   peak.2
#> 0.05 3.419819 2.497431
#> 0.1  3.103702 2.287990
summary( im_perm, alpha = 0.05 )
#>  Threshold considering 1000 permutations
#>  First column indicates 'classical' threshold showed by Churchill and Doerge, 1994
#>  Second column means threshold values suggested by Chen and Storey, 2006
#>        peak.1   peak.2
#> 0.05 3.419819 2.497431

## Getting the peaks for alpha 0.05.
im.peak1 <- summary(im_perm, alpha = 0.05)[1, 1]
#>  Threshold considering 1000 permutations
#>  First column indicates 'classical' threshold showed by Churchill and Doerge, 1994
#>  Second column means threshold values suggested by Chen and Storey, 2006
im.peak1
#> [1] 3.419819
im.peak2 <- summary(im_perm,alpha = 0.05)[1, 2]
#>  Threshold considering 1000 permutations
#>  First column indicates 'classical' threshold showed by Churchill and Doerge, 1994
#>  Second column means threshold values suggested by Chen and Storey, 2006
im.peak2
#> [1] 2.497431

## Plotting the second peak in the graphic
plot(im)
abline(h = im.peak2, col = "red")


## Plotting both thresholds
plot( im, lty = 1, lwd = 2, incl.mkr = NULL, cex.incl = 0.7, 
      cex.axis = 0.8, col = "blue", ylab = "LOD Score", xlab = "Linkage Group", 
      main = "IM - Trait 1" )
abline( h = im.peak1, lty = 2, lwd = 1, col = "black" )
abline( h = im.peak2, lty = 2, lwd = 2, col = "black" )

If one wants to see the distribution of maximum values (selected for each permutation), function plot can be applied to perm object. If peak argument is set as 2, the distribution of the second highest peaks will be showed. Vertical lines represent the cut-off threshold for \(\alpha=0.05\).

plot(im_perm, peak = 1)
abline(v = im.peak1)
abline(v = im.peak1, col = "red")


plot(im_perm, peak = 2)
abline(v = 3.419819, col = "blue")
abline(v = 2.497431, col = "red")

IM QTL characterization

After the determination of the number and position of the QTLs, the user can characterize the mapped QTLs. This relates to identify the significant effects, the segregation pattern and linkage phases between QTLs and flanking markers. The already created object im shows the presence of at least 4 QTLs:

summary(im)
#>       lg pos.cM      LOD model
#> loc4   1      4 5.063801     0
#> M27    2    110 8.381168     0
#> loc64  3     64 6.343023     0
#> loc56  4     56 6.733203     0

The characterization of these 4 regions is achieved with function im_char:

qtl1 <- im_char( fsib, pheno.col = 1, lg = 1, pos = "loc4" )
qtl2 <- im_char( fsib, pheno.col = 1, lg = 2, pos = "M27" )
qtl3 <- im_char( fsib, pheno.col = 1, lg = 3, pos = "loc64" )
qtl4 <- im_char( fsib, pheno.col = 1, lg = 4, pos = "loc56" )

Notice that positions “loc4”, “M27”, “loc64”, and “loc56” are the ones presented in the im object and visualized when function summary was applied. For example, the object with the characterization of the QTL on LG 1 is:

qtl1
#>                    M1-M2
#> LG            1.00000000
#> pos           4.00000000
#> -log10(pval)  4.46566439
#> LOD_Ha        5.06392065
#> mu           -0.12075333
#> alpha_p       1.85844505
#> LOD_H1        4.34832177
#> alpha_q       0.43196490
#> LOD_H2        0.23481276
#> delta_pq     -0.51729787
#> LOD_H3        0.29268801
#> H4_pvalue     0.01402575
#> H5_pvalue     0.03295641
#> H6_pvalue     0.89215754
#> model         0.00000000
#> attr(,"class")
#> [1] "fullsib_char" "matrix"

Object qtl1 (a matrix) has several information: * Line 1: LG information; * Line 2: position in centiMorgan where the QTL is localized; * Line 3: \(-log_{10}(p_{value})\)); * Line 4: LOD Score; * Line 5: intercept (mu); * Lines 6, 8 and 10: alpha.p, alpha.q, and delta.pq represent the QTL genetic effects; * Lines 7, 9 and 11: LOD.alpha.p, LOD.alpha.q, and LOD.delta.pq are the statistics associated with the marginal tests of significance of these genetic effects, with 1 degree of freedom (testing just if the effect differs from zero). In this case, a LOD score above or equal \(0.83\) (equivalent to a likelihood ratio test of LRT \(= 3.81\), which has an approximate \(\chi^2\) distribution with 1 degree of freedom under the null hypothesis). Since this is done only on specific positions, it is not necessary to correct for multiple tests. * Lines 12, 13 and 14: H4.pvalue, H5.pvalue, and H6.pvalue show the p-values for additional tests necessary to infer QTL segregation. For more information, please see Gazaffi et al. (2014). * Line 15 (last line): model shows which model was considered; please see details on the vignette QTL mapping with partially informative markers.

In some cases, a given test cannot be performed, showing NA as their values. In short, outcrossing species can have a combination of markers with different segregation patterns, and in some genomic regions the amount of information can be different. For example, a genomic with markers segregating only in one of the parents (marker types D1 or D2) will not allow inference about QTL in the other parent, so the model needs to be adapted accordingly. For details see Gazaffi et al. (2014).

We can print information for all QTL simultaneously:

cbind( qtl1, qtl2, qtl3, qtl4 )
#>                    M1-M2           M27       M37-M38     M51-M52
#> LG            1.00000000   2.000000000  3.0000000000  4.00000000
#> pos           4.00000000 109.999999871 64.0000000000 56.00000000
#> -log10(pval)  4.46566439   7.686445158  5.7006656820  6.07922441
#> LOD_Ha        5.06392065   8.383998018  6.3431104016  6.73344163
#> mu           -0.12075333  -0.120753333 -0.1207533333 -0.12075333
#> alpha_p       1.85844505   2.001547484  2.1843732488  1.60658603
#> LOD_H1        4.34832177   5.185292053  6.3174806045  3.55807185
#> alpha_q       0.43196490  -0.462907691  0.0954155095  1.55777619
#> LOD_H2        0.23481276   0.341625090  0.0121428301  3.18133346
#> delta_pq     -0.51729787  -1.715739744  0.1644144863  0.36960164
#> LOD_H3        0.29268801   3.887091384  0.0332241748  0.17050296
#> H4_pvalue     0.01402575   0.005461553  0.0001530791  0.92901795
#> H5_pvalue     0.03295641   0.597567873  0.0005160089  0.03000588
#> H6_pvalue     0.89215754   0.019293472  0.9112272264  0.05257837
#> model         0.00000000   0.000000000  0.0000000000  0.00000000

The analysis of the linkage phases allows to detect the localization of the QTL allele which increases or reduces the trait. To do so, look for LOD.alpha.p and LOD.alpha.q. In order to facilitate interpretation, the function draw_phase and can be used as follows:

draw_phase(fsib, qtl1, 0.05)
#> 
#> Printing QTL and its linkage phase between markers across the LG:
#> 
#> Markers          Position           Parent 1       Parent 2
#> 
#> M1                   0.00           a  |  | o        a  |  | o  
#> QTL                  4.00           P1 |  | P2       Q0 |  | Q0 
#> M2                  10.00           a  |  | b        c  |  | d  
#> M3                  20.00           c  |  | c        a  |  | b  
#> M4                  30.00           a  |  | o        a  |  | o  
#> M5                  40.00           a  |  | b        a  |  | b  
#> M6                  50.00           a  |  | b        a  |  | b  
#> M7                  60.00           a  |  | o        a  |  | o  
#> M8                  70.00           a  |  | o        a  |  | o  
#> M9                  80.00           a  |  | b        c  |  | d  
#> M10                 90.00           a  |  | b        c  |  | c  
#> M11                100.00           a  |  | b        c  |  | d  
#> M12                110.00           a  |  | b        a  |  | o  
#> M13                120.00           a  |  | b        c  |  | c  
#> M14                130.00           a  |  | b        c  |  | d  
#> M15                140.00           a  |  | b        c  |  | d  
#> 
#> P1 and Q1 have positive effect (increase phenotypic value)
#> P2 and Q2 have negative effect (reduce phenotypic value) 
#> P0 and Q0 have neutral effect (non signif.)

The value \(0.05\) relates to \(\alpha\) (significance level) and it is used to show only QTL statistically significant at this level. Alleles \(P1\) and \(Q1\) have positive effects, while \(P2\) and \(Q2\) have negative ones; \(P0\) and \(Q0\) have no significant effects.

To infer the QTL segregation, the user should use function get_segr function. It uses the default of \(\alpha=0.05\) for the significance level of marginal tests (LOD.alpha.p, LOD.alpha.q, LOD.delta.pq) and complementary ones (H4.pvalue, H5.pvalue, H6.pvalue). For example:

get_segr( qtl1 )
#> QTL segregation is 1:1
#> [1] "1:1"
get_segr( qtl2 )
#> QTL segregation is 1:2:1
get_segr( qtl3 )
#> QTL segregation is 1:1
#> [1] "1:1"
get_segr( qtl4 )
#> QTL segregation is 1:2:1

Selection of Cofactors

Before proceeding to composite interval mapping (CIM), it is necessary to define which markers will be used as cofactors. They will be used to control variation outside the mapping interval. To do so, there are two functions: cof_selection and cof_definition.

Cofactor selection using multiple linear regression

Function cof_selection allows to select cofactors using multiple linear regression with the stepwise procedure. The criteria information is utilized to determine if a given cofactor is kept or not in the model. Basically, this function first prepares the data and then uses step from stats package. For more information, use ?cof_selection.

cofs_fs <- cof_selection( fsib, pheno.col = 1, k = log( 300 ), n.cofactor = 10 )

plot( cofs_fs )

Object fsib is the same one created using function create_fullsib; pheno.col indicates which phenotypic trait will be used in the analysis; k is the penalty for the information criteria computation. If k=2, it used Akaike Information Criteria; if k=log(n) (where n is the mapping population size), it uses the Bayesian Information Criteria. If the user wants to use another penalty values, we recommend to look at the QTL Cartographer v1.17 manual, page 76, and also the help of the R function step. In some situations, the process would recommend to keep several cofactors which can lead to overparametrization of the model. To control it, the argument n.cofactor allows to the user to restrict the number of selected cofactors up to a maximum, i.e, the proceeding to select cofactors stops when the number of cofactors reaches what is defined in n.cofactor (the default is n.cofactor = 10).

The following example shows how to proceed to select cofactors in a situation where the fixed covariates (see Covariates section) are set to control the character variability. To do it, we used the argument addcovar. For example:

covar <- matrix( rep( c( 1, -1 ), each = 150 ), ncol = 1 )
cofs_fs2 <- cof_selection( fsib, pheno.col = 1, addcovar = covar, 
                           k = 2, thres.effect = 0.05 )
#> Number of Cofactors selected: 1 ... 2 ... 3 ... 4 ... 5 ... 6 ... 7 ... 8 ... 9 ... 10 ... done

After the cofactor selection, thres.effect=0.05 is used to verify if there are remaining cofactors with no significant effects, and if this is the case, they are removed. It is worth noting that each cofactor can have up to three genetical effects (additive in both parents and dominance), but not all them might be statistically significant. The default of this argument is 1, i.e, there is no removal of non-significant effects. This could lead to overparametrization, so use it carefully. For more information, use (?cof_selection).

Cofactor ad-hoc definition

Users can also select the cofactors using and ad-hoc procedure. This is the case for example when someone knows that certain markers are supposed to be linked with QTL. To do so, it is necessary to build a matrix and then incorporate it in the cof_definition function:

cofs <- matrix(c("1","M2",
                 "2","M27",
                 "3","M33",
                 "4","M52"), 4, 2, byrow = TRUE )
cofs_fsdef <- cof_definition( fsib, pheno.col = 1, 
                              thres.effect = 0.05, cof.pos = cofs)

Composite Interval Mapping (CIM)

In order to proceed with composite interval mapping, there are two functions: cim_scan and cim_char. They have analogous options of functions im_scan and im_char. The first one scan all the genome computing associations between genotype and phenotype trying to detect QTL. The second shows the genetic effects and their statistical significance. It also shows additional tests to infer the QTL segregation pattern. The mapping procedure is performed in a similar way as already presented for im_scan. The inclusion of cofactors allows a better mapping of QTL, since this model has more statistical power than interval mapping. For more information, use ?cim_scan and ?cim_char.

CIM Genome scan

The genome scan can be performed with function cim_scan, indicating the cofactors object (class: fullsib_cofactors) previously created with cof_selection or cof_definition.

cim <- cim_scan( fullsib = cofs_fs, lg = "all", ws = 10, pheno.col = 1, LOD = TRUE )

Option ws (window size) is to indicate the size of a window outside the mapping interval to block the inclusion of cofactors. This is important to not markers linked with the QTL being mapped to be considered as other QTL.

Permutation test

The threshold for QTL detection can be set using permutation tests (Churchill and Doerge, 1994) adding the argument n.perm in the cim_scan function. As the permutation test is a re-sampling process, it is recommended that users set a seed before the permutations, in order to be able to replicate the results. Argument write.perm allows to save a text file with information about all the permutations. In this tutorial, the permutation test object cim_perm with the seed 4321 is already created and is part of the example_QTLfullsib object previously loaded, so it is not necessary to run the following lines. But just as an example:

set.seed( 4321 )
cim_perm <- cim_scan( fullsib = cofs_fs, lg = "all", pheno.col = 1, 
                      LOD = TRUE, n.perm = 1000,
                      write.perm = "cim_permutations.txt" )

To keep the results of permutations for later analysis, one just needs to save (and eventually load) an R object:

save( cim_perm, file = "cim_permutations.Rdata" )
load( "cim_permutations.Rdata" )

The permutation test demands some time. At the end of the process, cim_perm receives a matrix with p lines, where p is the number of the realized permutations, and 4 columns representing the first, second, third, and fourth highest peak value, respectively, for the statistic LOD (or \(-log_{10}(p_{value})\)). The definition of the second peak is the same of the method two of Chen and Storey (2006), i.e, the highest LOD score (or \(-log_{10}(p_{value})\)) of the genome disregarding the LG where the highest peak was achieved. The third and fourth peak definitions are analogous. This threshold values can be visualized through summary for the object with the permutation tests. In this case, the user can set an \(\alpha\) for the significance threshold. The result will be the quantile \((1-\alpha) \times 100\) of permutation samples for the statistic (defaults \(0.10\) an \(0.05\)).

## Summary of the permutation test
summary(cim_perm)
#>  Threshold considering 1000 permutations
#>        peak.1   peak.2   peak.3   peak.4
#> 0.05 4.023724 2.900906 2.300657 1.860476
#> 0.1  3.707559 2.650663 2.140063 1.692707
## just with alpha=0.05 
summary(cim_perm, alpha = 0.05)
#>  Threshold considering 1000 permutations
#>        peak.1   peak.2   peak.3   peak.4
#> 0.05 4.023724 2.900906 2.300657 1.860476

## Getting the peaks for alpha 0.05. 
cim.peak1 <- summary(cim_perm, alpha=0.05)[1,1]
#>  Threshold considering 1000 permutations
cim.peak1
#> [1] 4.023724
cim.peak2 <- summary(cim_perm, alpha=0.05)[1,2]
#>  Threshold considering 1000 permutations
cim.peak2
#> [1] 2.900906

## Ploting the thrshold with the second peak
plot(cim)
abline(h = cim.peak2, col = "red")


## Plotting both thresholds
plot(cim, lty = 1, lwd = 2, incl.mkr = NULL, cex.incl = 0.7, 
      cex.axis = 0.8, col = "red", ylab = "LOD Score", xlab = "Linkage Group", 
      main = "CIM - Trait 1")
abline(h = cim.peak1, lty = 2, lwd = 1, col = "black")
abline(h = cim.peak2, lty = 2, lwd = 2, col = "black")

If one wants to see the distribution of the maximum of the LODs, function plot can be used with perm object. If the peak argument is 2, the distribution of the second highest peaks will be showed. The vertical line represents the cut-off threshold for \(\alpha=0.05\).

plot(cim_perm, peak = 1)
abline(v = cim.peak1, col = "red")


plot(cim_perm, peak = 2)
abline( v = cim.peak2, col = "red" )

Characterization of mapped QTL with CIM

After the determination of the number and position of the QTLs, it is possible to characterize the QTL, i.e., identify their significant effects, segregation patterns and linkage phases with flanking markers.

The already created object cim shows the presence of at least 4 QTLs:

summary(cim)
#>       lg pos.cM       LOD model
#> loc13  1     13  6.420684     0
#> loc45  2     45 14.985451     0
#> loc65  3     65  9.367385     0
#> M52    4     60 11.397918     0

However, the following graphic shows that there is evidence for more QTL. This is because the automatic selection of peaks is not perfect, demanding some user interaction. We do not recommend using automatic procedures.

## Plotting the CIM results
plot( cim, lty = 1, lwd = 2, incl.mkr = NULL, cex.incl = 0.7, 
      cex.axis = 0.8, col = "red", ylab = "LOD Score", xlab = "Linkage Group", 
      main = "CIM - Trait 1" )
abline( h = cim.peak1, lty = 2, lwd = 1, col = "black" )
abline( h = cim.peak2, lty = 2, lwd = 2, col = "black" )

The location of other peaks (meaning evidence for more QTL) can be verified examining object cim. For example, for some LGs:

## Visual Analysis of every location of LG1
## QTLs peak at loc13 and M12
print(cim,lg = 1)
#>        lg pos.cM        LOD model
#> M1      1      0 5.79367870     0
#> loc1    1      1 5.91181131     0
#> loc2    1      2 6.01211556     0
#> loc3    1      3 6.09560281     0
#> loc4    1      4 6.16354985     0
#> loc5    1      5 6.21737749     0
#> loc6    1      6 6.25856640     0
#> loc7    1      7 6.28857607     0
#> loc8    1      8 6.30875904     0
#> M2      1     10 6.32409143     0
#> loc10   1     10 6.32409143     0
#> loc11   1     11 6.37420419     0
#> loc12   1     12 6.40668720     0
#> loc13   1     13 6.42068404     0
#> loc14   1     14 6.41583249     0
#> loc15   1     15 6.39210295     0
#> loc16   1     16 6.34966143     0
#> loc17   1     17 6.29271090     0
#> loc18   1     18 6.21525449     0
#> M3      1     20 0.08055465     0
#> loc20   1     20 0.08055465     0
#> loc21   1     21 0.18998040     0
#> loc22   1     22 0.29355109     0
#> loc23   1     23 0.40492461     0
#> loc24   1     24 0.48753060     0
#> loc25   1     25 0.56386105     0
#> loc26   1     26 0.63065734     0
#> loc27   1     27 0.68585027     0
#> loc28   1     28 0.72837864     0
#> M4      1     30 0.75643287     0
#> loc30   1     30 0.75643287     0
#> loc31   1     31 0.81806962     0
#> loc32   1     32 0.87103269     0
#> loc33   1     33 0.91419103     0
#> loc34   1     34 0.92929996     0
#> loc35   1     35 0.95314871     0
#> loc36   1     36 0.96357427     0
#> loc37   1     37 0.96005953     0
#> loc38   1     38 0.94264939     0
#> M5      1     40 0.86006181     0
#> loc40   1     40 0.86006181     0
#> loc41   1     41 0.81052668     0
#> loc42   1     42 0.75138109     0
#> loc43   1     43 0.68301187     0
#> loc44   1     44 0.60697044     0
#> loc45   1     45 0.52607856     0
#> loc46   1     46 0.44414778     0
#> loc47   1     47 0.36536289     0
#> loc48   1     48 0.28445785     0
#> M6      1     50 0.17439630     0
#> loc50   1     50 0.17439630     0
#> loc51   1     51 0.19465351     0
#> loc52   1     52 0.21951677     0
#> loc53   1     53 0.24871485     0
#> loc54   1     54 0.28182034     0
#> loc55   1     55 0.31827237     0
#> loc56   1     56 0.36712610     0
#> loc57   1     57 0.40889707     0
#> loc58   1     58 0.45165496     0
#> M7      1     60 0.53820711     0
#> loc60   1     60 0.53820711     0
#> loc61   1     61 0.51824229     0
#> loc62   1     62 0.49986308     0
#> loc63   1     63 0.48264112     0
#> loc64   1     64 0.46615181     0
#> loc65   1     65 0.45007353     0
#> loc66   1     66 0.43003498     0
#> loc67   1     67 0.41567820     0
#> loc68   1     68 0.40183289     0
#> M8      1     70 0.37670533     0
#> loc70   1     70 0.37670532     0
#> loc71   1     71 0.30090156     0
#> loc72   1     72 0.22973170     0
#> loc73   1     73 0.16654602     0
#> loc74   1     74 0.11464860     0
#> loc75   1     75 0.07679051     0
#> loc76   1     76 0.05468877     0
#> loc77   1     77 0.04871900     0
#> loc78   1     78 0.05790017     0
#> M9      1     80 0.11370984     0
#> loc80   1     80 0.11370985     0
#> loc81   1     81 0.13441827     0
#> loc82   1     82 0.16022553     0
#> loc83   1     83 0.19122816     0
#> loc84   1     84 0.22733276     0
#> loc85   1     85 0.26820134     0
#> loc86   1     86 0.32089346     0
#> loc87   1     87 0.37124990     0
#> loc88   1     88 0.42286918     0
#> M10     1     90 0.52258518     0
#> loc90   1     90 0.52258519     0
#> loc91   1     91 0.55666866     0
#> loc92   1     92 0.58802902     0
#> loc93   1     93 0.62657991     0
#> loc94   1     94 0.64883535     0
#> loc95   1     95 0.65984363     0
#> loc96   1     96 0.65719359     0
#> loc97   1     97 0.63961502     0
#> loc98   1     98 0.60733055     0
#> M11     1    100 0.50067092     0
#> loc100  1    100 0.50067092     0
#> loc101  1    101 0.52015751     0
#> loc102  1    102 0.54480557     0
#> loc103  1    103 0.54290181     0
#> loc104  1    104 0.52817260     0
#> loc105  1    105 0.52551697     0
#> loc106  1    106 0.50633660     0
#> loc107  1    107 0.49849389     0
#> loc108  1    108 0.51489059     0
#> M12     1    110 5.97066700     0
#> loc110  1    110 5.97066700     0
#> loc111  1    111 5.95459428     0
#> loc112  1    112 5.92220227     0
#> loc113  1    113 5.87235208     0
#> loc114  1    114 5.80413814     0
#> loc115  1    115 5.71724937     0
#> loc116  1    116 5.61223033     0
#> loc117  1    117 5.49053478     0
#> loc118  1    118 5.35431731     0
#> M13     1    120 5.04451339     0
#> loc120  1    120 5.04451338     0
#> loc121  1    121 4.94913994     0
#> loc122  1    122 4.85024247     0
#> loc123  1    123 4.74692594     0
#> loc124  1    124 4.63807846     0
#> loc125  1    125 4.52247566     0
#> loc126  1    126 4.39893440     0
#> loc127  1    127 4.26652408     0
#> loc128  1    128 4.12483679     0
#> M14     1    130 1.48723693     0
#> loc130  1    130 1.48723691     0
#> loc131  1    131 1.30635788     0
#> loc132  1    132 1.07946868     0
#> loc133  1    133 0.84706137     0
#> loc134  1    134 0.64011396     0
#> loc135  1    135 0.47685836     0
#> loc136  1    136 0.36142950     0
#> loc137  1    137 0.30583532     0
#> loc138  1    138 0.26798851     0
#> M15     1    140 0.29660051     0

## Visual Analysis LG2
## QTLs peaks at loc45, loc77, and loc 108
print(cim, lg = 2)

## Visual Analysis LG3
## QTLs peaks at M33 and loc65
print(cim,lg = 3)

## Visual Analysis LG4
## QTL peak at M52
print(cim,lg = 4)

Another way is by pointing the mouse for an interactive session of function plot_fullsibQTL:

plot_fullsibQTL( fullsib = fsib, fullsib.scan = cim, interact = TRUE )
# plot not shown

Through the visual analysis we found the following eight QTLs peaks corresponding to positions next to the cofactors.

print( cim, lg = 1, pos = c( "loc13", "M12", "loc45" ) )
#>       lg pos.cM       LOD model
#> loc13  1     13 6.4206840     0
#> loc45  1     45 0.5260786     0
#> M12    1    110 5.9706670     0

In a similar way:

print( cim, lg=2, pos = c( "loc45", "loc77", "loc108" ) )
print( cim, lg=3, pos = c( "M33", "loc65" ) )
print( cim, lg=4, pos = "M52" )

In order to estimate the total of proportion explained by the QTL (including the new ones manually selected) and its characterization, one can use function cim_char in the following way:

cim_QTL1_lg1 <- cim_char( fullsib = cofs_fs, lg = 1, pheno.col = 1, pos = "loc13" )
cim_QTL2_lg1 <- cim_char( fullsib = cofs_fs, lg = 1, pheno.col = 1, pos = "M12" )
cim_QTL1_lg2 <- cim_char( fullsib = cofs_fs, lg = 2, pheno.col = 1, pos = "loc45" )
cim_QTL2_lg2 <- cim_char( fullsib = cofs_fs, lg = 2, pheno.col = 1, pos = "loc77" )
cim_QTL3_lg2 <- cim_char( fullsib = cofs_fs, lg = 2, pheno.col = 1, pos = "loc108" )
cim_QTL1_lg3 <- cim_char( fullsib = cofs_fs, lg = 3, pheno.col = 1, pos = "M33" )
cim_QTL2_lg3 <- cim_char( fullsib = cofs_fs, lg = 3, pheno.col = 1, pos = "loc65")
cim_QTL1_lg4 <- cim_char( fullsib = cofs_fs, lg = 4, pheno.col = 1 , pos = "M52" )

## Printing the object
cim_QTL1_lg1
#>                     M2-M3
#> LG            1.000000000
#> pos          13.000000000
#> -log10(pval)  5.776307817
#> LOD_Ha        6.421161966
#> mu           -0.190090732
#> alpha_p       1.515466989
#> LOD_H1        5.739391706
#> alpha_q      -0.287266743
#> LOD_H2        0.217143175
#> delta_pq     -0.437730736
#> LOD_H3        0.459059356
#> H4_pvalue     0.003131291
#> H5_pvalue     0.010658793
#> H6_pvalue     0.709783594
#> model         0.000000000
#> attr(,"class")
#> [1] "fullsib_char" "matrix"

cim_QTL1_lg1 is a matrix with 1 column and 15 lines. The flanking markers of this specific QTL are in the header (M2-M3 in this case). Then, the first line has the LG information. The second line has the position in centiMorgan where the region is localized. The third and fourth lines shows, respectively, \(-log_{10}(p_{value})\)) and LOD Score. The fifth line (mu) contains the intercept effect of the model. The lines alpha.p, alpha.q, and delta.pq represent the QTL genetic effects. The lines LOD.H1, LOD.H2, and LOD.H3 are the statistics for the marginal tests of the respective genetic effects. These three tests have each 1 degree of freedom (testing if the effect are different from zero). In this case, a LOD score above \(0.83\) indicates a significant effect (since LOD \(= 0.83\) is equivalent as a LRT \(= 3.81\), which under H0 follows a \(\chi^2\) distribution with 1 degree of freedom; \(3.81\) is for a significance level of \(5%\)). The lines H4.pvalue, H5.pvalue, and H6.pvalue show the p-values for the complementary test in order to detect the QTL segregation. For more information on interpretations, please see Gazaffi et al. (2014). Finally, the line model shows which model was evaluated; see details on the vignette QTL mapping with partially informative markers . In some cases, a given test cannot be evaluated, and so the cells will have NA values.

We can also print all the QTLs in a single table:

## Printing in a single table
cbind( cim_QTL1_lg1, cim_QTL2_lg1, cim_QTL1_lg2, cim_QTL2_lg2, 
       cim_QTL3_lg2, cim_QTL1_lg3, cim_QTL2_lg3, cim_QTL1_lg4)
#>                     M2-M3           M12      M20-M21    M23-M24       M26-M27
#> LG            1.000000000  1.000000e+00  2.000000000  2.0000000  2.000000e+00
#> pos          13.000000000  1.100000e+02 45.000000000 77.0000000  1.080000e+02
#> -log10(pval)  5.776307817  5.342661e+00 14.175532464  8.4245522  9.761468e+00
#> LOD_Ha        6.421161966  5.973284e+00 14.987625223  9.1391751  1.050362e+01
#> mu           -0.190090732 -2.866931e-01 -0.249944242 -0.2020260 -2.141821e-01
#> alpha_p       1.515466989  1.105868e-01  2.018470697 -1.2669553  2.198974e+00
#> LOD_H1        5.739391706  3.563618e-02  6.928530566  2.2693891  6.722619e+00
#> alpha_q      -0.287266743 -1.471497e+00  0.688679248 -1.6428034  2.516817e-01
#> LOD_H2        0.217143175  5.281494e+00  0.884675396  3.0130132  1.119172e-01
#> delta_pq     -0.437730736  2.649273e-01 -1.846287237  1.2301934 -1.277437e+00
#> LOD_H3        0.459059356  1.834512e-01  6.990072004  2.3963207  3.521562e+00
#> H4_pvalue     0.003131291  6.668927e-04  0.008110192  0.5277092  1.447511e-04
#> H5_pvalue     0.010658793  7.090789e-01  0.724646833  0.9461134  8.033794e-02
#> H6_pvalue     0.709783594  7.237503e-03  0.014040379  0.5212641  4.101040e-02
#> model         0.000000000  0.000000e+00  0.000000000  0.0000000  0.000000e+00
#>                      M33       M37-M38          M52
#> LG            3.00000000  3.000000e+00  4.000000000
#> pos          19.99999998  6.500000e+01 59.999999930
#> -log10(pval)  5.33337949  8.649308e+00 10.639847207
#> LOD_Ha        5.96368600  9.368843e+00 11.398141754
#> mu           -0.13202914 -3.817750e-02 -0.029297883
#> alpha_p       0.99415759  2.046954e+00  1.411320036
#> LOD_H1        2.15964145  8.880128e+00  5.200490627
#> alpha_q      -1.30846024  2.340881e-01  1.536338315
#> LOD_H2        3.78326129  1.109314e-01  6.700681952
#> delta_pq     -0.49565965  4.211738e-01  0.298028729
#> LOD_H3        0.70778406  4.071333e-01  0.246488920
#> H4_pvalue     0.44260139  3.682693e-05  0.744413839
#> H5_pvalue     0.22998428  3.115411e-04  0.004033456
#> H6_pvalue     0.05329961  6.987909e-01  0.002403344
#> model         0.00000000  0.000000e+00  0.000000000

To evaluate QTL phases, we can use the get_segr function:

get_segr( cim_QTL1_lg1, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:1
#> [1] "1:1"
get_segr( cim_QTL2_lg1, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:1
#> [1] "1:1"
get_segr( cim_QTL1_lg2, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:2:1
get_segr( cim_QTL2_lg2, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 3:1
get_segr( cim_QTL3_lg2, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:2:1
get_segr( cim_QTL1_lg3, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:2:1
get_segr( cim_QTL2_lg3, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:1
#> [1] "1:1"
get_segr( cim_QTL1_lg4, probs1 = 0.05, probs2 = 0.05 )
#> QTL segregation is 1:2:1

And finally, to draw a visualization of the genetic map with QTLs and infered phases:

cim_QTL1_lg1_draw <- draw_phase(fullsib = cofs_fs, fullsib.char = cim_QTL1_lg1)

# Two ways of visualize
cim_QTL1_lg1_draw
#> 
#> Printing QTL and its linkage phase between markers across the LG:
#> 
#> Markers          Position           Parent 1       Parent 2
#> 
#> M1                   0.00           a  |  | o        a  |  | o  
#> M2                  10.00           a  |  | b        c  |  | d  
#> QTL                 13.00           P1 |  | P2       Q0 |  | Q0 
#> M3                  20.00           c  |  | c        a  |  | b  
#> M4                  30.00           a  |  | o        a  |  | o  
#> M5                  40.00           a  |  | b        a  |  | b  
#> M6                  50.00           a  |  | b        a  |  | b  
#> M7                  60.00           a  |  | o        a  |  | o  
#> M8                  70.00           a  |  | o        a  |  | o  
#> M9                  80.00           a  |  | b        c  |  | d  
#> M10                 90.00           a  |  | b        c  |  | c  
#> M11                100.00           a  |  | b        c  |  | d  
#> M12                110.00           a  |  | b        a  |  | o  
#> M13                120.00           a  |  | b        c  |  | c  
#> M14                130.00           a  |  | b        c  |  | d  
#> M15                140.00           a  |  | b        c  |  | d  
#> 
#> P1 and Q1 have positive effect (increase phenotypic value)
#> P2 and Q2 have negative effect (reduce phenotypic value) 
#> P0 and Q0 have neutral effect (non signif.)

plot_fullsib_phases(cim_QTL1_lg1_draw)


# If include fullsib.scan object you can also visualize the confidence interval considering 1 or 2 LOD
cim_QTL1_lg1_draw <- draw_phase(fullsib = cofs_fs, fullsib.char = cim_QTL1_lg1, fullsib.scan = cim)

plot_fullsib_phases(cim_QTL1_lg1_draw, CI = "LOD1")


# Too access the interval limits in cM
cim_QTL1_lg1_draw[2:3]
#> $LOD1
#> [1]  0 18
#> 
#> $LOD2
#> [1]  0 18

# You can also plot multiple QTLs of same group
cim_QTL2_lg1_draw <- draw_phase( cofs_fs, cim_QTL2_lg1 )
cim_QTL1_lg2_draw <- draw_phase( cofs_fs, cim_QTL1_lg2 )
cim_QTL2_lg2_draw <- draw_phase( cofs_fs, cim_QTL2_lg2 )
cim_QTL3_lg2_draw <- draw_phase( cofs_fs, cim_QTL3_lg2 )
cim_QTL1_lg3_draw <- draw_phase( cofs_fs, cim_QTL1_lg3 )
cim_QTL2_lg3_draw <- draw_phase( cofs_fs, cim_QTL2_lg3 )
cim_QTL1_lg4_draw <- draw_phase( cofs_fs, cim_QTL1_lg4 )

p1 <- plot_fullsib_phases(x = list(cim_QTL1_lg1_draw, 
                               cim_QTL2_lg1_draw))

p2 <- plot_fullsib_phases(list(cim_QTL1_lg2_draw, 
                               cim_QTL2_lg2_draw, 
                               cim_QTL3_lg2_draw))

p3 <- plot_fullsib_phases(list(cim_QTL1_lg3_draw, 
                               cim_QTL2_lg3_draw))

p4 <- plot_fullsib_phases(cim_QTL1_lg4_draw)

library(ggpubr)
#> Loading required package: ggplot2
ggarrange(p1,p2,p3,p4, common.legend = T)

Estimating QTL effects using least squares

Function r2_l2 calculates the proportion of phenotypic variation explained by the QTL in the model (altogether, denoted by R2.trait, and individually, indicated as R2.lgX.Y, where lgX.Y means the LG number \(X\) in the \(Y\)-th position). To do this, a least squares approximation is used (instead of the mixture model with the EM algorithm). This is good enough for interpretations, but much faster. For more information, use ?r2_l2.

In this example, for the model with all QTL mapped with CIM and without the cofactors (since they are only a strategy to control background variation during the mapping procedure):

## Estimating QTL effects without covariate
qtls.cim <- r2_ls( fsib, pheno.col = 1, lg = c( 1, 1, 2, 2, 2, 3, 3, 4 ),
                   pos = c( "loc13", "M12", "loc45", "loc77", "loc108", "M33", "loc65", "M52" ) )

If for some reason one wants also to consider the cofactors:

## Estimating QTL effects with covariates
covar <- matrix( rep( c( 1, -1 ), each = 150 ), ncol = 1 )
r2_ls( fsib, pheno.col = 2, addcovar = covar, lg = c( 1, 1, 2, 2, 2, 3, 3, 4 ),
       pos = c( "loc13", "M12", "loc45", "loc77", "loc108", "M33", "loc65", "M52" ) )
#>               lg    loc      r2
#> R2.trait      NA    All 57.4916
#> R2.lg1.loc13   1  loc13  4.4783
#> R2.lg1.M12     1    M12  3.2839
#> R2.lg2.loc45   2  loc45 11.9193
#> R2.lg2.loc77   2  loc77  7.5766
#> R2.lg2.loc108  2 loc108  9.0162
#> R2.lg3.M33     3    M33  4.5839
#> R2.lg3.loc65   3  loc65  6.2476
#> R2.lg4.M52     4    M52  8.4961

Graphical visualization of QTLs

It is possible to plot a useful visualization of the CIM indicating the r2ls.out object:

plot_fullsibQTL( fullsib = fsib,
                 fullsib.scan = cim,
                 r2ls.out = qtls.cim,
                 qtlmapping = "CIM",
                 thr = c( 4.023724, 2.900906 ) )

Reporting of Results

The following articles can be used as examples of how to report the QTL mapping results obtained with package fullsibQTL:

Conson ARO, Taniguti CH, Amadeu RR, Andreotti IAA., de Souza LM, dos Santos LHB, Rosa JRBF, Mantello CC, da Silva CC, José Scaloppi Junior E, Ribeiro RV, Le Guen V, Garcia AAF, Gonçalves P de S, & de Souza AP (2018). High-Resolution Genetic Map and QTL Analysis of Growth-Related Traits of Hevea brasiliensis Cultivated Under Suboptimal Temperature and Humidity Conditions. Frontiers in Plant Science, 9(August), 1–16. ISSN 1664-462X. https://doi.org/10.3389/fpls.2018.01255

Balsalobre TWA, da Silva Pereira G, Margarido GRA, Gazaffi R, Barreto FZ, Anoni CO, Cardoso-Silva CB, Costa EA, Mancini MC, Hoffmann HP, de Souza AP, Garcia AAF, Carneiro MS (2017). GBS-based single dosage markers for linkage and QTL mapping allow gene mining for yield-related traits in sugarcane. BMC Genomics, 18(1), 72. ISSN 1471-2164. https://doi.org/10.1186/s12864-016-3383-x.

Costa EA, Anoni CO, Mancini MC, Santos FRC, Marconi TG, Gazaffi R, Pastina MM, Perecin D, Mollinari M, Xavier MA, Pinto LR, Souza AP, Garcia AAF (2016). QTL mapping including codominant SNP markers with ploidy level information in a sugarcane progeny. Euphytica, 211(1), 1–16. ISSN 1573-5060. https://doi.org/10.1007/s10681-016-1746-7.

da Silva Pereira G, Di Cassia Laperuta L, Nunes ES, Chavarrı́a L, Pastina MM, Gazaffi R, Geraldi IO, Garcia AAF, Vieira MLC (2017). The Sweet Passion Fruit (Passiflora alata) Crop: Genetic and Phenotypic Parameter Estimates and QTL Mapping for Fruit Traits. Tropical Plant Biology, 10(1), 18–29. ISSN 1935-9764. https://doi.org/10.1007/s12042-016-9181-4.

Margarido GRA, Pastina MM, Souza AP, Garcia AAF (2015). Multi-trait multi-environment quantitative trait loci mapping for a sugarcane commercial cross provides insights on the inheritance of important traits. Molecular Breeding, 35(8), 175. ISSN 1572-9788. https://doi.org/10.1007/s11032-015-0366-6.

References

Belsley, D. A., Edwin K., and Roy E. W. (2005). Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons.

Cockerham, C. C. (1954). An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39(6), 859.

Chen, L., and Storey, J. D. (2006). Relaxed significance criteria for linkage analysis. Genetics, 173(4), 2371-2381.

Churchill, G. A., and Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics, 138(3), 963-971.

Grattapaglia, D., and Sederoff, R. (1994). Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics, 137(4), 1121-1137.

Margarido, G. R. A., Souza, A.P. and Garcia, A. A. F. OneMap: software for genetic mapping in outcrossing species. Hereditas 144, 78-79, 2007.

Wu, R., Ma, C.X., Painter, I. and Zeng, Z.-B. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theoretical Population Biology 61, 349-363, 2002a.

Wu, R., Ma, C.-X., Wu, S. S. and Zeng, Z.-B. Linkage mapping of sex-specific differences. Genetical Research 79, 85-96, 2002b.

Session Info

sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 10 (buster)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggpubr_0.4.0        ggplot2_3.3.2       fullsibQTL_0.0.9010
#> [4] onemap_2.2.0        rmdformats_0.3.7    knitr_1.29         
#> 
#> loaded via a namespace (and not attached):
#>   [1] colorspace_1.4-1          ggsignif_0.6.0           
#>   [3] ellipsis_0.3.1            class_7.3-15             
#>   [5] rio_0.5.16                htmlTable_2.0.1          
#>   [7] RcppArmadillo_0.9.900.1.0 base64enc_0.1-3          
#>   [9] rstudioapi_0.11           mice_3.10.0              
#>  [11] farver_2.0.3              codetools_0.2-16         
#>  [13] splines_3.6.2             doParallel_1.0.15        
#>  [15] Formula_1.2-3             polynom_1.4-0            
#>  [17] jsonlite_1.7.0            MDSMap_1.1               
#>  [19] broom_0.7.0               cluster_2.1.0            
#>  [21] png_0.1-7                 shiny_1.5.0              
#>  [23] httr_1.4.1                compiler_3.6.2           
#>  [25] backports_1.1.8           assertthat_0.2.1         
#>  [27] lazyeval_0.2.2            Matrix_1.2-18            
#>  [29] fastmap_1.0.1             later_1.1.0.1            
#>  [31] acepack_1.4.1             htmltools_0.5.0          
#>  [33] tools_3.6.2               gtable_0.3.0             
#>  [35] glue_1.4.1                reshape2_1.4.4           
#>  [37] dplyr_1.0.0               ggthemes_4.2.0           
#>  [39] Rcpp_1.0.5                carData_3.0-4            
#>  [41] cellranger_1.1.0          vctrs_0.3.1              
#>  [43] gdata_2.18.0              iterators_1.0.12         
#>  [45] crosstalk_1.1.0.1         xfun_0.15                
#>  [47] stringr_1.4.0             openxlsx_4.1.5           
#>  [49] mime_0.9                  miniUI_0.1.1.1           
#>  [51] lifecycle_0.2.0           weights_1.0.1            
#>  [53] gtools_3.8.2              rstatix_0.6.0            
#>  [55] princurve_2.1.4           candisc_0.8-3            
#>  [57] MASS_7.3-51.4             scales_1.1.1             
#>  [59] heplots_1.3-5             hms_0.5.3                
#>  [61] promises_1.1.1            parallel_3.6.2           
#>  [63] smacof_2.1-1              RColorBrewer_1.1-2       
#>  [65] yaml_2.2.1                curl_4.3                 
#>  [67] gridExtra_2.3             updog_1.2.0              
#>  [69] rpart_4.1-15              reshape_0.8.8            
#>  [71] latticeExtra_0.6-29       stringi_1.4.6            
#>  [73] foreach_1.5.0             plotrix_3.7-8            
#>  [75] e1071_1.7-3               checkmate_2.0.0          
#>  [77] zip_2.0.4                 manipulateWidget_0.10.1  
#>  [79] rlang_0.4.7               pkgconfig_2.0.3          
#>  [81] rgl_0.100.54              evaluate_0.14            
#>  [83] lattice_0.20-38           purrr_0.3.4              
#>  [85] labeling_0.3              htmlwidgets_1.5.1        
#>  [87] cowplot_1.1.0             tidyselect_1.1.0         
#>  [89] plyr_1.8.6                magrittr_1.5             
#>  [91] bookdown_0.21             R6_2.4.1                 
#>  [93] generics_0.0.2            nnls_1.4                 
#>  [95] Hmisc_4.4-0               withr_2.2.0              
#>  [97] pillar_1.4.6              haven_2.3.1              
#>  [99] foreign_0.8-72            survival_3.1-8           
#> [101] abind_1.4-5               nnet_7.3-12              
#> [103] tibble_3.0.3              crayon_1.3.4             
#> [105] car_3.0-8                 wordcloud_2.6            
#> [107] plotly_4.9.2.1            ellipse_0.4.2            
#> [109] rmarkdown_2.5             jpeg_0.1-8.1             
#> [111] grid_3.6.2                readxl_1.3.1             
#> [113] data.table_1.12.8         forcats_0.5.0            
#> [115] digest_0.6.25             webshot_0.5.2            
#> [117] xtable_1.8-4              tidyr_1.1.0              
#> [119] httpuv_1.5.4              munsell_0.5.0            
#> [121] viridisLite_0.3.0

`fullsibQTL` Tutorial

`fullsibQTL` Tutorial

Citation

Installation

Functions

Required data

Creating the working object

Interval mapping

IM Genome scan

QTL mapping plot options

Covariates

IM Permutation test

IM QTL characterization

Selection of Cofactors

Cofactor selection using multiple linear regression

Cofactor ad-hoc definition

Composite Interval Mapping (CIM)

CIM Genome scan

Permutation test

Characterization of mapped QTL with CIM

Estimating QTL effects using least squares

Graphical visualization of QTLs

Reporting of Results

References

Session Info

fullsibQTL Tutorial

fullsibQTL Tutorial

Citation

Installation

Functions

Required data

Creating the working object

Interval mapping

IM Genome scan

QTL mapping plot options

Covariates

IM Permutation test

IM QTL characterization

Selection of Cofactors

Cofactor selection using multiple linear regression

Cofactor ad-hoc definition

Composite Interval Mapping (CIM)

CIM Genome scan

Permutation test

Characterization of mapped QTL with CIM

Estimating QTL effects using least squares

Graphical visualization of QTLs

Reporting of Results

References

Session Info

`fullsibQTL` Tutorial

`fullsibQTL` Tutorial