CAVIARBF R version¶
Since the documentation of caviarbf is not clear enough and sometimes mixes with the documentation of the C++ version and the R version this documentation serves as a supplement
Dependencies¶
The R version depends on Rcpp
, RcppEigen
and glmnet
, remember to install these package before installing caviarbf
Warning
By 2018-06-28, there is an error when using the lastest version of glmnet
(2.0-16). According to the author's own explanation, "This is a problem with the latest glmnet update. So far glmnet 2.0-10 seems to be working. I will try to see whether I can make it work with the latest glmnet (2.0-13)"
When testing with glmnet
(2.0-10), there is no error
Software Detail¶
Following information are copied from caviarbf R package help directly
Please type ?caviarbfFineMapping
in R for the usage of the interface.
Usage¶
caviarbfFineMapping(lociListFile, maxCausal, nSample, priorType01 = 0,
priorValue = c(0.1, 0.2, 0.4), exact = T, eps = 0, useIdentityMatrix = F,
BFFileListFile = NA, priorProb = 0, fittingMethod = "glmnetLASSOMin",
hyperParamSelection = "cv", nFold = 5, K = 10, rThreshold = 0.2,
pvalueThreshold = 0.05, lambda = c(2^seq(from = -15, by = 2, to = 5), 100, 1000, 10000, 1e+05, 1e+06),
alpha = c(0, 0.2, 0.3, 0.5, 0.7, 0.8, 1), annotationIndices = -1,
annotationSuffix = ".annotations", LDSuffix = ".LD", outputPrefix, keepBF = F,
overwriteExistingResults = F, maxIter = 50, deltaLik = 0.01,
useParallel = F, ncores = 1, verbose = F)
Argument¶
-
lociListFile
A file listing all the z score files. -
maxCausal
The assumed maximal number of causal variants in each locus -
nSample
The number of individuals used to calculate summary statistics -
priorType01
0 to specify sigmaa. An experimental 1 is used to specify the proportion of the phenotype variance explained (pve)
If priorType01 = 0
priorValue
If priorType01 is 0, it specifies the value of sigmaa. Otherwise, it is the pve value. For sigmaa, a vector of values can be used and the final Bayes factors are averaged. For GWAS fine mapping, c(0.1, 0.2, 0.4) might be a good option. For eQTL, c(0.1, 0.2, 0.4, 0.8, 1.6) might be a good option to accout for large eQTL effects.
-
exact
Whether to calculate the exact Bayes factors. This is only valid for quantitative traits. For binary traits, only approximated Bayes factors are available -
eps
The small value to add to the correlation matrix. This is useful when the LD matrix is from a reference panel -
useIdentityMatrix
If true, use the identity matrix as the LD matrix. This improves the computing speed when only considering one causal variant for each locus -
BFFileListFile
Specify the files of precomputed Bayes factors corresponding to each locus in the lociListFile. If NA, the Bayes factors will be calculated first. Bayes factors can precomputed for each locus using the C++ program This has the advantage of running in parallel, and can saved for later fine mapping. -
priorProb
The initial probability of each variant being causal. If it is 0, the initial probability will be the number of loci divided by the number of all variants -
hyperParamSelection
Method to select hyperparameter lambda and alpha."cv"
for loci based cross validation to select lambda and alpha,"aic"
,"bic"
,"bic2"
uses AIC, BIC models. For the BIC model,"bic2"
use the number of SNPs as the number of data points,"bic"
uses the number of individuals as the number of data points, but they often show similar results."topK"
produces the top significant and relative independent annotations. No penalization is used in the fitting. As a by-product, it also outputs the pvalues for each individual annotation when considered separately
If hyperParamSelection = "cv"
-
fittingMethod
Specify different penalization models:glmnetLASSOMin
for L1 penalization,glmnetRidgeMin
for L2 penalization,glmnetENETMin
for elastic net penalization. This is only used whenhyperParamSelection
is"cv"
. -
alpha
The alpha values for parameter selection, only used when the fitting method isglmnetENETMin
-
nFold
The fold of cross validation, only used whenhyperParamSelection
is"cv"
. If the number of loci is less than then nFold, the number of loci is used
If hyperParamSelection = "topK"
K
Specify the top K annotations to select, used for"topK"
rThreshold
Specify the upper bound of the correlation among top annotations, used for"topK"
pvalueThreshold
Specify the p value threshold for top annotations, used for"topK"
-
lambda
The lambda values for parameter selection. If lambda is 0, then no penalization is used in the fitting. This can be used for statistical testing, e.g., comparing two nested annotation models. -
annotationIndices
A vector to specify annotations. If it is -1, use all annotations, 0 for only the intercept, otherwise it specifies the corresponding columns of annotations. Default uses all annotations. -
annotationSuffix
The annotation suffix, default ".annotations" -
LDSuffix
The correlation file suffix, default ".LD" -
outputPrefix
The prefix of the output files -
keepBF
Whether to keep produced Bayes factor files -
overwriteExistingResults
Whether to overwrite existing output files -
maxIter
The maximal iteration in the EM method -
deltaLik
The minimum improvement of log likelihood to continue the EM iteration -
useParallel
Whether to use library parallel to use multiple cores This can increase the speed of calculate PIPs in the EM iteration -
ncores
Number of cores to use in parallel whenuseParallel
is T -
verbose
If true, more information are printed
value¶
The results are saved in several files.
<outputPrefix>.marginal
saves the indices of all variants following the order in the list file and the their PIPs sorted in a descring order.
<outputPrefix>.marginalz
includes the extra region index, the variant ID and the z score.
<outputPrefix>.loglik
stores the relative log likelihood.
<outputPrefix>.gamma
stores the estimated annotation effect size and the effect size of standardized annotation.
<outputPrefix>.log
stores the running log.
<outputPrefix>.time
saves the running time.
Note¶
"topK"
First, annotations are ranked by their individual increase in likelihood compared to the model without any annotations. Then the top k annotations are used