FINEMAP¶
FINEMAP only works for SINGLE locus finemapping WITHOUT functional annotations.
Running Process¶
- Prepare Z file, LD file and other optional input files as described in Input file format
- Define datasets in the
master
file, each dataset correspond to one locus - Run FINEMAP program, get the results
Notes¶
- There are 3 subprogram which you need to choose from,
--cond
(stepwise conditional search),--config
(Evaluate a single causal configuration without performing shotgun stochastic search),--sss
(Fine-mapping with shotgun stochastic search),--sss
is the most commonly used mode when performing finemapping --corr-config
This is the option to set the posterior probability of a causal configuration to zero if it includes a pair of SNPs with absolute correlation above this threshold. This option is required with a default of 0.95. It is necessary because of the algorithm implementation. If a pair of SNPs is perfectly or near perfectly correlated, an important matrix (Rcc) will become not invertible.
Argument¶
--cond
Fine-mapping with stepwise conditional search- Subprogram
--config
Evaluate a single causal configuration without performing shotgun stochastic search- Subprogram
--corr-config
Option to set the posterior probability of a causal configuration to zero if it includes a pair of SNPs with absolute correlation above this threshold- Default: 0.95
--corr-group
Option to set the threshold for grouping a pair of SNPs with absolute correlation above this threshold- Default: 0.99
--dataset
Option to specify a delimiter-separated list of datasets for fine-mapping as given in the master file (e.g. 1,2 or 1|2)- Default: All datasets will be processed
--flip-beta
Option to read a column 'flip' in the Z file with binary indicators specifying if the direction of the estimated SNP effect sizes needs to be flipped- With --cond, --config and --sss
--group-snps
Option to group SNPs on the basis of their correlations- With --cond and --sss
--in-files
Option to specify a semicolon separated master file with the following column names: 'z', 'ld', 'snp', 'config', 'n_samples' and optionally 'k' and 'log'. Each line is a dataset with file extensions corresponding with column names. The column 'n_samples' represents the GWAS sample size- With --cond, --config and --sss
--log
Option to write output to log files specified in column 'log' in the master file- No log files are written by default
--n-causal-snps
Option to set the maximum number of allowed causal SNPs- Default: 5
--n-configs-top
Option to set the number of top causal configurations to be saved- Default: 50000
--n-convergence
Option to set the number of iterations that the added probability mass is required to be below the specified threshold (--prob-tol) before shotgun stochastic search is terminated- Default: 1000
--n-iterations
Option to set the maximum number of iterations before shotgun stochastic search is terminated- Default: 100000
--prior-k
Option to use prior probabilities for the number of causal SNPs from K files as specified in column 'k' in the master file- SNPs are by default assumed to be causal with probability 1/(# of SNPs in the region)
--prior-k0
Option to set the prior probability that there is no causal SNP in the genomic region. Only used when computing posterior probabilities for the number of causal SNPs but not during fine-mapping itself- Default: 0.0
--prior-std
Option to specify a comma-separated list of prior standard deviations of effect sizes- Default: 0.05
--prob-tol
Option to set the tolerance at which the added probability mass (over --n-convergence iterations) is considered small enough to terminate shotgun stochastic search- Default: 0.001
--rsids
Option to specify a comma-separated list of SNP identifiers corresponding with the 'rsid' column in Z files as specified in column 'z' in the master file- Required with
--config
- Required with
--sss
Fine-mapping with shotgun stochastic search- Subprogram
Input file format¶
Input requirement¶
- Master file (required)
- Used to specify all information needed
- Z file (required)
- The
dataset.z
file is a space-delimited text file and contains the GWAS summary statistics one SNP per line
- The
- LD file (required)
- The
dataset.ld
file is a space-delimited text file and contains the SNP correlation matrix (Pearson's correlation)
- The
- K file (optional)
- specify prior probabilities for the number of causal SNPs in the genomic region by using a
dataset.k
file
- specify prior probabilities for the number of causal SNPs in the genomic region by using a
- BGEN, BGI, SAMPLE and INCL file (optional)
Master file¶
The master
file is a semicolon-separated text file and contains no space. It contains the following mandatory column names and one dataset per line.
Input¶
z
column contains the names of Z files (required)ld
column contains the names of LD files (required)n_samples
column contains the GWAS sample sizes (required)k
column contains the optional K files (optional)bgen
column contains the names of BGEN files (optional)bgi
column contains the names of BGI files (optional)sample
column contains the names of SAMPLE files (optional)incl
column contains the names of INCL files (optional)
Note
File extensions must correspond with the column names in the header line!
Output¶
snp
column contains the names of SNP files (required)config
column contains the names of CONFIG files (required)cred
column contains the names of CRED files (required)dose
column contains the names of DOSE files (optional)log
column contains the optional LOG files (optional)
Example¶
- A
master
file with two datasets using precomputed SNP correlations could look as follows.
z;ld;snp;config;cred;log;n_samples
dataset1.z;dataset1.ld;dataset1.snp;dataset1.config;dataset1.cred;dataset1.log;5363
dataset2.z;dataset2.ld;dataset2.snp;dataset2.config;dataset2.cred;dataset2.log;5363
- A
master
file with two datasets using precomputed SNP correlations in the first dataset and BGEN support in the second dataset could look as follows.
z;ld;bgen;bgi;dose;snp;config;cred;log;n_samples
dataset1.z;dataset1.ld;;;;dataset1.snp;dataset1.config;dataset1.cred;dataset1.log;5363
dataset2.z;;dataset2.bgen;dataset2.bgi;dataset2.dose;dataset2.snp;dataset2.config;dataset2.cred;dataset2.log;5363
- A
master
file with one datasets using BGEN support and a subset of 5,000 samples could look as follows.
z;bgen;bgi;dose;sample;incl;snp;config;cred;log;n_samples
dataset2.z;dataset2.bgen;dataset2.bgi;dataset2.dose;dataset.sample;dataset.incl;dataset.snp;dataset.config;dataset.cred;dataset.log;5000
Z file¶
The dataset.z file
is a space-delimited text file and contains the GWAS summary statistics one SNP per line. It contains the mandatory column names in the following order.
rsid
(can be specified arbitrarily) column contains the SNP identifiers. The identifier can be a rsID number or a combination of chromosome name and genomic position (e.g. XXX:yyy)chromosome
(can be specified arbitrarily) column contains the chromosome names. The chromosome names can be chosen freely with precomputed SNP correlations (e.g. 'X', '0X' or 'chrX')position
(can be specified arbitrarily) column contains the base pair positionsallele1
(can be specified arbitrarily) column contains the "first" allele of the SNPs. In SNPTEST this corresponds to 'allele_A', whereas BOLT-LMM uses 'ALLELE1'allele2
(can be specified arbitrarily) column contains the "second" allele of the SNPs. In SNPTEST this corresponds to 'allele_B', whereas BOLT-LMM uses 'ALLELE0'maf
(needed to output posterior effect size estimates on the allelic scale) column contains the minor allele frequenciesbeta
(required) column contains the estimated effect sizes as given by GWAS softwarese
(required) column contains the standard errors of effect sizes as given by GWAS softwareflip
optional column - see below
Example¶
- A
dataset.z
file with three SNPs could look as follows.
rsid chromosome position allele1 allele2 maf beta se
rs1 10 1 T C 0.35 0.0050 0.0208
rs2 10 1 A G 0.04 0.0368 0.0761
rs3 10 1 G A 0.18 0.0228 0.0199