VCF / Samples
Variants are normalized upon import. We only import variants, filters and genotypes. INFO fields are not used, as we do our own annotation.
The VCF format can vary a lot. We have tested VCFs from the following variant callers:
Each sample is assigned a “variants type” of Unknown, Germline, Mixed (single sample) or Somatic only (tumor minus normal).
This is determined by looking at the “source” entry in the VCF header and matching it to an entry in a VCFSource object (set up by your administrator).
Samples with a variants type of Somatic only are checked for mutational signatures.
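The header lookup described above can be sketched roughly as follows. This is only an illustration: the `SOURCE_RULES` table, the regular-expression matching and both function names are assumptions made for the example, not the actual VCFSource implementation, and the caller-to-type pairs are made up.

```python
import re

# Hypothetical stand-in for the administrator-configured VCFSource entries.
# Each rule pairs a pattern tested against the "##source=" value with a variants type.
SOURCE_RULES = [
    (re.compile(r"mutect", re.IGNORECASE), "Somatic only"),
    (re.compile(r"haplotypecaller", re.IGNORECASE), "Germline"),
]


def read_source_header(vcf_lines):
    """Return the value of the first '##source=' meta line, or None if absent."""
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the '#CHROM' column header
        if line.startswith("##source="):
            return line[len("##source="):].strip()
    return None


def variants_type_for(vcf_lines):
    """Map the VCF 'source' header to a variants type, defaulting to 'Unknown'."""
    source = read_source_header(vcf_lines)
    if source is None:
        return "Unknown"
    for pattern, variants_type in SOURCE_RULES:
        if pattern.search(source):
            return variants_type
    return "Unknown"
```

With these example rules, a header containing `##source=Mutect2` would be classed as Somatic only, and a VCF with no `##source=` line falls back to Unknown.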
Multi-sample VCF files combined using BAM files record the genotype for all samples at each variant position.
This allows you to differentiate between reference calls and no coverage, which is extremely important for trios so that you can make correct calls about inheritance and de novo variants.
You must use BAM files to re-call the genotypes for each position.
Consider 3 VCF files:
| HET | (not present) | (not present) |
There’s no way to tell if a variant not being present in a single sample VCF is due to having the reference allele or no coverage.
Merging just the VCFs (without supplying the BAM files) will give the genotypes of:
When the genotypes are instead re-called from the BAM files, and both parents had reference bases, the calls would be:
| 0/1 (HET) | 0/0 (HOM_REF) | 0/0 (HOM_REF) |
You can then be confident that it is a de novo variant, rather than a position that just lacks coverage in one of the parent samples.
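As a minimal sketch of that reasoning, the check below only reports a confident de novo call when the child carries a non-reference allele and both parents have an explicit reference call. The function name is hypothetical, and the example assumes a parent without a call is written as `./.`, the usual VCF notation for a missing genotype.

```python
def is_confident_de_novo(proband_gt, parent_a_gt, parent_b_gt):
    """Return True only if the proband has an alt allele and both parents are 0/0.

    Genotypes are VCF GT strings such as '0/1', '0|0' or './.'.
    """

    def alleles(gt):
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        return gt.replace("|", "/").split("/")

    def has_alt(gt):
        # Any allele that is neither reference ('0') nor missing ('.') is an alt.
        return any(allele not in ("0", ".") for allele in alleles(gt))

    def is_hom_ref(gt):
        # A missing call ('./.') is not evidence of the reference allele.
        return all(allele == "0" for allele in alleles(gt))

    return has_alt(proband_gt) and is_hom_ref(parent_a_gt) and is_hom_ref(parent_b_gt)
```

For the trio above, `is_confident_de_novo("0/1", "0/0", "0/0")` is true, while replacing either parent genotype with `./.` makes it false, because a missing call cannot rule out an inherited variant.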