Somalier creates an ancestry report, which can be viewed on the ancestry tab of the VCF page (example report). This feature is currently “experimental”.
Samples are displayed on a PCA plot alongside individuals from the 1000 Genomes Project, whose ancestries are labelled.
Somalier predicts a sample's ancestry by comparing it with the clusters formed by these labelled reference samples.
The ancestry shown on the samples grid is the primary prediction only and does not reflect admixture. A full breakdown of scores for every population group is available on the ancestry tab of the view sample page.
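To show how a single primary label relates to a full breakdown of per-population scores, here is a minimal nearest-centroid sketch in Python. It is not Somalier's actual model: the population codes, the centroid coordinates on the first two principal components and the softmax-over-distance scoring are all hypothetical assumptions, chosen only to illustrate the idea of comparing a sample against labelled clusters.

```python
import math

# Hypothetical reference centroids: mean (PC1, PC2) of labelled reference
# samples for each population group. Real values would come from the PCA.
CENTROIDS = {
    "AFR": (-3.0, 0.5),
    "EUR": (2.0, 1.5),
    "EAS": (1.5, -2.5),
}


def ancestry_scores(sample_pcs, centroids=CENTROIDS):
    """Score each group with a softmax over negative distances to its centroid.

    Scores sum to 1, so they can be read as a breakdown across all groups.
    """
    dists = {pop: math.dist(sample_pcs, c) for pop, c in centroids.items()}
    weights = {pop: math.exp(-d) for pop, d in dists.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}


def primary_ancestry(scores):
    """Return the single highest-scoring group (no admixture information)."""
    return max(scores, key=scores.get)


scores = ancestry_scores((1.8, 1.2))
print(primary_ancestry(scores))  # EUR
print(scores)                    # the full per-group breakdown
```

Reporting only the top label discards the remaining scores, which is why an admixed sample can look like a confident single-population call on the samples grid while its breakdown shows substantial weight on several groups.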
The number of sites used depends on a sample's capture regions and sequencing depth (default minimum depth of 7). At least 1000 informative sites are required for a robust estimate of the relatedness coefficient.
You can see how many sites Somalier used for a sample on the ancestry tab of the view sample page (under “Extract”).
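The depth threshold and site requirement can be sketched as a simple check. This Python example assumes a hypothetical list of per-site read depths for one sample and treats every site at or above the minimum depth as informative, which is a simplification of what Somalier counts.

```python
# Assumed thresholds: the default minimum depth of 7 and the 1000-site
# guideline for a robust relatedness estimate.
MIN_DEPTH = 7
MIN_INFORMATIVE_SITES = 1000


def count_usable_sites(depths, min_depth=MIN_DEPTH):
    """Count sites whose read depth reaches the minimum depth."""
    return sum(1 for depth in depths if depth >= min_depth)


def enough_sites_for_relatedness(depths, min_depth=MIN_DEPTH,
                                 required=MIN_INFORMATIVE_SITES):
    """True if enough sites pass the depth filter for a robust estimate."""
    return count_usable_sites(depths, min_depth) >= required


# Hypothetical per-site depths: a well-covered sample versus one where most
# sites fall outside its capture regions and have very low depth.
well_covered = [30] * 1200 + [3] * 500
off_target = [30] * 400 + [2] * 5000
print(enough_sites_for_relatedness(well_covered))  # True
print(enough_sites_for_relatedness(off_target))    # False
```

The second case shows why capture regions matter: having many sites in total does not help if few of them reach the depth threshold.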
If an unexpectedly large number of samples appear related, confirm that the sample data type is an accepted input and that the sample has passed QC.
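To make the relatedness coefficient more concrete, the sketch below computes a KING-robust style estimate between two samples: (shared heterozygous sites minus twice the opposite-homozygote sites) divided by the smaller heterozygote count. Treat the exact formula as an assumption modelled on the estimator Somalier is based on; the 0/1/2 genotype coding and the function name are illustrative, and depth filtering and missing calls are ignored.

```python
# Genotypes are coded as alt-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.


def relatedness(gt_a, gt_b):
    """KING-robust style relatedness between two samples over the same sites.

    Identical samples score 1.0; unrelated samples score around 0 or below.
    """
    if len(gt_a) != len(gt_b):
        raise ValueError("genotype vectors must cover the same sites")
    hets_a = sum(1 for g in gt_a if g == 1)
    hets_b = sum(1 for g in gt_b if g == 1)
    # Sites where both samples are heterozygous.
    shared_hets = sum(1 for a, b in zip(gt_a, gt_b) if a == 1 and b == 1)
    # IBS0: opposite homozygotes, so no allele is shared at that site.
    ibs0 = sum(1 for a, b in zip(gt_a, gt_b) if {a, b} == {0, 2})
    denom = min(hets_a, hets_b)
    if denom == 0:
        raise ValueError("no heterozygous sites; relatedness is undefined")
    return (shared_hets - 2 * ibs0) / denom


sample = [0, 1, 2, 1, 0, 1]
other = [2, 1, 0, 0, 1, 1]
print(relatedness(sample, sample))  # 1.0
print(relatedness(sample, other))   # negative: no evidence of relatedness
```

Because the estimate depends on heterozygous calls, inputs that distort genotypes (for example, a data type that is not an accepted input, or a sample that fails QC) can shift many pairs towards high relatedness at once.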
Comparisons work across genome builds and tissue types and can be used to compare RNA-seq, WES, bisulfite and WGS data.
Multi-sample VCF: ideal input.
Single-sample VCF: missing variants are assumed to be homozygous for the reference allele.
Tumour-normal VCF: not recommended, as germline subtraction leaves no common sites to compare.
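The single-sample VCF behaviour can be sketched as filling every site a sample's VCF does not mention with a homozygous-reference genotype. In this Python example the site identifiers, sample names and function name are hypothetical; each sample's calls are a mapping from site to alt-allele count (1 = het, 2 = hom-alt).

```python
HOM_REF = 0  # genotype assumed for any site absent from a sample's VCF


def merge_single_sample_calls(calls_by_sample, sites):
    """Build one genotype vector per sample over a shared list of sites.

    calls_by_sample maps sample name -> {site: alt-allele count} for the
    variants present in that sample's single-sample VCF. Any site missing
    from a sample's calls is filled as homozygous reference.
    """
    return {
        sample: [calls.get(site, HOM_REF) for site in sites]
        for sample, calls in calls_by_sample.items()
    }


sites = ["chr1:1000", "chr1:2000", "chr2:500"]
calls = {
    "sample_A": {"chr1:1000": 1},
    "sample_B": {"chr1:1000": 1, "chr2:500": 2},
}
print(merge_single_sample_calls(calls, sites))
# {'sample_A': [1, 0, 0], 'sample_B': [1, 0, 2]}
```

The weakness is that a site absent because it had no coverage looks identical to a genuine homozygous-reference call, which is one likely reason a multi-sample VCF, with an explicit genotype for every sample at every site, is the ideal input.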