Clinical Bioinformatics - Coding Variants | Lescai Teaching

Course Materials

Data

Web tools

Case 1

Clinical profile

The first case we present has the following pedigree:

The proband is individual III-38, who has been diagnosed with epilepsy, intellectual disability, speech delay, and progressive gait deterioration.

Download the data

Please use the link provided above to download the VCF file, generated by NGS of the proband. Select the folder tutorial data, as indicated in the screenshot below

Click on the folder for case01 and then click on the VCF file. To download it, choose raw data on the right-hand side, as indicated in the screenshot

and then use save as from your browser menu to save the file on your computer, in your chosen location.

Now we are ready to begin our analysis.

Annotation of the variants

In order to identify the candidate causative variants, we need to gather sufficient information (predictions, literature, genomic context, protein information). This process is called annotation. To annotate our VCF file, we will use Ensembl VEP.

Our data have been aligned to an hg19 reference. Always check the reference build, so that the annotations match the positions of the variants and the intervals of the genomic features. To choose the hg19/GRCh37 reference (the two differ in the chr prefix of the chromosome names), select the appropriate link, as shown in the screenshot:

There are now many options to choose from: follow your lecturer to discuss the importance of each one and its selection criteria. More detailed information, for your reference, can be found here.
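Before submitting the file, it can help to confirm which chromosome naming convention the VCF uses, since this is the visible difference between hg19-style and GRCh37-style names mentioned above. The following is a minimal sketch, not part of the course workflow: the function name `chromosome_style` and the example record are invented for illustration, and it only inspects the CHROM column of a tab-separated VCF.

```python
def chromosome_style(vcf_lines):
    """Report whether CHROM values carry the 'chr' prefix."""
    styles = set()
    for line in vcf_lines:
        # Skip meta-information, the column header and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        styles.add("chr-prefixed" if chrom.startswith("chr") else "no prefix")
    if not styles:
        return "no records"
    return styles.pop() if len(styles) == 1 else "mixed"


# Invented record, for illustration only (not taken from the tutorial data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tproband",
    "chr1\t12345\t.\tG\tA\t50\tPASS\t.\tGT\t1/1",
]
print(chromosome_style(example))
```

To run it on the downloaded file, pass an open file handle instead of the example list. A result of "mixed" would suggest the file needs attention before annotation.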

Before reviewing the results, it is important to remember the meaning as well as the prioritisation of the predicted consequences in terms of severity

with a detailed explanation on the Ensembl page.

Filtering the variants

Once you obtain the annotation results, you can click on the view link as indicated below

review the information, and apply some filters.

In order to view all information in a single page, you should remember to choose all variants from the appropriate setting, and select the important columns you would like to visualise, as shown below:

The order of the filters one applies is a subjective choice: we will suggest here a workflow you can follow in order to narrow down your search.

IMPACT

The category “impact” might be the easiest criterion to use first, since it roughly groups variants by the predicted consequence they have on a transcript. The category “HIGH” includes stop-gained, frameshift and start-lost variants, as well as those overlapping the 2 bp at either end of an intron, i.e. predicted to disrupt the donor or acceptor site in splicing.
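Ordering by impact can also be done outside the web interface. The sketch below assumes each annotation is available as a dictionary with an IMPACT key holding one of the four VEP categories (HIGH, MODERATE, LOW, MODIFIER). The row contents and the names `IMPACT_RANK` and `sort_by_impact` are hypothetical.

```python
# Lower rank means more severe; unknown values sort last.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def sort_by_impact(rows):
    """Return rows ordered from the most to the least severe IMPACT."""
    return sorted(
        rows,
        key=lambda row: IMPACT_RANK.get(row.get("IMPACT"), len(IMPACT_RANK)),
    )


# Invented rows, for illustration only.
rows = [
    {"id": "var_a", "IMPACT": "MODIFIER"},
    {"id": "var_b", "IMPACT": "HIGH"},
    {"id": "var_c", "IMPACT": "LOW"},
    {"id": "var_d", "IMPACT": "MODERATE"},
]
for row in sort_by_impact(rows):
    print(row["id"], row["IMPACT"])
```

Because Python's sort is stable, variants within the same category keep their original order, so a second criterion can be applied beforehand if needed.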

In this particular case, if we order by impact, the first variant appearing at the top of our list is rs1240335250.

We can identify the following information, in the annotations:

This is a splice-donor variant, affecting the gene TDP2
SpliceAI indicates an 89% probability of the variant causing a donor loss in the splice site
Associated phenotypes include “Autosomal recessive cerebellar ataxia-epilepsy-intellectual disability syndrome due to TUD deficiency”, which is consistent with the clinical presentation and the mode of inheritance in the pedigree
If we search for the variant in the VCF file, we can confirm it has been called as homozygous (1/1), which is consistent with the inheritance mode we can observe in the pedigree (autosomal recessive)
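The genotype check in the last point can be sketched in a few lines. This example assumes the VCF is tab-separated, has a single sample, and carries the variant identifier in its ID column. If the ID column holds only ".", the record would have to be matched by position instead. The function names and the example record are invented for illustration.

```python
def genotype_of(vcf_lines, variant_id, sample_index=0):
    """Return the GT string of one sample for the record matching variant_id."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
        if len(fields) < 10 or variant_id not in fields[2].split(";"):
            continue
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        return dict(zip(keys, values)).get("GT")
    return None


def zygosity(gt):
    """Translate a diploid GT string such as '1/1' into a plain description."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) == 1:
        return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"
    return "heterozygous"


# Invented record, for illustration only (not the real coordinates).
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tproband",
    "chr1\t12345\trs_example\tG\tA\t.\tPASS\t.\tGT:DP\t1/1:30",
]
gt = genotype_of(example_vcf, "rs_example")
print(gt, zygosity(gt))
```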

In our selection based on the IMPACT category, we also identify another variant, causing a frameshift in the OR1B1 gene. This variant is also homozygous in our VCF, but:

it has been submitted in ClinVar and classified as benign
if we follow the ClinVar submission from the link in the annotation, we will see that the criteria include a high population frequency in 1000 Genomes
a similar frequency is reported in our annotation from gnomAD Exome data (45%)
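The reasoning above, where a high population frequency argues against a variant causing a rare disease, can be expressed as a simple filter. The sketch below rests on several assumptions: the frequency column is called `gnomADe_AF` (the name varies between VEP versions), missing values appear as "-", and 1% is used as the cut-off. The helper names and the second example row are hypothetical.

```python
def allele_frequency(row, af_column):
    """Return the frequency as a float, or None when it is absent or unparsable."""
    raw = row.get(af_column, "-")
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def keep_rare(rows, af_column="gnomADe_AF", max_af=0.01):
    """Keep variants at or below max_af, and those with no reported frequency."""
    kept = []
    for row in rows:
        af = allele_frequency(row, af_column)
        if af is None or af <= max_af:
            kept.append(row)
    return kept


rows = [
    # Frequency reported in the text for the OR1B1 frameshift.
    {"SYMBOL": "OR1B1", "gnomADe_AF": "0.45"},
    # Invented row with no reported frequency, for illustration only.
    {"SYMBOL": "GENE_X", "gnomADe_AF": "-"},
]
print([row["SYMBOL"] for row in keep_rare(rows)])
```

Variants without a reported frequency are kept on purpose: absence from a population database is compatible with a rare causative variant, so dropping them would be the riskier default.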

To search for additional variants with phenotype annotations, or to inspect our results in more detail, we should download the data in TXT format and load it into a spreadsheet, as indicated in the screenshot below:

NB: this is ok for the tutorial and challenge, but it is not recommended when you have hundreds of thousands of annotations.
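For larger result sets, where a spreadsheet becomes impractical, the TXT file can be read one line at a time instead. The sketch below assumes the usual layout of VEP tab-delimited output: metadata lines starting with "##", followed by a single column header line starting with "#". The function name `iter_vep_rows` and the example content are invented for illustration.

```python
import io


def iter_vep_rows(handle):
    """Yield one dictionary per annotation line, without loading the whole file."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        # Skip empty lines and the '##' metadata block.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if header is None:
            # The first remaining line is the column header, prefixed with '#'.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        yield dict(zip(header, fields))


# Invented content, for illustration only.
example_txt = (
    "## Example metadata line\n"
    "#Uploaded_variation\tLocation\tSYMBOL\tIMPACT\n"
    "var_a\t1:100\tGENE_A\tHIGH\n"
    "var_b\t2:200\tGENE_B\tMODIFIER\n"
)
for row in iter_vep_rows(io.StringIO(example_txt)):
    if row["IMPACT"] == "HIGH":
        print(row["Uploaded_variation"], row["SYMBOL"])
```

With a real download, replace the `io.StringIO` object with `open(path)`, where `path` is wherever the TXT file was saved. Using a generator keeps memory use flat regardless of the number of annotations.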

Additional variant data

In order to confirm our choice, on the VEP results page we can also gather additional information if the variant has been previously described. In order to do that, we should click on the variant identifier link in the column Existing variant, and choose the additional link more about, as indicated in the figure:

This page, when information is available, will display additional data on the phenotype and even predicted 3D models of the protein where the variant might have a consequence.

Expression data

To go even more in depth, we can look into expression data on the GTEx Portal, and verify that the tissue expression of the affected transcript is compatible with our phenotype.

In order to do that, right on the GTEx homepage, we can search for the gene identifier (in our case ENSG00000111802)

Go straight to “exon expression”

And verify that while the canonical transcript is expressed pretty much everywhere, other transcripts affected by this change in splicing might be expressed in tissues compatible with our phenotype.

Case 2

Clinical Profile

This case presents a trio with an affected parent and child, as follows:

The condition has been diagnosed as:

II-1 (proband): delayed psychomotor development, intellectual disability, microcephaly, delayed speech and language development, cerebellar hypoplasia
I-2: intellectual disability

Data download

As explained for the previous case, please use the folder tutorial02 to download the VCF file generated by NGS of this proband.