Using genomic epidemiology in the fight against COVID-19
As the world deals with the current SARS-CoV-2 pandemic, genomic epidemiology has become a proven tool in the fight against COVID-19. As the virus passes from person to person, mutations can occur that can be monitored and traced to identify chains of infection. Working as part of the COVID-19 Genomics UK Consortium (COG-UK), Dr Robson and his team work with NHS sites across the South and Lighthouse Labs throughout the UK to help with infection control processes and to provide a UK-wide database of viral genomes. The UK leads this field by a wide margin, and these data are used to track and trace novel variants of concern.
Working closely with Public Health England (PHE) and reporting to government organisations such as the Scientific Advisory Group for Emergencies (SAGE) and the New and Emerging Respiratory Virus Threats Advisory Group (NERVTAG), this project has developed rapidly to assist in our understanding of the virus as we attempt to bring it under control and plot a course to a normal way of life.
Bios: Dr Samuel Robson is a Senior Research Fellow at the University of Portsmouth, where he is the Faculty Bioinformatics Lead and Bioinformatics Lead at the Centre for Enzyme Innovation (CEI). He has built a bioinformatics-specific compute cluster at the University and developed analysis pipelines for whole genome sequencing, genome/transcriptome assembly, RNA-seq, ChIP-seq, CLIP-seq, BS-seq, amplicon sequencing, and other typical sequencing data types used by researchers throughout the University.
What the Hell is Bioinformatics?
So, let's go. A very good afternoon everyone, and a very warm welcome to yet another edition of our Research Future interdisciplinary webinars. I am Leïla Choukroune; I'm Professor of International Law and Director of the University of Portsmouth theme in Democratic Citizenship. Today we are absolutely delighted to welcome our colleague Dr Sam Robson. Sam is going to address a very timely topic: the sequencing and tracking of phylogeny in COVID-19.
Let me introduce Sam. Sam already has quite an impressive academic career. He is at the University of Portsmouth, where he's the Bioinformatics Lead at the Centre for Enzyme Innovation, but he also has the same lead role in the Faculty of Biology, if I'm not mistaken. He's written a lot already on a variety of topics, as you can imagine: sequencing, but not only. Really, he's collaborating on a very large number of projects throughout the university, and we met Sam in a completely different environment: we met working on heritage-related issues. So, just to give you a few examples of the very diverse research areas Sam is working on, and that's most impressive: he works on the analysis of microbial biofilm diversity and the effects of anti-fouling technology, and he works on understanding the enzymatic activity of wood-eating gribbles for biofuel development. Well, I'm pretending I'm understanding, Sam. I don't, but anyway, I will soon, I know. There's also the analysis of diverse gene expression pathways in bacterial communities. He works as well on identifying novel biomarkers for prosthetic joint infection, I'm sure together with our colleague Gordon Blunn, who's Professor and Director of the Health and Wellbeing theme. He also works on understanding the pathogenesis and treatment of Duchenne muscular dystrophy. He works as well on transcriptional profiling of novel marine organisms and, as you understood, on viruses, including the SARS virus and the coronavirus as well. Apologies for all the words I've probably mispronounced and all the concepts I didn't know about, but Sam is going to enlighten us now. The floor is yours.
What the Hell is Bioinformatics?
Thank you, Leïla. Thank you for that introduction, and thanks for inviting me along. You've just led very nicely into my first slide there: what the hell is bioinformatics? I get asked this quite a lot. I've taken to just telling people I'm a baker now, because it's a lot easier than trying to explain my job to people.
But essentially, I've been working in the Faculty of Science and Health for the past four years, and my main role is to work alongside people doing research, mainly in biology. These days the technology available to biologists is generating such huge amounts of data that we need people like myself, bioinformaticians, who skirt the areas of biology, computer science and statistics, to help make sense of all the data that's generated. So within the Venn diagram I sit somewhere between somebody who works with computers, a statistician (I'm a Chartered Statistician with the Royal Statistical Society) and biology. But to be honest, if you ask my wife what I do for a living, she'll say that I do this stuff here, where I generate pretty pictures for people to put into their publications. Really, though, it's using computers in order to understand biological processes.
More accurately...
This is a more accurate representation of what my life looks like: generating huge amounts of data and trying my best to manage them.
I'm going to take a bit of a step back, though, before I start delving into the project in full, just because I'm aware that not everybody I'm speaking to today will have a deep scientific background. I'm sure everybody's aware of what DNA is, but I'll give you a brief introduction to the systems at work inside your cells that allow DNA to actually have a function and generate proteins. Ultimately, DNA is the blueprint to life. It sits inside all of your cells and it tells us how to make you.
What is DNA?
I recently gave a presentation very much along these lines to the children at my daughter's school, so you'll forgive me if some of the graphics about to come up are a little bit cartoony, but I thought it worked quite nicely to explain the concept. So what's special about DNA? One of the most important things about it is that it's a very simple molecule, really. It's only made up of four building blocks, which are called nucleotides, and those building blocks are adenine, guanine, thymine and cytosine, which we just call A, T, C and G. So DNA can be thought of as a really, really long word, but it's only made up of four letters. Those letters are decoded within our cells in a process called translation, where each set of three of what we call bases encodes a particular amino acid, and amino acids join together in chains to make proteins. It's the proteins that actually have an effect within your body: they're molecules that fold together to do a certain job. That job might be structural (they might build cell walls or cell membranes) or it might be functional. There's a protein complex here called the ribosome, which is kind of the production machine of your cells, which puts all these things together.
But essentially, the way that DNA works is that it sits in what we see here as a double helix, and what's special about this is that these bases pair together in a specific way: C always goes with G and A always goes with T. Because of this, if we know one strand of DNA, we automatically know what the other strand is as well, and this is what's used to copy DNA. So when DNA copies, for instance during meiosis and mitosis, the DNA splits into two and then second copies of the distinct strands are made, to make two lots of double-stranded DNA. But this process is also used to create what's called RNA, which is an intermediary molecule that can be taken off to the ribosome to generate these proteins.
Next Generation Sequencing - Nanopore
We use technologies such as the one I'm going to talk about today, called next generation sequencing technologies, which allow us to essentially read the sequence of nucleotides that exist within a strand of DNA or RNA. The system that we use here is called nanopore sequencing because it uses these tiny little pores, which are just like the pores that I showed there that allow RNA to be passed outside of the nucleus. These sit across a small membrane that's got a current passing across it, and the pores are just big enough to allow a single strand of DNA to pass through. Those bases each have a distinct charge on them, so as they pass through that membrane they cause a distinct change in the potential that runs across the membrane, and that change in potential can be converted into a sequence. So by doing this, literally, the DNA will be passed through this, and as it goes through we'll simply read off what that sequence is.
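Once a sequence has been read off like this, the two rules described earlier (A pairs with T and C pairs with G, and each group of three bases encodes one amino acid) are easy to apply by hand or in code. The following is a minimal sketch only: the read is invented for illustration, and the codon table is a four-entry slice of the standard genetic code rather than the full 64-codon table.

```python
# Base-pairing rule from the talk: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

# A deliberately tiny slice of the standard genetic code (DNA codons).
# Only the codons used in the example below are listed; "*" marks a stop.
CODONS = {"ATG": "M", "GAT": "D", "GGT": "G", "TAA": "*"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base (not reversed)."""
    return "".join(PAIRS[base] for base in strand)


def translate(dna: str) -> str:
    """Read the sequence three bases at a time and look up each amino acid."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODONS[codon] for codon in codons)


read = "ATGGATGGTTAA"  # hypothetical read, invented for illustration

print(complement(read))  # TACCTACCAATT
print(translate(read))   # MDG*
```

In a real analysis the complete codon table would come from an established library rather than being typed in by hand.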
This simple technique has a huge number of things that we can use it for. As Leïla said there, the research projects I'm involved in are very diverse, from looking at RNA viruses such as SARS-CoV-2, to looking at 500-year-old sailors on the Mary Rose, to looking at biofilm formation in marine environments, you name it. But they all use a similar approach, where we digitise the sequence into a format that we can read, that we can process and that we can analyse. So the system that I'm talking about today is nanopore sequencing. I've actually got one of these MinIONs here, and it really is very, very small: it's about the size of a stapler. This machine is able to do all of the things that I'm going to talk about today. We run about 24 samples on one of these at a time, but we also use this bigger system called the GridION, and essentially this is five of these MinIONs kind of sellotaped to a big, powerful computer, so it just gives us the capacity to run a lot more samples at any one time.
Benefits of Long Reads
So, one of the big benefits of using nanopore sequencing: other sequencing techniques involve first of all chopping the DNA or the RNA up into small segments and then doing what's called short-read sequencing, which means that we can only read, say, 100 to 200 base pairs at a time. What this means is that if you wanted to do something like here, where you wanted to look at E. coli and sequence the entire genome of E. coli, it's about 4.6 million base pairs, so if you were using a 50 base pair read you'd need 92,000 separate pieces of DNA, which you'd then have to stitch together, kind of like doing a jigsaw puzzle. By using nanopore sequencing you can theoretically pass the entire genome through in a single read. Realistically it tends not to work quite that way, as DNA is notoriously easy to degrade, but either way you're going to end up with much, much longer sequences. Instead of 50 base pairs you're talking about 500,000 base pairs, which means that rather than doing a 92,000-piece puzzle you're just doing a nine-piece puzzle.
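The jigsaw arithmetic above can be checked in a couple of lines. This is a sketch under a simplifying assumption: reads are laid end to end with no overlap and all have exactly the stated length, which real sequencing runs never achieve.

```python
def pieces_needed(genome_length: int, read_length: int) -> int:
    """Minimum number of non-overlapping reads needed to span a genome."""
    return -(-genome_length // read_length)  # integer ceiling division


E_COLI_BP = 4_600_000  # approximate E. coli genome size quoted in the talk

print(pieces_needed(E_COLI_BP, 50))       # 92000
print(pieces_needed(E_COLI_BP, 500_000))  # 10
```

The exact ratio for 500,000-base reads is 9.2, which the talk rounds to a nine-piece puzzle; covering every base would take a tenth, partial read.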
So this is one of the big benefits of using long reads, and often this technique is used to kind of fill in the blanks for very complicated genomes. Genomes have a lot of regions within them that are incredibly complicated to look at, because there are lots of deletions and duplications and repetitive regions, and long-read sequencing allows you to stretch across that entire region.
Portability
Another big benefit of nanopore sequencing is its portability. As I've shown, the MinION sequencer is very, very small and can be taken with you to lots of different locations to do field sampling. People have taken it to locations in Antarctica, to remote locations such as Snowdonia National Park and the Ecuadorian rainforest, and even up into space on the ISS. So this technology can be used essentially anywhere, provided that you have the equipment that you need along with you, which is quite minimal, in fairness.
Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2)
OK, so you've probably all heard of COVID-19 and the virus that causes it, SARS-CoV-2. I think we're probably all reaching a point where we're sick of hearing about it, in all honesty, but I am going to talk about it a lot today, I'm afraid. SARS-CoV-2 is a coronavirus, and essentially what that means is that it's a hard shell with spiky bits sticking out of it that look a bit like a crown, and that's where the name coronavirus comes from. Essentially it's a solid shell with an RNA genome inside it. It's a very, very simple organism, all told. The way that it works is that those spikes on the outside, which we've all seen in these spiky ball pictures, are proteins that unfortunately match exactly to what are called ACE2 receptors within human cells. So, like a lock and key, they can fit within the receptors, many of which are located within our lungs, and they can bind onto those receptors and stick to the cells.
When this happens, there's what's called a cleavage site within the spike protein, which splits the spike in two and allows the virus to pass its RNA genome into our cells. At that point our cells will essentially do exactly the same thing that they're used to doing when they see RNA: they'll take that RNA and convert it into a protein. So essentially this is similar to the virus sneaking into our cells. As I say, this is one of the cartoons I used for my daughter's school. They don't really sneak into our cells, but they do pass in. It's almost like, within a factory, they're sneaking plans for making other things onto the foreman's desk. Your cells will then start to make lots of versions of this virus, so you'll get lots of copies of the virus being formed and put together by your own cells. In doing this it can make you feel very poorly and it can make your cells not work properly, and this kicks off your immune response. Our immune response is obviously very much more complicated than shown here, but essentially you've got two main types of immune cells involved in the process. B cells produce what are called antibodies, and antibodies are kind of Y-shaped proteins which are designed specifically to stick onto things that you want to get rid of. The creation of these antibodies, so that they'll stick onto a specific virus, can take a little bit of time, which is why we use vaccines. Vaccines can prepare the body to know what it might be looking out for. The vaccine contains either a version of the virus that's no longer infectious or, as with the new mRNA vaccines, the means to create a small portion of the virus (essentially the spike protein is made within your cells), and that's enough to pre-warn the cells in your body what to expect to see.
STOP COVID-19
So when SARS-CoV-2, when COVID-19, first kicked off and the lockdown hit in March last year, the first lockdown, I got an email to say that the labs at the university were about to be closed that afternoon. I gave a phone call to my colleague Sharon Glaysher over at Portsmouth Hospitals Trust and asked if she'd mind if I brought some equipment over to her lab. So I ran around with Garry Scarlett in the biology department, trying desperately to grab everything that I could out of there before all the doors were locked and we were no longer allowed inside, and drove it all down to Sharon's lab at the hospital, the translational research lab. That's what kicked off this entire project, the STOP COVID-19 project. We worked together with the hospital to prepare ethics statements and to get funding, which was kindly provided by the university and by the CEI to get us started, essentially to be able to use our nanopore sequencing technology to sequence the virus from a number of different patients being seen at the hospital.
Now, the reason for this: the RNA virus has a genome of about 30,000 bases in length, so it's about a hundred thousand times smaller than the human genome, to give you some context. It's not very big, and we're able to generate a lot of data from this sequencing to tell us exactly what the complete sequence of that virus looks like. Over time, as the virus passes from person to person and sits in these transmission chains, it slowly mutates, so you get very small changes occurring as it copies from one person to another. The mutation rate is very small: it's about two mutations a month currently, although I don't know if there's a more up-to-date estimate for that, as this is based on the first wave. But essentially it's quite slow as far as viruses go, and by understanding these changes we can identify whether a particular version of the virus came from a direct transmission from somebody else or came as a completely novel introduction to the area. So the idea was to work in the hospital to look at cases of the virus and help to understand where transmission chains were occurring, so that we could help to break those chains and help improve infection control within the hospital. We initially set out to sequence, I think, 400 samples over the course of a year, and this can give us a lot of information about how the virus is being passed from person to person. We can link it together with patient information to do epidemiological analysis and try to understand community spread as well as spread within the hospital, and it can help us to identify new mutations, which may be linked to things like increased virulence, to poorer outcomes for the patients, or to different symptoms that we might see over time.
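The idea of telling a direct transmission from a new introduction by counting small changes can be illustrated with a toy comparison. This is a simplified sketch, not the project's actual analysis: the sequences are invented ten-base stand-ins for aligned 30,000-base genomes, and the `max_snps=2` cut-off is an arbitrary illustrative threshold, loosely motivated by the roughly two mutations a month quoted above.

```python
def snp_distance(genome_a: str, genome_b: str) -> int:
    """Count positions where two aligned genomes differ, ignoring 'N' (no call)."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must be aligned to the same length")
    return sum(
        1
        for a, b in zip(genome_a, genome_b)
        if a != b and "N" not in (a, b)
    )


def plausibly_linked(genome_a: str, genome_b: str, max_snps: int = 2) -> bool:
    """Very rough screen: few differences means a shared chain cannot be ruled out."""
    return snp_distance(genome_a, genome_b) <= max_snps


# Hypothetical, already-aligned toy sequences.
ward_patient_1 = "ACGTACGTAC"
ward_patient_2 = "ACGAACGTNC"  # one real difference, one uncalled base
community_case = "TCGAACCTAG"  # several differences

print(snp_distance(ward_patient_1, ward_patient_2))      # 1
print(snp_distance(ward_patient_1, community_case))      # 4
print(plausibly_linked(ward_patient_1, ward_patient_2))  # True
print(plausibly_linked(ward_patient_1, community_case))  # False
```

A small distance is only consistent with a shared transmission chain; in practice it would be weighed alongside the patient information and sampling dates mentioned above.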
Whilst all this was going on, we ended up joining up with a large national-scale effort called the COG-UK consortium, the COVID-19 Genomics UK Consortium, and that changed the project significantly in its scope. These are just some pictures of everybody working in the lab: Sharon nicely modelling our GridION there, along with other members of the team, and me working very hard there in the bottom right. When I'm not in a meeting with you guys, that's how I tend to dress.
Sample Flow-Through
So the way that it works is that people who have symptoms of COVID-19 will go in and get tested. This could be either through what's called Pillar 1, which is within healthcare settings, e.g. on wards or in hospitals, or it can be community testing: if you're feeling poorly, you get onto the government website and contact them for a test, and this is so-called Pillar 2 testing. Either way, generally speaking there'll be some kind of swab used, although there's potential to move to things like saliva testing in future, and these go off to a microbiology lab. This could be the microbiology lab at the hospital where we're currently working, or it could be one of what are called Lighthouse Labs, which are about five different centralised labs spread across the whole UK that take large volumes of samples from community testing and run these tests. Some kind of test is used: that might be a PCR-based test, or it might be a different kind of test, such as the point-of-care testing currently used in the student asymptomatic screening programme. This information then passes on to clinical care and helps inform how that patient is looked after. So we basically take, as an offshoot of that, the RNA that's generated, and we use that for doing our sequencing. Essentially what we're using here is an offshoot of this testing process. We run our sequencing analysis, we do a wide array of analyses, and then these data are made publicly available as well. Almost on a daily basis these data are being uploaded to a centralised database and made publicly available so that they can feed into the UK picture of what the virus looks like. We also feed back directly to clinicians, so I work very, very closely with the NHS labs who submit samples to us, to help them to understand what's currently going on within the hospital.
COVID-19 Genomics UK Consortium
So the COG-UK consortium, the COVID-19 Genomics UK Consortium, was set up by Professor Sharon Peacock and a number of other PIs who joined together at the very start, in a very similar way to how we got started: it was a simple phone call, they decided to do it, and before you knew it they'd obtained 20 million pounds' worth of funding for setting this up as a sentinel surveillance programme for SARS-CoV-2 throughout the UK. Essentially the way that it works is that it's a distributed approach utilising academic partners. There are about 16 different academic institutes, including us down in the south, the University of Oxford, the University of Cambridge, the Wellcome Trust Sanger Institute and then other universities throughout the UK, all working very closely with public health agencies and NHS organisations to try to create an almost real-time map of how the virus is passing from person to person. We're doing it based on nanopore sequencing using what's known as the ARTIC pipeline, which I'll discuss a little bit in a moment. Other sites are using different sequencing technologies, but essentially the outcome is the same: we're able to generate whole genome sequences of the virus from patients across the entire UK and feed them into a map with significant coverage across the UK, to be able to answer specific public health questions on a national scale. Currently COG-UK has, I've said here, generated over 250,000 high-quality genome sequences; I think actually today it's probably going to be more like 300,000. It's both an impressive number and a very sad number, that there are so many cases that are able to be sequenced. But the system that's been built by COG-UK is world-leading at the moment, and really what's being done by COG-UK is probably making up a very large share, possibly even around half, of all global sequences generated and made publicly available. It's this publicly available information that's allowing us to fully understand how the virus is spreading, how it's changing over time and, importantly, how it's developing and the impacts this might have on things like vaccines.
Throughput
So just to give you a bit of context: out of those 300,000 samples, we initially set out to do 400 samples. We've actually done about eight and a half thousand so far, and we're at a stage where our throughput has increased so significantly since the new year that we're probably doing 700-plus samples a week at the moment. We have about 10 different NHS trusts submitting samples to us regularly, and we also get a lot of samples from the Lighthouse Labs, which see tens of thousands of samples every day. As I say, all of this data is public: it is uploaded and then made publicly available, mainly through the GISAID website, which is the Global Initiative on Sharing All Influenza Data. As it sounds, it was set up to share data on influenza, but it has now been repurposed to become the largest repository of SARS-CoV-2 genome data in the world.
Geographic Coverage
This is just to give you an idea of the coverage that we have. At our site we receive samples from across the south coast and the south east, with new sites coming on board all the time, and really at the moment the coverage of the COG-UK consortium is quite significant. These are maps of the UK showing the proportion of positive cases that have been sequenced throughout the UK, and on the right we can see our weekly results. More and more areas are becoming darker, which is what we want to see: we want to try to get more complete coverage over these regions. You can see that Wales in particular has an incredibly good system in place for managing and sequencing all cases that come through there.
Naming conventions
So I just want to take a brief aside: I'm going to be naming a lot of things, and there's a lot of confusion around the naming of different variants or lineages or whatever you want to call them. The difficulty is that there's no hard and fast rule for when you start calling something a new lineage or a new variant. One thing is that a lot of people use the word strain. There is definitely only one strain of SARS-CoV-2 at the moment; there are just different versions of it. So lineage and variant tend to be the phrases that are used most often, and they're often used interchangeably. I'm going to say lineage from now on, and this just gives you an idea of how the naming is used. I use the naming convention suggested by Andrew Rambaut and colleagues (Andrew is one of the lead PIs on COG-UK), and the way that it works is that it's kind of like a family tree. What we can see here is what's called a phylogenetic tree. Over on the left is when things were identical to one another, so these are your ancestors: if you were to trace back your family tree, Adam and Eve would be over here on the left. Then, as the family tree diverges over time, we can see that it branches off into new subsets. Here we can see that the A subset is quite distinct from the B subset, because they differ from each other by, if we come down here, about two nucleotides. So there are two bases that differ between all of the ones that are in the A lineage and all of the ones that are in the B lineage, and then that continues down and down and down. The way that it works is that we name things so that everything that starts with a B is within this large purple group, but then this B is split up into B.2 and B.1, and then B.3, 4, 5, 6, 7, 8 and 9, and then that's split down even further into B.1.5 here and B.1.1 here, which is much larger. What's confusing is that as we get more and more cases this phylogeny develops over time, so the naming conventions are trying to be kept as consistent as possible, but often you'll find that this B.1.1 lineage actually consists of two distinct lineages, so down the line this might split off into two distinct lineages that are named differently. It's all very confusing, and nothing I can say will make it any less confusing, I'm afraid, but that is the convention and that's what I'll be talking about. So when I say B.1.1.7 and B.1.177, they're both similar to each other down to the B.1 location, but then they have diverged.
You may have heard in the news about certain variants of concern that have been highlighted by PHE, and these are the three main ones at the moment that PHE are largely focused on trying to identify. Over time more will occur, more will be identified, and there'll be more things that we are keeping an eye on to make sure that they're not causing significant problems. But the main ones are B.1.1.7, which is the so-called Kent lineage that came about just before Christmas; there's B.1.351, which is the so-called South African lineage; and then there's also P.1. You'll notice here that this doesn't start with a B, and this is because once you reach a point beyond having three points within the naming, it then resets back to a different letter. So this is actually B.1.1.28.1, I think, is how the naming has occurred, but then it's been renamed P.1 to avoid the names becoming endlessly long. There's obviously a bit of an issue with naming them after geographical locations, which is why I'm going to try to stick to these ones where I can, but it's just so that you have some context over which ones I'm talking about, based on what's been in the news recently. No, hang on. There we go.
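The family-tree naming described above can be treated as a simple hierarchy: each dot-separated number is one more step down the tree, and a short alias such as P.1 stands in for a longer name. The following is a minimal sketch of that idea. The helper names are hypothetical, and the alias table holds only the single entry needed here (P standing for B.1.1.28, as in the published Pango alias list), not the full list.

```python
# One-entry alias table for illustration; the real alias list is much longer.
ALIASES = {"P": "B.1.1.28"}


def expand(lineage: str) -> str:
    """Replace an alias letter (e.g. 'P') with the full name it stands for."""
    head, _, rest = lineage.partition(".")
    full_head = ALIASES.get(head, head)
    return f"{full_head}.{rest}" if rest else full_head


def is_descendant(lineage: str, ancestor: str) -> bool:
    """True if `lineage` sits strictly below `ancestor` in the naming tree."""
    child = expand(lineage).split(".")
    parent = expand(ancestor).split(".")
    return len(child) > len(parent) and child[:len(parent)] == parent


print(expand("P.1"))                      # B.1.1.28.1
print(is_descendant("B.1.1.7", "B.1.1"))  # True
print(is_descendant("B.1.177", "B.1.1"))  # False
print(is_descendant("B.1.177", "B.1"))    # True
print(is_descendant("P.1", "B.1.1"))      # True
```

Comparing the names component by component matters: a plain text prefix check would wrongly place B.1.177 under B.1.1, which is exactly the B.1.1.7 versus B.1.177 confusion mentioned above.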
So really, of more interest at the moment aren't necessarily the variants and the lineages themselves but specific mutations that might have some interesting properties or some concerning properties. There are certain mutations, and the way that they're named, such as here D614G: these are on the spike protein, and where I said earlier on about proteins being made from sequences of amino acids being stitched together, what this is telling you is that the 614th amino acid that makes up the spike protein is normally an aspartic acid, which is called D, but in this particular mutation it's actually become a glycine, G. There was an article in the Guardian recently about some of the names that people have been giving these, and you can see some of the highly amusing, I'm sure, names that scientists have been giving some of these mutations, just to add a bit of levity to the process of analysing these data. I'll be talking in a little bit more detail about the D614G mutation in a moment, because that's quite an interesting one. The N501Y mutation is one of the defining characteristics of the B.1.1.7 lineage, and at the moment E484K is actually one of the most concerning mutations, because it's been linked with antibody escape and there's some concern over what impact that will have on vaccine efficacy.
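The naming pattern for these mutations (original amino acid, position in the protein, new amino acid) is regular enough to unpack mechanically. This is a small illustrative sketch: the function names are hypothetical, and the lookup table covers only the six amino acid letters that appear in the three mutations named above.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    reference: str  # amino acid normally found at this position
    position: int   # 1-based position in the protein
    variant: str    # amino acid found in the mutated protein


# One-letter codes for the amino acids in D614G, N501Y and E484K only.
AMINO_ACIDS = {
    "D": "aspartic acid",
    "G": "glycine",
    "N": "asparagine",
    "Y": "tyrosine",
    "E": "glutamic acid",
    "K": "lysine",
}

_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(name: str) -> Substitution:
    """Split a name such as 'D614G' into its three parts."""
    match = _PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"not a simple substitution name: {name!r}")
    reference, position, variant = match.groups()
    return Substitution(reference, int(position), variant)


def describe(name: str) -> str:
    """Spell out a substitution name in words."""
    sub = parse_substitution(name)
    return (
        f"{AMINO_ACIDS[sub.reference]} -> {AMINO_ACIDS[sub.variant]} "
        f"at position {sub.position}"
    )


for mutation in ("D614G", "N501Y", "E484K"):
    print(mutation, "=", describe(mutation))
```

For D614G this prints "aspartic acid -> glycine at position 614", matching the description in the talk.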
Sequence Read Data
OK, so going into the read data: I don't want to dwell too much on this, but this is just how we go from generating data to having some idea of what the genome looks like. At the top here, from left to right, is the entire genome of SARS-CoV-2, and each of these blocks is a set of reads that have been generated and mapped back to the genome. What that means is that we've got a sequence of letters and we've found the best place along this entire genome where that sequence of letters sits in that exact order. If you notice, they're split up into two groups: we have an odd group and we have an even group. We use a process of amplification using what are called primers, which are short sections of DNA that can kick off that amplification process from a specific point. So we use primers that will specifically amplify just this short section of the DNA (sorry, of the RNA), and we split this up into two different pools. We add all the odd-numbered ones together into one pool and all the even-numbered ones into another pool, because there is a slight overlap between neighbouring amplicons, just to make sure that there's no over-amplification at those overlap stages. But essentially what we've done is tile the entire SARS-CoV-2 genome with about 100 different short amplicons of about 500 bases.
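The reason for the odd and even pools can be seen by laying out a toy tiling. This sketch assumes round numbers chosen purely for illustration (a 30,000-base genome, 400-base amplicons, 100-base overlaps); it is not the real primer scheme, whose amplicon positions are set by the published primer design.

```python
def tile_amplicons(genome_length, amplicon_length=400, overlap=100):
    """Return (number, start, end) tuples covering the genome end to end."""
    step = amplicon_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than amplicon_length")
    amplicons, start, number = [], 0, 1
    while True:
        end = min(start + amplicon_length, genome_length)
        amplicons.append((number, start, end))
        if end == genome_length:
            return amplicons
        start += step
        number += 1


def split_into_pools(amplicons):
    """Put odd-numbered amplicons in one pool and even-numbered in the other."""
    odd = [a for a in amplicons if a[0] % 2 == 1]
    even = [a for a in amplicons if a[0] % 2 == 0]
    return odd, even


def has_overlap(amplicons):
    """True if any amplicon starts before the previous one in the list ends."""
    return any(nxt[1] < prev[2] for prev, nxt in zip(amplicons, amplicons[1:]))


GENOME_LENGTH = 30_000  # approximate genome size quoted in the talk
amplicons = tile_amplicons(GENOME_LENGTH)
odd_pool, even_pool = split_into_pools(amplicons)

print(len(amplicons))          # 100
print(has_overlap(amplicons))  # True: neighbouring amplicons overlap
print(has_overlap(odd_pool))   # False
print(has_overlap(even_pool))  # False
```

Neighbouring amplicons overlap so that no part of the genome is missed, but within each pool no two amplicons touch, which is what keeps the overlap regions from being amplified twice in the same reaction.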
and what we're interested in if we zoom in a little bit closer we can see anywhere that it's gray means
that it matches exactly to what we call the reference sequence so uh we use the very first sarskov2 genome
that was ever generated from a patient in wuhan china we use that as our reference which is kind of
the oldest ancestor that we have access to and we map against that and we find for
the most part they all match every now and then you'll see some colored sections and the colored dots
here and there mean that there's a nucleotide that's different in the read than it is in the reference sequence
but what we care about is not these ones that are kind of randomly distributed but we do care about things like this
where we see a consistent location where there's a difference in our reads
compared to that reference and if we zoom in even closer we can see that what we have
here is we have what should be an a is actually coming out as a g for the
vast majority of our reads so this is a mutation that is seen consistently across our reeds and that
means that it's real and not down to some error in the sequencing and in this particular case this is
affecting uh this aspartic acid d so this is that d614g mutant that i mentioned
D6146 Mutation • Mutation in the spike protein
earlier so uh as i mentioned it's mutating an aspartic acid to a glycine
at position 614 in the protein and then here you can just make out this is the spike protein
as it sits and then this is where that mutation lies and what it does is it slightly opens up the structure of
that spike so it's still able to bind to ACE2 receptors but because of the structural change
it's actually able to bind to ACE2 receptors slightly better and this was one of the very
first mutations that really came to prominence in the analyses that we were doing uh through COG-UK
because back in january and february there were zero cases where uh this g existed however from
march onwards it very very quickly took over until actually by june almost a hundred percent of
every single virus we sequenced had the g form of the virus rather than the d
form um so douglas if you if you will um and some work that was
done by PHE showed that uh this G-form
mutant was actually able to uh bind better to ACE2 receptors than the
D form of the virus which is why this one became a very prominent mutation to focus on to try and
understand whether or not that was having a significant effect on transmissibility of the virus
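The distinction drawn above, between scattered sequencing errors and a difference seen consistently across reads, can be sketched as a simple majority call at one genome position. The depth and fraction thresholds are illustrative assumptions rather than the consortium's actual settings, and the small codon table covers only the two amino acids involved; the GAT to GGT change shown is the commonly reported codon change behind D614G.

```python
from collections import Counter

# Only the codons needed for this example (aspartic acid and glycine).
CODON_TABLE = {"GAT": "D", "GAC": "D", "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G"}


def call_base(pileup, min_depth=10, min_fraction=0.75):
    """Call the base at one position from the bases observed in the reads.

    Returns "N" when there are too few reads or no base is dominant enough.
    Both thresholds are assumptions made for this sketch.
    """
    depth = len(pileup)
    if depth < min_depth:
        return "N"
    base, count = Counter(pileup).most_common(1)[0]
    return base if count / depth >= min_fraction else "N"


def amino_acid_change(ref_codon, offset, new_base, position):
    """Describe the amino acid change caused by one substitution in a codon."""
    alt_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
    return f"{CODON_TABLE[ref_codon]}{position}{CODON_TABLE[alt_codon]}"


consistent_site = "G" * 95 + "A" * 5        # nearly every read shows G
sporadic_errors = "A" * 98 + "C" + "T"      # a couple of stray mismatches

print(call_base(consistent_site))            # the consistent difference is called
print(call_base(sporadic_errors))            # the reference base is kept
print(amino_acid_change("GAT", 1, "G", 614))
```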
Variant B.1.1.7
but then come christmas that that was very quickly lost by the appearance of the new
variant B.1.1.7 so this variant arose in the southeast of the UK
in i think end of november was the first case that was seen uh it it actually now accounts for
basically 100 percent of everything that we see so almost everything that we see at the minute is the new variant
it has very very quickly taken prominence over all other um lineages of the virus and this one's
very interesting so i just realized the video hasn't started playing
one second this was a video generated by Professor John McGeehan
who was able to model where in the spike protein one of the particular
mutations associated with it N501Y is located to show the effect that this
mutation can have on the spike protein but what's interesting is that this one has a large number of mutations uh
compared to its closest ancestor so there's 17 mutations eight of which
are on the spike protein and when you look at this sample i'll show you in a moment of where this sits on the family
tree of the viruses it really does stick out like a sore thumb so whereas most versions of the virus
are uh sequentially uh generated from previous versions this one seems to have generated a huge
number of mutations uh completely independently of anything else so the current working hypothesis is
that this version of the virus was probably contracted and then
mutated within somebody who had a very long case of covid uh and possibly somebody who was
undergoing convalescent plasma treatment so that it was able to not only mutate but mutate
specifically to account for um the changes that were being implemented
on it through the treatment so we've recently been involved with
an analysis that's recently been submitted to NERVTAG
and as a paper looking at whether or not this version of the virus
is associated with increased severity of disease and that that's part of also large-scale
surveillance from PHE along with some of those other versions as well
and then we can track it across the world as well obviously it's largely known as the UK variant we do see it in the UK but it
has spread very significantly across the world and likely will continue to do so
over time and a lot of work is currently being done on linking in flight travel plans
and how international travel links in with transmissibility of the disease
i realize i skipped through one explanatory slide as well so this one is
From Mutations to Lineages
just to quickly show how we use these mutations so this is just an example what i've done here
all of these at the bottom are possible mutations that might exist in these samples
and then each row is a different sample if it's in red it means that it has that mutation if
it's in gray it means that it has the reference version of that particular SNP
and what you can see is you can see groupings of samples together so you can see a group of samples here that share
this group of SNPs but actually three of them have an extra SNP here that the other
ones don't so it's this type of information that we can piece together to try and understand
well okay these three individuals may be part of a shared transmission chain but it's unlikely that these samples and
this sample came from the same transmission chain because it would have had to have lost
one mutation and gained another
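The reasoning above can be sketched with sets of mutations: two samples are compatible with a single chain when one sample's mutations are wholly contained in the other's, and unlikely to be directly linked when each carries a mutation the other lacks. The function name and the SNP labels below are hypothetical placeholders, not real positions.

```python
def chain_relationship(mutations_a, mutations_b):
    """Classify two samples by how their mutation sets relate.

    "identical": same mutations.
    "nested":    one set contains the other, so one sample could descend
                 from the other simply by gaining mutations.
    "divergent": each has a mutation the other lacks, so a direct link
                 would need one mutation lost and another gained.
    """
    a, b = frozenset(mutations_a), frozenset(mutations_b)
    if a == b:
        return "identical"
    if a < b or b < a:
        return "nested"
    return "divergent"


# Hypothetical SNP labels for illustration only.
group = {"snp_1", "snp_2", "snp_3"}
group_with_extra = {"snp_1", "snp_2", "snp_3", "snp_4"}
other_sample = {"snp_1", "snp_2", "snp_5"}

print(chain_relationship(group, group_with_extra))
print(chain_relationship(group_with_extra, other_sample))
```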
and then so this is analysis now just looking at numbers of cases so this
The National Picture
is just a quick analysis that i ran last night just to update the case numbers that have been
seen and what i've done is within each region of the uh of the country i've
uh over the entire course of the pandemic so far i've taken the number of cases and
normalized to the number of residents within that area and then i've lined everything up so
that it's based on the earliest occurrence of the biggest peak and what you can see here is that for
the most part peak one has paled into insignificance compared to peak two and peak three and peak two was
much higher in cases in places in the north of england compared to the south of england
but in those areas in the north the third peak was lower than what was seen in the second peak so
these areas of the country were places that saw local lockdowns before the complete lockdown in the country was
seen and we can see that following lockdown uh in in december
we can see case numbers have dropped significantly and that's continuing to go in in the right direction as we move
forwards as well okay this is the sort of crux
slide of what i wanted to show today it's quite busy so i'm going to go through it step by step
this is using a program called Microreact which is a place that you can go to yourself and explore all the data that
we've generated through the COG-UK program and all the data that we generate is
made publicly available and then is incorporated into a variety of tools including Microreact
and you can explore those data so the first plot on the left is going to identify different lineages
throughout the country so this is looking across the entire pandemic and it's
got a pie chart within each county uh showing the distribution of lineages within that region
and interestingly when you look at this plot from the first wave
there was much higher differences between different areas of the country so the north had its own
distinct set of lineages compared to what was going on in the south for instance now it's a lot more standardized across
the entire UK and in particular the B.1.1.7 lineage now accounts for so many cases uh
which is what we can see here so in this case the green plot this is essentially a
a pie chart for each week stretched out into a bar chart
and then pieced next to each other so what we can see is all of these colors represent a distinct lineage
and we can see that in the first wave there were certain lineages that were more enriched than others but this didn't
really change too much over that period of time but since october we've seen an increase
in one lineage this increased since summer started going up this was the D614G
mutant uh which was B.1.177 was the particular lineage being shown
here and that one was taking over as the most dominant lineage but then come december the B.1.1.7 lineage
came in and very very quickly started to become the most dominant seen across
the country and you can see now that in the most recent data over 95 percent of every single case
within this period of time was a B.1.1.7 lineage and you can see that broken down
a little bit more here so you can see the number of genomes and the the time distribution of those
genomes so again we can see that B.1 was quite significant in the uh the first wave but actually
seems to be quite low uh these days there are other lineages like the B lineage which is the ancestral lineage which was
uh high in the early days but now is almost never seen whereas we can see these cases that are
new versions of the virus and in particular B.1.177 came about
just after summer and then there are other offshoots of this like 177 177.4
where we seem to have a large number of samples and then the plot on the right is
showing you the family tree the phylogeny but with the specific lineages uh identified in
colors and this green set here is the B.1.1.7 lineage
and you can see whereas everything else they all kind of sit in amongst themselves there aren't really any what
we call long branches involved here the branch from its closest neighbor for B.1.1.7
is very very long so the longer this is the more different it is from its
closest neighbor so you can see that the B.1.1.7 lineage jumps out and is very very different
from everything else that we see and on the right here i've colored it uh based on whether or not
it has certain uh mutations so you can see the specific mutations
that are uh the D614G mutant is largely present in everything that we see here
uh but then there's the N501Y uh and also a deletion of two amino acids
that's uh seen on the B.1.1.7 lineage and all these data are available for you
to go in and play around with you can uh even start to look at specific
B.1.351 Prevalence
uh variants of interest so if you wanted to see what the prevalence of the new B.1.1.7 variant looks like you
can limit it just on that and you could even look at a timeline so you can play a little video that will
show you where the first cases were seen and you can see it starting down in the southeast and gradually spreading around
the country and similarly you can do the same with the B.1.351
variant the south african variant you can see here that there's been 218 cases identified all of which are tracked very
closely by PHE um and many of which have been seen in london and you can track over time where
those have been located and you can see here that unlike B.1.1.7 it doesn't jump out like a sore thumb it
just happens to have one particular mutation of particular concern which is the E484K mutation
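The weekly lineage chart described earlier, a pie chart per week stretched into a bar and placed side by side, comes down to computing each lineage's share of the genomes sequenced in each week. A minimal sketch follows, assuming each genome is recorded as a (week, lineage) pair; the record format and the counts in the example are invented for illustration and are not COG-UK data.

```python
from collections import Counter, defaultdict


def weekly_lineage_proportions(records):
    """Map each week to the fraction of its genomes assigned to each lineage.

    `records` is an iterable of (week, lineage) pairs, one per genome.
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1

    proportions = {}
    for week in sorted(counts):
        total = sum(counts[week].values())
        proportions[week] = {
            lineage: n / total for lineage, n in counts[week].items()
        }
    return proportions


# Invented example counts, purely to show the shape of the output.
records = (
    [("2020-W50", "B.1.177")] * 3
    + [("2020-W50", "B.1.1.7")] * 1
    + [("2021-W05", "B.1.1.7")] * 19
    + [("2021-W05", "B.1.177")] * 1
)

for week, shares in weekly_lineage_proportions(records).items():
    print(week, shares)
```

Each week's shares sum to one, so stacking them as bars gives the week-by-week picture of one lineage displacing another.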
The Local Picture
and we can zoom in as well so we can take a look at our local area uh and and try and get an understanding
of what the virus pandemic has looked like within our region so this is uh very roughly looking at uh
some of the areas that have been sequenced by our own lab um and and you can see that it very much
mirrors what's seen across the country with in particular the B.1.1.7 variant being the uh the most significant case but what's
quite interesting is you can see that actually there's a lot of them which we didn't see in the area until recently
and these all sort of cropped up at around the same time um whereas other cases uh in the first
wave we probably started with uh only a handful of different
local variants which accounted for most cases that we saw
and then this was a a small graphic that's been put together by uh simone gunto who's interested in
approaching the work that we're doing the data-driven work that we're doing approaching it from a a different
perspective of uh creative and artistic vision so what she's done here is she's
taken a very rough geographic location of the cases that we've seen
and used colors and shapes to indicate the different lineages involved
and what she's trying to do with this this is just a very uh early early stage rendition of this what she's
interested in doing is seeing how this looks from a creative perspective
and using this information you can actually get a lot of information so we can kind of see how these variants have changed over
time how they've spread around the region and the more that we do this and the more sequencing we're able to do from
local cases in the community in particular uh the better picture that we'll have of how
that spread has occurred over time uh i just wanted to highlight a few
COG UK Mutation Explorer sars2.cvr.gla.ac.uk/cog-uk
tools as well that you can use to go and investigate these data yourselves so um
there's the COG-UK Mutation Explorer which has recently been generated to allow you to go in
and really start to explore these data and understand which are the most important mutations that you should be
aware of so if you are if you are at all interested in exploring these data i recommend going and having a look at
this because there's a lot of different things that you can learn so here you can see the uh the mutations of most importance
currently which is that 69-70 deletion and the N501Y mutation that i mentioned earlier
and both of them together are defining of the B.1.1.7 lineage and you can see obviously a large number
of UK sequences have this particular set but there's also a subset now of B.1.1.7 which also has
that E484K mutation so this is another variant of concern
that's been identified there haven't been many cases so far but this is the kind of thing that we
are keeping a very close eye out for to make sure that as we sequence these samples
every time a new sample is generated we check to see whether any of them have the characteristics
of a lineage that should be flagged up to PHE or chased up by the hospital or some
kind of track and trace should be put into place to try and understand uh who may have been in contact with that individual and that's where those
close relationships with the NHS sites really come into play
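The check described above, looking at each newly sequenced sample for the characteristics of a lineage that should be flagged, can be sketched as a set-containment test. The rule names and mutation labels below are a deliberate simplification built from the mutations mentioned in this talk; real lineage assignment relies on many more sites, so treat the table as an assumption and not as a working definition of any lineage.

```python
# Simplified, illustrative rules based only on mutations named in the talk.
# The label "del69-70" for the two-amino-acid deletion is this sketch's own.
FLAG_RULES = {
    "B.1.1.7-like": {"del69-70", "N501Y"},
    "B.1.1.7-like with E484K": {"del69-70", "N501Y", "E484K"},
    "E484K present": {"E484K"},
}


def flags_for(sample_mutations):
    """Return the names of every rule whose mutations are all present."""
    observed = set(sample_mutations)
    return [name for name, required in FLAG_RULES.items() if required <= observed]


print(flags_for({"D614G", "del69-70", "N501Y"}))
print(flags_for({"D614G", "del69-70", "N501Y", "E484K"}))
print(flags_for({"D614G"}))
```

Any sample that returns a non-empty list would be a candidate for follow-up with the submitting site.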
um and it lets you explore in a lot more detail so you can look over time at how the samples that have been
generated uh which particular mutations they have so if we look here at that N501Y
variant we can see that across the time course almost all of them in fact all of them
did have the wild type version right up until just before christmas at which point
we started to see an increase in those with the uh the Y form of N501Y and until now where
almost 100 percent of cases now have that particular version so you can go in and explore these in
the visualizer and a lot more information there's also links to antigenic information so it links to antigenic
databases which have looked at which antibodies might be escaped by
certain variants to try and understand which of the variants that we should be focusing on try and identify ones that
we should focus on in case they might have some link to uh potential vaccine escape
Cluster Identification
and then in terms of feeding back with NHS sites and making a direct impact on
their infection control procedures what we try and do within local cases is we try and identify clusters of cases
so it's a bit of a busy picture but just to give you a brief idea of what you're seeing here these are
several hundred cases and it's the same along the rows as along the columns but
what we do is we work out how similar they are to one another and to do that we look at the mutations
associated with both samples and if they have exactly the same mutations as one another
it gets coloured blue if they have completely different mutations from one another it gets coloured red and anything else
gets the colour in between so what we're interested in are these clusters these groups of samples that
are almost identical to one another and it's those clusters which are most likely to be part of
a a single transmission chain and in these cases we can work with the hospital
to incorporate epidemiological studies to try and understand are they all seen on the same ward as
one another uh were they all part of the same cohort as one another
were they all seen by similar healthcare workers trying to understand why that particular
group of individuals has a single transmission chain associated with them
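A minimal sketch of the comparison behind that picture: every pair of samples is scored by how much their mutation sets overlap, and samples that are near-identical are grouped together. Jaccard similarity and the 0.8 threshold are assumptions standing in for whatever measure and cut-off the real pipeline uses, and the sample names and mutation labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """1.0 for identical mutation sets, 0.0 for sets with nothing in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(samples):
    """Score every sample against every other (same order on rows and columns)."""
    names = sorted(samples)
    return {(x, y): jaccard(samples[x], samples[y]) for x in names for y in names}


def find_clusters(samples, threshold=0.8):
    """Group samples linked, directly or via others, by similarity >= threshold."""
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for x, y in combinations(sorted(samples), 2):
        if jaccard(samples[x], samples[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical samples and mutation labels.
samples = {
    "s1": {"m1", "m2", "m3", "m4", "m5"},
    "s2": {"m1", "m2", "m3", "m4", "m5"},
    "s3": {"m1", "m2", "m3", "m4", "m5", "m6"},
    "s4": {"m7", "m8"},
}
print(find_clusters(samples))
```

Clusters found this way are only candidates for a shared transmission chain; the epidemiological follow-up with the hospital is what confirms or rules them out.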
um and then just finally just to mention about the work that's being done by the university on student testing
COVID-19 Student Testing Program
so uh back last year things started with uh the setting up of the Pillar 2
testing site within the Eldon Building car park um where if you needed to get a
community test you could go in there and get tested through the Lighthouse Lab process um but then last year working in
collaboration with ntl biologica the university set up an asymptomatic screening process so
while the Eldon Building is for people who are feeling poorly and want to understand if they've got COVID or not
the uh the asymptomatic screening is a more preventative
method to try and identify cases of COVID-19 from people who don't realize
that they've got it so this started off with uh doing both PCR
along with Portsmouth Hospitals University Trust and LFD testing sorry which is a lateral flow device
which is a system that picks up whether or not you've got active versions of the protein
the spike protein currently present within you at that time um but that was developed with
the rollout from the government of increased testing for students like at
the end of last year just before christmas and now the spinnaker sports hall has been set up
for an asymptomatic screening program using just LFD testing
and we were originally sending off positives for confirmatory testing by PCR
at the hospital but actually now that's no longer necessary but for those cases where samples were
sent for PCR testing at the hospital we were able to incorporate those into our pipeline for sequencing
and using that information we've been able to do various analyses which we're currently in the process of finalizing
these have been fed back to SAGE in a recent report last week um along with a
uh a report that's been put together through the UK on understanding uh the role of students
in transmission of the virus and kind of one of the main take-homes from this is that
all the positive cases that we've seen are actually part of only a small number uh of specific uh clusters of cases
and most of those were seen in the early days of
university students returning to the area but very very quickly the infection control procedures put
into place by the university reduced cases significantly among
students until students were very much lower transmission risk than others of the same age group
within the community and then oh just finally as well this is just another uh project that i've been working
on um with southampton general hospital
which is a more direct involvement of the sequencing in infection control so it's based on looking for
cases that are of hospital-onset COVID-19 infection or so-called HOCIs what we do is when a HOCI is identified
by the hospital it gets sent to us and we generate a report very very quickly within 48 hours of the
sample making its way to us and then feeding that information back to the infection control team at the
hospital so that they can use that information to understand if they're part of larger outbreak clusters or if there's
epidemiological evidence for how they've caught that infection within the hospital and
ultimately what we'll end up doing is we'll be comparing uh this intervention of providing this report back within two days
against reporting back but doing it much slower and against not reporting back
at all to try and understand whether or not this uh this direct feeding back for infection control can have a large impact
on um cases at the hospital so just in conclusion that that's a very
Conclusions
broad analysis of what we've been doing over the past uh year with the COVID-19 pandemic but really
the COG-UK consortium really is a vital part of the UK's response to the pandemic
and we're just a small cog in that cog and it's really important for this
process to continue so that we can continue to trace the virus as it spreads across the uk and
in particular to be able to to react quickly to new potential mutations of concern
and variants that might crop up in the future so this sentinel surveillance uh is being used
as well when looking for tracking of importation so as we come out of lockdown trying to understand if increased
transportation of people both due to international travel but also just to local travel
can have an effect on the sorts of lineages that are brought into the area
and then we're continuing to work closely with our NHS partners with public health agencies
to ensure that this ultimately goes on to improve patient care and help
limit infections where possible and that sort of genomic epidemiology
is really key and it's certainly one of the uh one of the shining lights of the uk
scientific community of what's been able to be achieved using this process and our ability to do this rapidly like
we've done with HOCI has been used throughout the consortium to help identify and control outbreaks
in a lot of different settings be they hospitals or workplaces or care homes and we also work very closely with other
studies uh such as SIREN, GenOMICC, REACT and HOCI all of which are working very very closely with PHE
to try and uh really feed this information back and incorporate it into patient care moving
forward and then this was just a graphic generated by Alex Cagan to
highlight the work done by COG-UK it's a little bit dramatic but
i thought it was quite a nice case from that initial meeting in the bottom left uh right to uh saving the world from
COVID i guess is what's going on in the top there but there's still a way to go yet but i think there's a lot of very very
positive uh a lot of positives coming out of the vaccine results at the moment
uh numbers are going down in the right direction and hopefully they'll continue to go down
and really the work that we're doing along with other colleagues from across the uk will help ensure that we'll soon get to
the end of this pandemic and just to thank everybody that's involved um there's far too many people
Acknowledgements
uh to go through individually but particularly Angie and Sharon who i set this up with at
the start and it really was just the three of us kind of against the world to start with and before you knew it
that that number of people increased dramatically and we're at the stage that we're at now
and just thank everybody from all the different NHS sites and from COG-UK and uh thank you all for listening thank
you and thanks very much to you Sam for this quite amazing work very impressive in
terms of network as well you see because all the themes that you've listed at the end of your presentation
are extremely uh telling of the very very impressive network one more time
that you've developed across the country and certainly internationally i did not interrupt you because i saw
that everybody was following everybody was interested we received a lot of questions so i just
you know let you talk as much as you wanted and i think as well you've done a great job in making
something extremely complicated i'm sure it is relatively simple so that we all understand so you have questions
um a lot of questions actually and so i'd like us to take at least 10 minutes to
go through all these questions so the first question is from Mona
just wondering how do you get around processing potential positive COVID samples
in the open air because she thought that sorry the question just moved how do you
do that in the open air and correct me if i'm wrong i thought it
should be a class 3 organism that will have to be processed in class 1 biosafety cabinet
so uh yeah so first of all because we're working so closely with the hospital uh we don't ourselves process
COVID-positive samples so everything that we receive from our
submitting sites is RNA only so we only receive it after the RNA has been extracted
um so by that point it's entirely non-infectious that there's no risk of contamination
uh from from those samples um also they have um class three
facilities at the hospital for dealing with such cases and then similarly it was actually
dropped down to class two working so you can deal with COVID-positive samples
uh even primary samples primary swab samples at class two with certain
restrictions put in place so um that was largely done i think just
because so many places needed to be able to process these samples a safe system needed
to be put in place that could be done at class 2 by many many more places that don't have class 3
facilities but with no risk of infection so actually most places have been working under class 2
for the majority of the pandemic thank you very much for that Sam a
question from Gary and i think it's a question we all have in mind there is some discussion about the general evolutionary trend of pathogens
to become more contagious but less lethal less symptoms as well
clearly you show mutation causing increased and again the question's moved causing
increased uh contagion is there any evidence of
reduced lethality or is it um in fact the reverse happening
so this sort of you know nexus between contagion and and lethality yeah so i mean i i'm
gonna preface this with saying i'm not a virologist i have learned a lot about virology in
the past year um but certainly my my feeling would be that the the as far as evolution goes
the optimum uh evolutionary state of a of a virus would be so that it didn't
cause any problems at all to the host so coughing is very useful because it helps spread the virus to other people
um but killing the host is not ideal um in a lot of ways sorry that came off
sounding far more callous than i meant it to but um it's as far as the virus is concerned uh
it doesn't help it to spread which is ultimately all a virus is it it just wants to pass its genetic material from
one person to another um so my my feeling would be with yours
Gary that actually if anything we should see that mutations in the virus should result in it becoming less lethal
not more lethal um there was some work released
recently uh that showed for the new B.1.1.7 variant uh that there was some potential
increase in um uh lethality and uh but this could be
largely a result of its increased transmissibility uh and also to do with kind of the sorts
of people that it's likely to affect so most of this was done in the community rather than in hospital something
so the work that we've been doing with the HOCI trial on uh looking specifically at those in hospital
and in particular those with you know that are suffering the worst from uh COVID-19 um should add to that
body of evidence and that paper should be uh released in the not too distant future i think
um but i think it sort of depends as well i mean you know evolution isn't uh
it's not it doesn't know what it's doing these things are just happening by chance
but in response to some stress that's being put on them we think in this case that stress has been uh treatment with convalescent plasma or
something similar um so really its main
change for the B.1.1.7 variant is to just become more transmissible to sort of help evade those uh
uh the results of using the convalescent plasma um but yeah i think time will tell with
it i mean it's only been a year i think we're going to understand a lot more and hopefully over time
it will if anything get less severe rather than more severe but you only
have to look at flu to realize that it's an incredibly deadly disease year on year um so it hasn't disappeared
over time and and i don't think this one will either unfortunately thank you so much again
for that so a question from Karen who stresses a sort of paradoxical situation i think we all
understood that we are in this country in a paradoxical situation aren't we we are world leading for the sequencing
and yet at the same time it seems that we have many cases and the system is not really facing and
right so question from Karen we call this mutation of the virus the british variant
is that because it was first discovered here is that because of good research or is
it because it was first generated here so this sort of paradox again
yeah that is a very good question i mean it's entirely possible that actually it first came about in another country
and we just happened to detect it here first in Kent because it's spread to Kent and our approach is
uh quite rapid at identifying these things that's entirely possible and it's
difficult to say one way or another until um you know i i guess
as more countries increase the rate of genomic surveillance that they're doing which many of them are
um i think that we'll start to see to be able to fill in those blanks from outside of our
country of our nation thank you so much i can't say for sure
i'm afraid well yeah as you said suppose we need more time to to understand better all these things another question we all
have in mind i suppose from Cressida will vaccination programs drive the evolution of escape
does the timing of delivery play a role um good questions uh i mean
again as i say i'm i'm it's a little bit outside of my my knowledge space but i think my answer
would probably be yes if anything's going to going to lead to adaptation
mutations it will be a systematic vaccine approach but similarly with
influenza we have the same situation with influenza and it's relatively straightforward
to keep on top of in terms of vaccination with a seasonal vaccine rollout so i mean again time is going to
tell as we go forwards but that's very much kind of what we're what we're modeling and what we're
keeping track of with these data is to try and catch these things as early as possible
um so by understanding the sorts of you know it should be said as well a lot of these mutations occur completely
independently of one another so E484K for instance has cropped up multiple times independently of one
another um and it's those types of mutations that are cropping up
in response to something rather than just due to randomness that are the ones of most
interest and i i think we've got a very good system in place and it's developing
rapidly over time as well so the coverage that we get is increasing all the time uh the tools available for linking these
data together and and understanding what they mean and how they impact uh things like the vaccine rollout uh i
think all of this is yeah it's a time will tell situation i remain very positive about
the future moving forwards uh but i like to consider myself an optimist so i don't know if that's a good thing or not
well that's that's a very good thing surely sam is going to be the last question if you don't mind from me
if you project yourself in the future in your superman spiderman costume what do you see next how are you going
to use this sequencing the work that you've done maybe to apply it to
other viruses or for cues how do you see that i i think
that the work that's been done by COG-UK uh has been very much
seen by the UK government as having massive impacts and massive uh positive
impact on the COVID-19 pandemic so my feeling is that we will see something
similar become a more general uh pathogen surveillance operation um
and i think that there's a lot of you know despite everything despite how bad everything's been over the last year i
think there have been the occasional positive thing to come out of it and one of those positive
things has been the introduction and uh making so ubiquitous of things like
sequencing and other high throughput approaches uh within clinical science so i think that moving
forward now so many places have been set up to allow for
these kinds of technologies to be used more and more people are coming online for doing this sequencing as we go and it the system that we have in place
for SARS-CoV-2 will be equally applicable to any other pathogen you might name so
um yeah i i think i think there will be a lasting uh you know even if even if by the end
of this year COVID is gone we're entirely out of it it's only a matter of time until the next virus
until the next pathogen hits so i think it would be um jumping on the bandwagon now and you
know making taking advantage of everything now of what's been set up i think is going to be really important to create a
lasting uh legacy of the work that's been done absolutely fantastic and this is going to be the
concluding note the lasting legacy thank you so much Sam extraordinary
discussion extraordinary work as well as you know everybody you can see this webinar watch it again on the
research features website i'm sure you'll have an even larger audience some thank you very much everyone for being
such a great uh attentive and interested audience i'll see you next week for yet another seminar on
something completely different economic crime and i'd like to thank the team as usual gloria
her claudia and olga in particular thanks very much again sam and see you soon thank you bye