DNA is the blueprint of life. The sum total of DNA within a single cell of an organism is
the genome, and genomics is the study of the entire genome. It is a field which
can have a transformative effect on our lives, in health and beyond. Progress
in the field is closely interlinked with developments in computing
technologies: the enormous volumes of data produced by sequencing, mapping, and
analysing genomes fall within the domain of big data.
Recently, on the sidelines of the AWS (Amazon Web Services)
Public Sector Summit, OpenGov had a conversation with Dr. Swaine Chen, Assistant
Professor at the National University of Singapore (NUS) and Senior Research
Scientist at the Genome Institute of Singapore (GIS), about genomics and data
analytics. GIS is an institute of The Agency for Science, Technology and
Research (A*STAR), Singapore's lead public sector agency that spearheads
economic-oriented research to advance scientific discovery and develop
innovative technology.
From 3,000,000,000 to 5,000
In his keynote address at the summit, entitled “The Future
of Analytics to Enable the Future of Genomics”, Dr. Chen used a famous example to
demonstrate the importance of the data
analytics challenge.
Angelina Jolie underwent a double mastectomy after a test found sequence
differences in her DNA, specifically in two genes called BRCA1 and BRCA2, which
meant that her risk of getting breast cancer was up to 50%, compared with a 2%
risk of getting breast cancer by the age of 50 for an average person.
The entire human genome is 3 billion base pairs. Analysing 3
billion base pairs is too difficult and too expensive to do as part of a
regular test. So, in order to have a test like the one Angelina Jolie took, we
need to know where to look for the genes associated with breast cancer.
Narrowing that 3 billion down to a space of 10 million
took 17 years of work. But 10 million was still too big. It took 4 more years
to winnow that 10 million down to 5,000, which was the key region where the
BRCA1 and BRCA2 genes were.
“5,000 is around the kind of size we can do relatively
routinely in the lab, where we are looking specifically at a very small region
for BRCA1 and BRCA2,” said Dr. Chen.
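To put those numbers in perspective, the short Python sketch below works through the arithmetic of the two narrowing steps. It uses only the figures quoted above; the stage labels and the function name `narrowing_summary` are hypothetical and are there purely for illustration.

```python
# Figures quoted in the article; the stage labels are hypothetical names
# added here for illustration only.
GENOME_BP = 3_000_000_000

STAGES = [
    # (label, region size in base pairs, years taken)
    ("first narrowing", 10_000_000, 17),
    ("second narrowing", 5_000, 4),
]


def narrowing_summary(start_bp, stages):
    """Return per-stage fold reductions, the overall fold reduction, and total years."""
    folds = []
    current = start_bp
    total_years = 0
    for _label, region_bp, years in stages:
        folds.append(current / region_bp)  # how many times smaller the search space became
        current = region_bp
        total_years += years
    return folds, start_bp / current, total_years


folds, overall, years = narrowing_summary(GENOME_BP, STAGES)
for (label, region_bp, stage_years), fold in zip(STAGES, folds):
    print(f"{label}: down to {region_bp:,} bp, {fold:,.0f}-fold in {stage_years} years")
print(f"overall: {overall:,.0f}-fold in {years} years")
```

On these figures, the first step shrank the search space 300-fold over 17 years, and the second shrank it a further 2,000-fold over 4 years, for a 600,000-fold reduction across 21 years.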
BRCA1 and BRCA2 were found to be important for breast cancer
largely using pre-genomics techniques. The promise of genomics is to accelerate
this 21-year process for other diseases. To achieve this acceleration, we need advances
in both data acquisition and data processing.
For fields such as
imaging or financial data, Moore’s law (the doubling of computing power every
18 months) has simultaneously lowered the costs and boosted capabilities for
both data acquisition and data analytics.
However, for
genomics, data acquisition is a bit different. Genomes are in the cells of our bodies,
as opposed to sitting on a computer or a digital device somewhere. The data
acquisition problem therefore can be further divided into two parts. The first
is about getting a sample from a person, which is difficult to scale up. The
second issue is getting the sequence data from that sample onto the computer.
This second part has been benefitting from Moore’s law.
But
around 10 years ago, genome data acquisition costs started dropping even faster
than Moore’s law; in other words, sequencing is progressing at a hyper-Moore
rate. However, computing power is still progressing at the rate of Moore’s law.
Therefore, computing power is falling exponentially behind the rate of data
acquisition. Dr. Chen called this the “Hyper-Moore gap”.
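A rough way to see why the gap is exponential is to compare two doubling rates. The Python sketch below takes the 18-month doubling period for computing from the description of Moore's law above. The 9-month doubling period for sequencing is purely an illustrative assumption, not a figure from Dr. Chen's talk, and the function names are hypothetical.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def hyper_moore_gap(
    years: float,
    compute_doubling_months: float = 18.0,    # Moore's law as described in the article
    sequencing_doubling_months: float = 9.0,  # ASSUMPTION: illustrative value only
) -> float:
    """Ratio of data-acquisition growth to computing growth after `years`."""
    data_growth = growth_factor(years, sequencing_doubling_months)
    compute_growth = growth_factor(years, compute_doubling_months)
    return data_growth / compute_growth


for years in (0, 3, 6, 9):
    print(f"after {years} years, data has outpaced compute by {hyper_moore_gap(years):.0f}x")
```

Under these assumed rates the ratio itself doubles every 18 months, so the shortfall compounds rather than growing by a fixed amount each year. Any sequencing doubling period shorter than that of computing gives the same qualitative picture.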
Because of the Hyper-Moore rate of progress in acquiring
sequencing data, the amount of data is only going to increase in the
near future. At the moment, researchers are able to handle the analytics, due
to continued advances in computing and because it’s only recently that researchers
have started amassing massive volumes of data. But if current trendlines
persist into the future, Dr. Chen said the Hyper-Moore gap will create a
serious problem with analytics. Cloud computing is one of the key technologies
that can enable individual institutions like GIS to bridge the Hyper-Moore gap
(GIS is working with AWS), and more and more of the processes at GIS are being
moved over to the cloud.
Outbreak analysis, understanding infectious disease mechanisms and synthetic biology
Dr. Chen talked about three main areas of work at his lab.
One is outbreak analysis. He explained, “We help with a lot
of the infectious disease outbreaks in Singapore, using genomics to track and
manage those outbreaks.”
Today, genomics is the international standard for tracking
and monitoring outbreaks, and Singapore is at the forefront of this trend as
well. For instance, in 2015, there was an outbreak of Group B Streptococcus (GBS) in Singapore, coming from
raw fish that was being sold in some hawker centres. Sequencing of genomic DNA
was used to tell the difference between strains (like different individuals) of
GBS. The genomics analysis gave a single clear result: the GBS that was
infecting the patients was the exact same strain that was found on fish that
was being sold at the same time and in the same place during the outbreak.
Genomics gave the tools to track that outbreak and manage it, and to monitor
and make sure it doesn’t come back again.
We asked Dr. Chen how researchers know which analyses to perform, and how
genomics can further improve outbreak analysis. He gave one example of how the
additional data from genomics helps scientists understand how outbreaks
happen.
Nearly 30% of all people have GBS, and it doesn’t cause a
problem. Then, suddenly one strain causes an outbreak. When that happens, the outbreak
strain grows faster, because now it can grow somewhere that other GBS couldn’t
grow before. Previous research has developed evolutionary theories which
predict characteristic sequence changes when there is rapid growth and
expansion of one strain. In addition to helping track and manage the outbreak
in the short term, Dr. Chen’s lab later reuses the genomic information to
test whether these evolutionary theories hold up against the evidence. This
research would help to understand why a given outbreak happened, which could
lead to better predictive tools, better strategies for managing future
outbreaks, and also provide at least some closure for those affected.
This kind of research is one reason for the hunger for data.
“We have a lot of theory regarding what we should see in the DNA sequence, and
we are only just recently getting the data we need to go take a look.”
The second area of work in the lab is trying to understand
mechanisms of infectious disease.
“We can have a lot of correlation data but if we want to
develop treatments or new ways to control or prevent infections, we need to
understand how they happen at a molecular level. That leads to new drug targets
that can then lead to new drugs. So, a big part of my lab tries to understand
molecular mechanisms and identify novel strategies to prevent and treat urinary
tract infections. Those are infections which affect any part of the urinary
tract, usually the bladder. Half of all women get a urinary tract infection at
some point in their life,” Dr. Chen elaborated. He added that a lot of
computation underlies both of these areas of his lab’s work.
The third area is synthetic biology, which involves the
creation of tools for manipulating bacteria. Dr. Chen’s expertise in synthetic
biology arises from his work in understanding the genetics of bacteria causing
urinary tract infections (UTIs).
The standard way to understand why a certain bacterium is
causing a disease is to change specific regions of its DNA and see if that
affects the disease. If it does, then the genes or proteins encoded
by that region of DNA might be targeted with a drug.
“The ability to make a specific, desired change in an
organism’s DNA is one of the foundational tools for synthetic biology. This
capability is needed to achieve the big synthetic biology goal of fully designing a bacterium to do specific things,”
said Dr. Chen.
Dr. Chen’s lab has developed some of those tools as part of
its work in understanding how Escherichia coli causes UTIs, but
the same tools are widely applicable to bacteria that do not cause UTIs.
All three of these areas of research (outbreak analysis, UTIs, and
synthetic biology) flow into antibiotic resistance.
Antibiotic resistance presents a serious threat to public health globally. It
occurs when bacteria undergo changes following exposure to an antibiotic and
the drug becomes ineffective against those bacteria. This could compromise the
ability to treat common infections and infections arising from complications of
medical procedures, such as surgery and chemotherapy. In fact, the Singapore
Government recently launched a National Strategic Action Plan on Antimicrobial
Resistance (AMR).
UTIs are probably the second leading cause of antibiotic
prescriptions, so they are a huge contributor to antibiotic usage and
therefore to antibiotic resistance. Dr. Chen’s work in finding better ways
to treat UTIs could help reduce antibiotic usage, which would hopefully help
reduce antibiotic resistance rates. In addition, his work on synthetic biology can
be applied to understand how different bacteria have different ways of being
resistant. “You may have heard of resistance getting transferred from one bacterium
to another; this is largely due to antibiotic resistance genes. However, sometimes
two different bacteria have the same resistance genes but they seem to be
different in their resistance for unknown reasons. Figuring out why there is a
difference in resistance requires the synthetic biology tools we are building,
and this could lead to exploiting these differences to reduce antibiotic
resistance in other bacteria,” explained Dr. Chen.
Pervasive genomics
Today genomics is highly resource-intensive. But Dr. Chen
said that it will eventually become pervasive and costs will be lowered,
supported by the Hyper-Moore rate of progress in the field, developments in
computing, and a balance between basic science and translational research.
As Singapore proceeds towards its Smart Nation vision,
genomics will become ubiquitous in fields ranging from health to crime
investigation, food security and safety, and the health of the environment.
Genomics will help with precision medicine, enabling
diagnosis of the right ailment and picking the right treatment. As genomics is
scaled up to the entire population, it will enable predictive healthcare. And Dr.
Chen believes that GIS has a leading role to play in that journey.