How do you get that pretty picture of a protein?
These days, people are learning more about protein structure than they ever wanted to, but life gives us no choice: we need to know about the virus that causes COVID-19, its spikes, and the human protein it uses to get inside the cell, the ACE2 receptor. We also need to know how antibodies, which are also proteins, defend us from the virus.
There is some very good news about protein structure. Why does this good news matter? Because protein structure determines function, and good function means good health (and lots more).
You must have seen those multi-color pictures that we use to represent proteins. I was involved in a minute part of the work it takes to get one of those pictures, and I can tell you it takes years of hard work by many people (plus good luck and cat whiskers!) to obtain one. So, what’s the good news? It’s this: scientists have acquired (the hard way) enough understanding of how proteins fold and function that now an artificial intelligence system in the UK has learned how to predict, with high accuracy, the structure of a protein when given just its primary structure, that is, its sequence of amino acids. Believe me, this is HUGE!
Let’s start from the beginning
A protein’s function is made possible by the shape it folds into in space. If a mutation changes an amino acid in the right (or wrong) place, the 3-D shape may change in such a way that the protein can’t do its work any longer. When the protein changed by a gene mutation is very important for life, life is no longer possible or severe illness occurs. Those mutations can happen at any time: when sperm or egg cells are being produced (inherited genetic conditions) or when we are already fully grown (cancer).
Figure: ribbon model of the EGF receptor (by Boghog – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=45709050)
What does it take to find out the shape of a protein? Why is it important?
Humans alone have roughly 20,000 protein-coding genes, but 3D structures are known for only a fraction of the proteins they encode. It is relatively easy to find out the sequence of amino acids in a protein, which we call its primary structure. In fact, nowadays you can simply predict the primary structure of a protein if you know the gene that codes for it. What’s hard, though, is to determine the 3D structure, that is, how the protein folds in space to reach its final shape. Traditionally, the shapes are discovered through meticulous lab work that can take years and involves purification of a protein, characterization, crystallization, etc. Why isn’t this straightforward? After all, DNA has essentially one 3D structure, the double helix. Proteins are different: they are chains of amino acids that can twist and bend into (in theory) a huge number of shapes. But within the cell, the protein folds into the one shape that makes it functional. If you change the shape of an enzyme, it may no longer be able to do its job.
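To make the contrast concrete, here is a small Python sketch of the "easy" step (reading a primary structure off a gene) next to a back-of-the-envelope count of how many shapes a chain could take in theory. It is only an illustration: the function names are my own, the example DNA string is made up, and the figure of three possible conformations per amino acid is a common textbook simplification, not a measured value.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Assumes the sequence is the sense strand, already in the right reading
    frame, and contains only A, C, G and T. Stops at the first stop codon.
    """
    seq = coding_dna.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def rough_conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Crude estimate of possible shapes: states_per_residue ** n_residues.

    The default of 3 states per amino acid is an assumed simplification.
    """
    return states_per_residue ** n_residues


if __name__ == "__main__":
    # Made-up toy gene: Met-Ala-Lys followed by a stop codon.
    print(translate("ATGGCCAAGTGA"))
    # Even a small protein of 100 amino acids gets an astronomical count.
    print(f"{rough_conformation_count(100):.2e}")
```

Getting the sequence is a simple table lookup, while the number of conceivable shapes for a modest 100-amino-acid chain comes out on the order of 10^47 under this crude assumption. That gap is why predicting the fold from the sequence has been so hard.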
For a biochemist, the monumental task is to know the 3D structure of all proteins that “matter” for the particular problem he/she is studying, be it an illness or the synthesis of starch.
Knowing a protein’s 3D structure will facilitate the understanding of the mechanisms that drive some diseases, allowing scientists to design new medicines, more nutritious crops, and synthetic enzymes that can break down plastic pollution. The good news is that an artificial intelligence group in the UK called DeepMind has managed to teach computers how to predict a 3D structure based on the amino acid sequence. The learning material? A public database (the Protein Data Bank) containing about 170,000 protein sequences and their shapes, obtained by hard-working biochemists over many decades.
Does this mean that scientists can now stop crystallizing proteins and analyzing the crystals to deduce structures? Or that we can now stop trying to locate the active sites of an enzyme, or even stop purifying and characterizing proteins? Not at all. But my guess is that DeepMind’s program, AlphaFold, will help find the right conditions to accelerate the research that goes into each of these steps. In fact, the program has already been used to solve structures that have “defeated” biochemists. Moreover, it will be possible to study how proteins combine to form larger “complexes” and how they interact with other molecules in living organisms.
I am glad to know that, artificial intelligence or not, biochemists will be needed for a long, long time.
Reference
Ewen Callaway (2020) ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Google’s deep-learning program for determining the 3D shapes of proteins stands to transform biology, say scientists. Nature News, https://doi.org/10.1038/d41586-020-03348-4