The life sciences study how biology orchestrates molecules to create living organisms. A central theme is how biomolecules shape life. The structure and function of the major biomolecule classes (nucleic acids, proteins, fats, and sugars) play a central role: they are the building blocks.
We have found that the way these biomolecules give rise to life is highly structured. We found the evolutionary relation between DNA sequences and the similarity between proteins. We discovered that how a protein folds into a 3D structure is highly conserved. Not only does the protein sequence strongly determine how the protein folds (there are exceptions), but proteins across many different species also show similar folds for similar biological roles. The latter is the foundation of, for example, homology modelling, which is used in drug discovery to study the function of proteins.
Therefore, being able to compare sequences is an extremely important tool in studying biological systems. This material looks into why and how nucleotides change in DNA and RNA and amino acids change in protein sequences. Central to this topic is that not every change has the same impact, and you will learn how biologists use that insight when studying evolutionary aspects of DNA and protein sequences.
The double-helical DNA is transcribed into RNA, which is translated into protein sequences. Three nucleotides in the DNA and RNA, together called a codon, encode a single amino acid in the protein sequence. We know which codon results in which amino acid. Because several codons can encode the same amino acid, not every nucleotide change means a change of the protein sequence.
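To make the codon idea concrete, here is a minimal Python sketch. It assumes the standard genetic code and a DNA sequence given as the coding strand, read from its first base; the names `CODON_TABLE`, `transcribe`, `translate`, and `is_synonymous` are made up for this illustration and do not come from any particular library.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
# '*' marks a stop codon. This is an illustration, not a library API.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna: str) -> str:
    """Translate coding-strand DNA into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon;
    trailing bases that do not fill a complete codon are ignored.
    """
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


print(transcribe("ATGGAAGTG"))      # AUGGAAGUG
print(translate("ATGGAAGTG"))       # MEV
print(is_synonymous("GAA", "GAG"))  # True: both encode glutamate (E)
print(is_synonymous("GAG", "GTG"))  # False: glutamate (E) versus valine (V)
```

The last two lines show both cases: changing the third base of GAA leaves the amino acid untouched, while changing the middle base of GAG swaps glutamate for valine.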
Proteins are biopolymers of amino acids, where individual amino acids are linearly linked via peptide bonds. A protein is therefore also known as a polypeptide. The linear sequence of amino acids folds up into secondary structure motifs (alpha helices, beta strands, etc.) and a tertiary fold. This protein fold is deterministic: if a specific primary sequence did not always fold up into the same fold, the fold’s biological function would not be preserved. But it is. It is so well preserved that similar primary sequences also result in the same fold. The fold can show small changes, but it is quite well preserved. This results in protein families.
Therefore, there is a deterministic relation between the DNA sequence and the biological function, via the RNA sequence and the protein sequence.
Because of this relation between DNA, RNA, and protein sequence and biological function, comparing these sequences is of high interest too:
Molecular phylogenetics uses differences and similarities to classify organisms into species. Combined with the idea of evolution, this gives us the foundation of the tree of life. Differences and similarities between organisms can also be defined by comparing their DNA. Changes in DNA sequences, particularly the number of changed nucleotides, can be used as a measure of how far two species have diverged. This assumes, of course, that nucleotides change at a reasonably stable rate.
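Counting changed nucleotides can be sketched in a few lines of Python. The example below assumes the sequences are already aligned and of equal length, and it uses three short invented sequences (`species_a`, `species_b`, `species_c`) rather than real data. The function names are likewise hypothetical. It reports the raw number of differing positions and the fraction of differing positions, often called the p-distance.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions that differ between two sequences."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return hamming_distance(seq_a, seq_b) / len(seq_a)


# Invented, pre-aligned toy sequences (not real organisms).
species_a = "ATGGCCATTG"
species_b = "ATGGCTATTG"
species_c = "ATGACTATAG"

print(hamming_distance(species_a, species_b))  # 1
print(hamming_distance(species_a, species_c))  # 3
print(hamming_distance(species_b, species_c))  # 2
print(p_distance(species_a, species_c))        # 0.3
```

In this toy case, `species_a` and `species_b` differ least and would be grouped together first. Real analyses work on aligned sequences with gaps and usually correct the raw distance for multiple changes at the same site, which this sketch does not attempt.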
Possibly the most studied RNA sequence at this moment is that of SARS-CoV-2. As with species, the similarity of RNA sequences can also be used to classify SARS-CoV-like viruses (image is a screenshot taken from NextStrain):
Using the same approach of studying the RNA sequences, it is possible to track how SARS-CoV-2 variants spread over the world (image from NextStrain):
The interactions of proteins with other proteins, with membranes, and with small compounds (ligands, substrates) define their role in the biology of an organism. The backbone of a protein is always the same, and only the amino acid side chains differ. Which side chain is found where in the polypeptide is defined by the codons, and therefore shaped by evolution. Sometimes changing a single side chain can have major effects, such as in sickle-cell anemia. Here, a single glutamate is replaced by a valine (see this Proteopedia page).
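A single substitution like this is easy to spot by comparing two protein sequences position by position. The Python sketch below is an illustration only. It assumes two equal-length, already aligned sequences, and it uses the first ten residues of the mature beta-globin chain as commonly reported, which is not given in this text. The function name `find_substitutions` is made up for this example.

```python
def find_substitutions(reference: str, variant: str) -> list[str]:
    """List substitutions between two aligned protein sequences.

    Each substitution is written as <original><position><replacement>,
    with 1-based positions, for example 'E6V'.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{position}{var}"
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Assumed fragments: start of the mature beta-globin chain (one-letter codes).
normal_fragment = "VHLTPEEKSA"
sickle_fragment = "VHLTPVEKSA"

print(find_substitutions(normal_fragment, sickle_fragment))  # ['E6V']
```

The result reads as glutamate (E) at position 6 replaced by valine (V), the one-residue difference described above.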
The online book A sequence alignment and analysis of SARS-CoV-2 spike glycoprotein (2020, Jean-Yves Sgro) provides a nice walk-through for aligning SARS-CoV-2 spike protein sequences.