Cryptic messages - bioinformatics introduction
Before looking at database searches using BLAST or FASTA and the six possible reading frames of a DNA sequence, it is important for students to understand the terminology used in databases of protein structures and the genetic code. This activity uses a 'geeky' cryptic message to introduce students to amino acids, their three letter codes and single letter codes, as well as protein structures and DNA sequences.
Lesson Description
Guiding Question
Can you crack a cryptic code?
There are several codes in bioinformatics: three letter codes, single letter codes, the DNA code and the mRNA code.
Francis Crick described the transfer of information from DNA codes into proteins as the "Central Dogma" of biology.
Activity 1 - Proteins, single letter codes and the genetic code
In December 2017 a science organisation tweeted this image as a kind of cryptic message. What does this image say to you? It is a diagram of two amino acid chains, two short polypeptides.
You can identify the amino acids.
They each have an amine group, a central carbon and then a carboxylic acid group (NH-C-C=O). There is a different R-group on the central carbon atom in each case.
From the diagram, simply counting the red "=O" each time it appears will also give the number of amino acids, and thus the number of letters in each of the two words of the cryptic code.
- How many amino acids can you see in each chain?
.......................................................... and ....................................................
The top polypeptide is made from these five amino acids.
There are four ways which biologists use to represent each amino acid:
- The full name of the molecule
- The chemical structural formula
- A three letter code
- A single letter code.
The top polypeptide (amino acid chain) contains the amino acids met-glu-arg-arg-tyr
In databases containing long polypeptide chains the three letter codes are too long, so a single letter code is used.
As there are just 20 standard amino acids and 26 letters, this is quite possible. The letters J, O and U are not used for the standard amino acids, B and Z are only used in cases where the amino acid is 'ambiguous' and could be one of two possible amino acids (B for aspartic acid or asparagine, Z for glutamic acid or glutamine), and X stands for an unknown amino acid.
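The letter count can be checked with a few lines of Python. This is only a small sketch: the string of 20 letters is copied from the SLC column of the table in this activity, and the variable names are made up for illustration.

```python
import string

# The 20 single letter codes (SLC) from the amino acid table in this activity.
STANDARD_SLC = "ACDEFGHIKLMNPQRSTVWY"

# Letters of the alphabet that do not stand for one of the 20 amino acids.
unused = sorted(set(string.ascii_uppercase) - set(STANDARD_SLC))

print(len(STANDARD_SLC), "letters are used for the standard amino acids")
print("Letters left over:", ", ".join(unused))
```

Six letters are left over, which is why some words cannot be spelled with amino acids.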
Try to crack this biochemical code by replacing the amino acids from the first polypeptide chain with their corresponding single letter code. Use the table below to look up the single letter codes of these amino acids.
What does the word spell?
...........................................................................................
The second word in the cryptic message diagram is more difficult to decode because this time you have not been given the names of the amino acids. You have to look up the chemical structures to find out these names. The "Guide to the Twenty Common Amino Acids" is an excellent reference diagram.
Name the amino acids in the second polypeptide using their three letter codes and the SLC.
Three letter codes ........................................................................................................................................
SLC ...............................................................................................................................................................
A table showing the amino acid names and single letter codes (SLC)
Amino Acid | SLC | DNA codons |
Alanine | A | GCT, GCC, GCA, GCG |
Cysteine | C | TGT, TGC |
Aspartic acid | D | GAT, GAC |
Glutamic acid | E | GAA, GAG |
Phenylalanine | F | TTT, TTC |
Glycine | G | GGT, GGC, GGA, GGG |
Histidine | H | CAT, CAC |
Isoleucine | I | ATT, ATC, ATA |
Lysine | K | AAA, AAG |
Leucine | L | CTT, CTC, CTA, CTG, TTA, TTG |
Methionine | M | ATG (Start codon in many species) |
Asparagine | N | AAT, AAC |
Proline | P | CCT, CCC, CCA, CCG |
Glutamine | Q | CAA, CAG |
Arginine | R | CGT, CGC, CGA, CGG, AGA, AGG |
Serine | S | TCT, TCC, TCA, TCG, AGT, AGC |
Threonine | T | ACT, ACC, ACA, ACG |
Valine | V | GTT, GTC, GTA, GTG |
Tryptophan | W | TGG |
Tyrosine | Y | TAT, TAC |
Stop codons | Stop | TAA, TAG, TGA |
The three letter amino acid codes look like this: 'met-glu-arg-arg-tyr' + 'cys-his-arg-ile-ser-thr-met-ala-ser'
Convert these amino acids into single letter codes: ___ ___ ___ ___ ___ + ___ ___ ___ ___ ___ ___ ___ ___ ___
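Once you have tried the conversion by hand, you could check your answer with a short program. The sketch below assumes the usual lowercase three letter abbreviations for the 20 amino acids in the table; the function name `to_single_letter` is invented for this example.

```python
# Three letter code -> single letter code for the 20 amino acids in the table.
THREE_TO_ONE = {
    "ala": "A", "cys": "C", "asp": "D", "glu": "E", "phe": "F",
    "gly": "G", "his": "H", "ile": "I", "lys": "K", "leu": "L",
    "met": "M", "asn": "N", "pro": "P", "gln": "Q", "arg": "R",
    "ser": "S", "thr": "T", "val": "V", "trp": "W", "tyr": "Y",
}

def to_single_letter(peptide):
    """Convert a hyphen-separated peptide such as 'met-glu' into SLC."""
    return "".join(THREE_TO_ONE[code.lower()] for code in peptide.split("-"))

# The two polypeptides from the cryptic message.
print(to_single_letter("met-glu-arg-arg-tyr"))
print(to_single_letter("cys-his-arg-ile-ser-thr-met-ala-ser"))
```

A dictionary lookup is used here because each three letter code maps to exactly one single letter code.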
Activity 2 - Coding your own message using amino acid single letter codes
It is really easy to assemble a di-peptide or a polypeptide using the single letter codes.
Use the website PepDraw to build a polypeptide with your own sequence of amino acids.
- First invent your own sequence of amino acids. If you are lucky you could use your name, or the name of your school. This only works fully when every letter is also an amino acid single letter code.
..................................................................................................................................................................
- Take a screenshot of the molecule, or right click and save your polypeptide structure.
This is what the polypeptide DAVIDFAURE looks like!
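Before drawing a message, it can help to check which of its letters are single letter codes in the table. The following is a minimal sketch with an invented helper name, `unusable_letters`. It only tests against the 20 amino acids in the table, and drawing tools may treat the other letters differently.

```python
# The 20 single letter codes from the amino acid table in this activity.
STANDARD_SLC = set("ACDEFGHIKLMNPQRSTVWY")

def unusable_letters(message):
    """Return the letters in message that are not one of the 20 SLC."""
    letters = {ch for ch in message.upper() if ch.isalpha()}
    return sorted(letters - STANDARD_SLC)

for name in ["DAVIDFAURE", "Merry Christmas"]:
    missing = unusable_letters(name)
    if missing:
        print(name, "-> no amino acid in the table for:", ", ".join(missing))
    else:
        print(name, "-> every letter is an amino acid code")
```

Spaces and punctuation are ignored, so a message of several words can be checked in one go.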
Activity 3 - Coding your cryptic polypeptide message into DNA code
Any sequence of amino acids in a protein must have been built on a ribosome using mRNA, which in turn was transcribed from DNA, so it is possible to represent a polypeptide as a DNA sequence.
"Merry Christmas" would become ATGGAGAGGAGGTACTGCCACAGGATCAGCACCATGGCCAGC
Of course this is just one possible DNA sequence which would be translated into this polypeptide message.
Because the DNA code is degenerate (most amino acids are coded for by more than one codon), many other DNA sequences would give the same polypeptide sequence.
The presence of introns or mutations in real genes can also add to the variety. We will ignore these variations for now.
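The translation and the degeneracy of the code can both be explored with a short program. This sketch uses the codon table from Activity 1, reads the DNA in one frame from the first base, and ignores introns and mutations as described above. The function names are made up for illustration, and `math.prod` needs Python 3.8 or later.

```python
from math import prod

# Codons for each single letter code, copied from the table in Activity 1.
# "*" is used here for the stop codons.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"], "C": ["TGT", "TGC"],
    "D": ["GAT", "GAC"], "E": ["GAA", "GAG"], "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"], "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"], "K": ["AAA", "AAG"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"], "M": ["ATG"],
    "N": ["AAT", "AAC"], "P": ["CCT", "CCC", "CCA", "CCG"],
    "Q": ["CAA", "CAG"], "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"], "V": ["GTT", "GTC", "GTA", "GTG"],
    "W": ["TGG"], "Y": ["TAT", "TAC"], "*": ["TAA", "TAG", "TGA"],
}

# Turn the table round so that each codon points to its amino acid.
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}

def translate(dna):
    """Translate DNA into single letter codes, reading complete codons only."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))

def count_encodings(protein):
    """Count the DNA sequences that would give the same polypeptide."""
    return prod(len(CODONS[aa]) for aa in protein)

dna = "ATGGAGAGGAGGTACTGCCACAGGATCAGCACCATGGCCAGC"
protein = translate(dna)
print(protein)
print(count_encodings(protein), "different DNA sequences give this message")
```

The count is found by multiplying together the number of codons available for each amino acid in the message. For this 14 letter message it comes to 5,971,968.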
- First invent your own sequence of amino acids. If you are lucky you could try your name, or the name of your school.
..................................................................................................................................................................
Use the EMBL-EBI online tool EMBOSS Backtranseq to work out a DNA code for your polypeptide written in single letter code.
- Type the single letter protein code of your message into the web form and 'back-translate' your polypeptide message into DNA.
....................................................................................................................................................................
- What would the DNA sequence for "DAVIDFAURE" be?
.........................................................................................
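DAVIDFAURE could be GACGCCGTGATCGACTTCGCCAGGGAG in a DNA sequence. Note that this sequence translates as DAVIDFARE: the letter U does not appear in the amino acid table, so it has no codon and is left out.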
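The idea of working back from a polypeptide to DNA can be sketched in Python. This is not how the online tool works: as a simplifying assumption, the sketch always picks the first codon it finds for each amino acid, so its output will usually differ from the tool's answer while still coding for the same message. The function names are invented for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: keep only the first codon met for each amino acid.
AA_TO_CODON = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODON.setdefault(aa, codon)

def back_translate(protein):
    """Return one DNA sequence that codes for the given single letter codes."""
    dna = []
    for letter in protein.upper():
        if letter not in AA_TO_CODON:
            raise ValueError(f"No codon for the letter {letter!r}")
        dna.append(AA_TO_CODON[letter])
    return "".join(dna)

def translate(dna):
    """Translate DNA into single letter codes, reading complete codons only."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))

message = "DAVIDFARE"
dna = back_translate(message)
print(dna)
print("Translates back to the message:", translate(dna) == message)

# Translating the sequence given in the text shows which letters it codes for.
print(translate("GACGCCGTGATCGACTTCGCCAGGGAG"))
```

Translating the result back into amino acids is a quick way to confirm that two different DNA sequences carry the same message.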
Teacher's notes
This activity is intended to be a simple introduction to the databases and tools which biologists use frequently in the field of bioinformatics. The concept that DNA codes for protein is important, and the different coding systems for amino acids are introduced in this light-hearted way.
Later, some of the bioinformatics tools like BLASTn and BLASTp will be covered, and this understanding of protein codes will help to make sense of them.
This activity is also a good revision of protein structure, at least primary protein structure.
You can buy amino acid models for making a necklace of your name in amino acid single letter codes.