Cryptic messages - bioinformatics introduction

Before looking at database searches using BLAST or FASTA and the six different ORFs, it is important for students to understand the terminology used in the databases of protein structures and genetic code. This activity uses a 'geeky' cryptic message to introduce students to amino acids, their three letter codes and single letter codes, as well as protein structures and DNA sequences.

 

Lesson Description

Guiding Question

Can you crack a cryptic code?

There are several codes in bioinformatics: three letter codes, single letter codes, the DNA code and the mRNA code.

Francis Crick described the transformation of DNA codes into proteins as the "Central Dogma" of biology.

Activity 1 - Proteins, single letter codes and the genetic code

In December 2017 a science organisation tweeted this image as a kind of cryptic message. What does this image say to you?

It is a diagram of two amino acid chains, two short polypeptides.

You can identify the amino acids.

Each one has an amine group, a central carbon and then a carboxylic acid group (NH-C-C=O). A different R-group is attached to the central carbon atom in each case.

From the diagram simply counting the red "=O" each time it appears will also give the number of amino acids and thus the number of letters in the two words of the cryptic code.

  1. How many amino acids can you see in each chain?

..........................................................   and  ....................................................

The top polypeptide is made from these five amino acids.

Biologists use four ways to represent each amino acid:

  • The full name of the molecule
  • The chemical structural formula
  • A three letter code
  • A single letter code.


 

The top polypeptide (amino acid chain) contains the amino acids met-glu-arg-arg-tyr

In databases containing long polypeptide chains the three letter codes take up too much space, so a single letter code is used.
As there are just 20 standard amino acids and 26 letters, this is quite possible. The letters J, O and U are not used for any of the 20. B and Z are only used where the amino acid is 'ambiguous' and could be one of two possible amino acids, and X stands for an unknown amino acid.

Try to crack this biochemical code by replacing the amino acids from the first polypeptide chain with their corresponding single letter code. Use the table below to look up the single letter codes of these amino acids.

What does the word spell?

...........................................................................................

 

The second word in the cryptic message diagram is more difficult to decode. This time you have not been given the names of the amino acids, so you have to look up the chemical structures to find them. This Guide to the Twenty Common amino acids is an excellent reference diagram.

Name the amino acids in the second polypeptide using their three letter codes and their single letter codes (SLC).

Three letter codes  ........................................................................................................................................

SLC ...............................................................................................................................................................

A table showing the amino acid names, single letter codes (SLC) and DNA codons

Amino acid       SLC     DNA codons
Alanine          A       GCT, GCC, GCA, GCG
Cysteine         C       TGT, TGC
Aspartic acid    D       GAT, GAC
Glutamic acid    E       GAA, GAG
Phenylalanine    F       TTT, TTC
Glycine          G       GGT, GGC, GGA, GGG
Histidine        H       CAT, CAC
Isoleucine       I       ATT, ATC, ATA
Lysine           K       AAA, AAG
Leucine          L       CTT, CTC, CTA, CTG, TTA, TTG
Methionine       M       ATG (Start codon in many species)
Asparagine       N       AAT, AAC
Proline          P       CCT, CCC, CCA, CCG
Glutamine        Q       CAA, CAG
Arginine         R       CGT, CGC, CGA, CGG, AGA, AGG
Serine           S       TCT, TCC, TCA, TCG, AGT, AGC
Threonine        T       ACT, ACC, ACA, ACG
Valine           V       GTT, GTC, GTA, GTG
Tryptophan       W       TGG
Tyrosine         Y       TAT, TAC
Stop codons      Stop    TAA, TAG, TGA

The three letter amino acid codes look like this: 'met-glu-arg-arg-tyr' + 'cys-his-arg-ile-ser-thr-met-ala-ser'

Convert these amino acids into single letter codes: ___ ___ ___ ___ ___  +  ___ ___ ___ ___ ___ ___ ___ ___ ___
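For anyone who wants to check their answer, the table lookup can also be done in a few lines of Python. This is only a minimal sketch: the dictionary is typed in from the table above, and the function name to_single_letter is made up for this example.

```python
# Three letter code -> single letter code (SLC) for the 20 standard amino acids,
# copied from the table above.
THREE_TO_ONE = {
    "ala": "A", "cys": "C", "asp": "D", "glu": "E", "phe": "F",
    "gly": "G", "his": "H", "ile": "I", "lys": "K", "leu": "L",
    "met": "M", "asn": "N", "pro": "P", "gln": "Q", "arg": "R",
    "ser": "S", "thr": "T", "val": "V", "trp": "W", "tyr": "Y",
}


def to_single_letter(peptide):
    """Convert a hyphen-separated peptide such as 'met-glu' into single letter codes."""
    return "".join(THREE_TO_ONE[code.lower()] for code in peptide.split("-"))


first_word = to_single_letter("met-glu-arg-arg-tyr")
second_word = to_single_letter("cys-his-arg-ile-ser-thr-met-ala-ser")
print(first_word, second_word)
```

Try decoding the message by hand first, then run the code to compare.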

Activity 2 - Coding your own message using amino acid single letter codes

It is really easy to assemble a dipeptide or a polypeptide using the single letter codes.

Use the website PepDraw to build a polypeptide with your own sequence of amino acids. 
If you are lucky you may be able to write your name. 

  1. First invent your own sequence of amino acids.  If you are lucky you could use your name, or the name of your school.

    ..................................................................................................................................................................
  2. Take a screenshot of the molecule, or right click and save your polypeptide structure.



    This is what the polypeptide DAVIDFAURE looks like!
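Not every name can be spelled with the 20 single letter codes. A quick way to find out which letters of a message are missing from the table is sketched below. The function name unusable_letters is invented for this example, and how a drawing tool such as PepDraw treats the other letters is not covered here.

```python
# The 20 single letter codes that appear in the amino acid table.
STANDARD_SLC = set("ACDEFGHIKLMNPQRSTVWY")


def unusable_letters(message):
    """Return the letters of a message that are not standard single letter codes."""
    letters = [ch for ch in message.upper() if ch.isalpha()]
    return sorted({ch for ch in letters if ch not in STANDARD_SLC})


print(unusable_letters("DAVIDFAURE"))
print(unusable_letters("Merry Christmas"))
```

An empty list means every letter of the message has its own amino acid.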


 

Activity 3 - Coding your cryptic polypeptide message into DNA code

Any sequence of amino acids in a protein must have been built on a ribosome using mRNA, and that mRNA was coded for in the DNA. It is therefore possible to represent a polypeptide as a DNA sequence.

"Merry Christmas" would become ATGGAGAGGAGGTACTGCCACAGGATCAGCACCATGGCCAGC 

Of course this is just one possible DNA sequence which would be translated into the same polypeptide message. 
Because the DNA code is degenerate it is possible to have other DNA sequences giving the same polypeptide sequence.

The presence of introns or mutations in real genes can also add to the variety.  We will ignore these variations for now.
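To get a feel for how degenerate the code is, you can count how many different DNA sequences would give the same polypeptide: multiply together the number of codons available for each amino acid. The sketch below does this using the codon counts from the table in Activity 1. The function name count_dna_sequences is made up for this example, and stop codons are ignored.

```python
from math import prod

# Number of DNA codons for each single letter code, counted from the table.
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}


def count_dna_sequences(peptide):
    """Multiply the codon choices for every amino acid in the peptide."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


print(count_dna_sequences("MERRYCHRISTMAS"))
```

For a message of only 14 amino acids the answer is already just under six million possible DNA sequences.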

  1. First invent your own sequence of amino acids.  If you are lucky you could try your name, or the name of your school.

    ..................................................................................................................................................................

Use the EMBL online tool Backtranseq to work out a DNA code for your polypeptide, written in single letter code.

  1. Type the single letter protein code of your message into the webform and 'back-translate' your polypeptide message into DNA.


    ....................................................................................................................................................................
  2. What would the DNA sequence for "DAVIDFAURE" be?

    .........................................................................................

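DAVIDFAURE could be GACGCCGTGATCGACTTCGCCAGGGAG in a DNA sequence. Read codon by codon, this sequence spells DAVIDFARE: the letter U has no codon in the table of single letter codes, so it is left out.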
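A DNA message can be checked by translating it back into single letter codes, three bases at a time. The sketch below uses the codon table from Activity 1. The functions translate and back_translate are written for this example and are not the method used by the online tool: back_translate simply picks the first codon listed for each amino acid, which is an arbitrary choice, so its DNA will differ from the sequences shown above while still coding for the same message.

```python
# DNA codons for each single letter code, copied from the table ("*" = stop).
CODONS = {
    "A": "GCT GCC GCA GCG", "C": "TGT TGC", "D": "GAT GAC",
    "E": "GAA GAG", "F": "TTT TTC", "G": "GGT GGC GGA GGG",
    "H": "CAT CAC", "I": "ATT ATC ATA", "K": "AAA AAG",
    "L": "CTT CTC CTA CTG TTA TTG", "M": "ATG", "N": "AAT AAC",
    "P": "CCT CCC CCA CCG", "Q": "CAA CAG",
    "R": "CGT CGC CGA CGG AGA AGG", "S": "TCT TCC TCA TCG AGT AGC",
    "T": "ACT ACC ACA ACG", "V": "GTT GTC GTA GTG", "W": "TGG",
    "Y": "TAT TAC", "*": "TAA TAG TGA",
}

# Reverse lookup: codon -> single letter code.
CODON_TO_SLC = {
    codon: slc for slc, group in CODONS.items() for codon in group.split()
}


def translate(dna):
    """Read a DNA coding sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    peptide = []
    # Ignore any incomplete codon left over at the end of the sequence.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        slc = CODON_TO_SLC[dna[i:i + 3]]
        if slc == "*":
            break
        peptide.append(slc)
    return "".join(peptide)


def back_translate(peptide):
    """Build one possible DNA sequence using the first codon listed for each amino acid."""
    return "".join(CODONS[aa].split()[0] for aa in peptide.upper())


print(translate("ATGGAGAGGAGGTACTGCCACAGGATCAGCACCATGGCCAGC"))
print(translate("GACGCCGTGATCGACTTCGCCAGGGAG"))
print(back_translate("MERRY"))
```

Translating your own back-translated sequence should always return the message you started with.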

Teacher's notes

This activity is intended to be a simple introduction to the databases and tools which biologists use frequently in the field of bioinformatics. The concept that DNA codes for protein is important, and the different coding systems of amino acids are introduced in this light-hearted way.

Later, some of the bioinformatics tools like BLASTn and BLASTp will be covered, and this understanding of protein codes will help to make sense of them.

This activity is also a good revision of protein structure, at least primary protein structure.

Amino acid models can also be bought, for making a necklace of your name in amino acid single letter codes.