This shows you the differences between two versions of the page.
| tutorials:mali_tutorial [2012/03/21 18:00] – jon | tutorials:mali_tutorial [2015/09/22 14:07] (current) – [A Short Introduction to CATH and Gene3D] hafsa | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Tutorial on CATH and Gene3D ====== | ||
| + | |||
| + | < | ||
| + | < | ||
| + | .Info { | ||
| + | background: #CCCCFF !important; | ||
| + | } | ||
| + | .Question { | ||
| + | background: #FFCCCC !important; | ||
| + | } | ||
| + | .Answer { | ||
| + | background: #CCFFCC !important; | ||
| + | } | ||
| + | |||
| + | INPUT.RevealAnswers { | ||
| + | padding: 10px; | ||
| + | border: 1px solid #663333; | ||
| + | background: #996666; | ||
| + | color: #FFFFFF; | ||
| + | font-weight: | ||
| + | font-family: | ||
| + | } | ||
| + | </ | ||
| + | </ | ||
| + | |||
| + | | ||
| + | |||
| + | In this practical you will be introduced to the CATH/Gene3D websites and servers that will help you in carrying out an investigation into protein structure and function. | ||
| + | |||
| + | You will begin by looking at the structure of a specific protein of unknown function and at methods of assigning a function to that protein by comparing it with data available in CATH and Gene3D. You will then investigate the structural and functional diversity that can exist within CATH superfamilies by exploring a particularly diverse protein family. Finally, you will look at two more clinical challenges, one involving drug design and the other showing how a pathogenic mutation can affect a protein's structure. | ||
| + | |||
| + | <box Info|'' | ||
| + | |||
| + | This tutorial involves working through and referring to a number of external websites. It is highly recommended that you click each link with the right-hand mouse button and select either **open link in new window** or **open link in new tab** so that you don't navigate away from this page. | ||
| + | |||
| + | There are Jmol applets embedded in this tutorial which will allow you to explore a number of different structures. Initially, they will display a simple wireframe model. Please click the gray button next to the applet with your left mouse button to display the structure as required for the tutorial. If, for any reason, an applet does not display correctly, please refresh your browser. | ||
| + | |||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | | ||
| + | |||
| + | |||
| + | CATH is a manually-curated hierarchical classification of protein domain structures. The name CATH derives from the initials of the top four levels of the classification - (C)lass, (A)rchitecture, | ||
| + | * Class refers to the secondary structure content (e.g. mainly-alpha, | ||
| + | * Architecture refers to the general arrangement of the secondary structures irrespective of connectivity between them (e.g. alpha/beta sandwich). | ||
| + | * Topology, also known as the ' | ||
| + | * Homologous Superfamily refers to domains that are believed to be related by a common ancestor. | ||
| + | |||
| + | The levels below this, the S, O, L, I and D-levels, are based on increasing levels of sequence identity. | ||
| + | |||
| + | Each level has a **CATH code** associated with it. Have a look at the following: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | In this example, the CATH code for the domain 1tsrB00 is 2.60.40.720. The **2** refers to the class to which the domain belongs (mainly beta), the **2.60** refers to the architecture, | ||
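The way a CATH code nests can be sketched in a few lines of Python. This is only an illustration, assuming a four-part code such as 2.60.40.720; the function name `cath_levels` is hypothetical and not part of any CATH tool.

```python
def cath_levels(cath_code):
    """Split a four-part CATH code into its cumulative C, A, T and H prefixes."""
    parts = cath_code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a code like 2.60.40.720, got {cath_code!r}")
    names = ["Class", "Architecture", "Topology", "Homologous Superfamily"]
    # Each level is identified by the code up to and including its own number.
    return {name: ".".join(parts[: i + 1]) for i, name in enumerate(names)}


for name, code in cath_levels("2.60.40.720").items():
    print(f"{name}: {code}")
```

Each level's code is a prefix of the next one, so two domains share a level exactly when their codes agree up to that depth.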
| + | |||
| + | The domain code itself (for example 1tsrB00) is broken up as follows: the first 4 letters/ | ||
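A domain code can be taken apart programmatically. The sketch below assumes the usual seven-character layout (a four-character PDB code, a one-character chain identifier and a two-digit domain number, where 00 is taken to mean that the chain was not split into several domains); `parse_domain_id` and `DomainId` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple


class DomainId(NamedTuple):
    pdb_code: str
    chain: str
    domain_number: int


# Assumed layout: 4-character PDB code + chain character + 2-digit domain number.
_DOMAIN_RE = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])([0-9]{2})$")


def parse_domain_id(domain_id):
    """Split a domain code such as '1tsrB00' into PDB code, chain and domain number."""
    match = _DOMAIN_RE.match(domain_id)
    if match is None:
        raise ValueError(f"not a recognised domain code: {domain_id!r}")
    pdb_code, chain, number = match.groups()
    return DomainId(pdb_code.lower(), chain, int(number))


print(parse_domain_id("1tsrB00"))
```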
| + | |||
| + | Gene3D extends the CATH superfamilies to sequenced genomes and the major protein sequence repositories (i.e. UniProt) through the generation of a set of statistical models (hidden Markov models or HMMs) for each superfamily, | ||
| + | |||
| + | ====== | ||
| + | |||
| + | What is the number one question people always have about their protein? What it does! What is the function of the protein you are investigating? | ||
| + | |||
| + | We are going to explore the function of the protein 2pma. | ||
| + | |||
| + | One of the ways in which the function of an unknown protein can be inferred is by comparing it with the structures of proteins of known function. You can use the CATHEDRAL server to do this. The CATHEDRAL server uses a structural comparison algorithm to compare a protein of interest (otherwise known as the 'query structure' | ||
| + | |||
| + | The CATHEDRAL server can be found [[http:// | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Please then input 2pmaA into the **PDB/CATH domain code** field and press **Continue**. | ||
| + | |||
| + | Alternatively, | ||
| + | |||
| + | The results are sorted by a score calculated by the weighted average of, for example, normalised RMSD, percentage overlap, sequence identity and SSAP score, with those comparisons with the highest scores at the top of the page. The first result is the chain A of 2pma hitting itself, but the second is a different structure (2rspA00). You will notice that the domain IDs on the CATHEDRAL results page are hyperlinked | ||
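Ranking hits by a weighted average can be sketched as follows. The weights, the RMSD rescaling and the three hits are all invented for illustration, since this tutorial does not give the actual CATHEDRAL formula; only the general idea (combine several measures, sort with the highest score first) comes from the text above.

```python
# Hypothetical weights: NOT the real CATHEDRAL weighting.
WEIGHTS = {"ssap": 0.4, "overlap": 0.3, "seq_id": 0.1, "rmsd": 0.2}


def combined_score(hit, weights=WEIGHTS):
    """Weighted average of several similarity measures, each on a 0-100 scale."""
    # Assumption: turn normalised RMSD into a 0-100 term where lower RMSD scores higher.
    rmsd_term = 100.0 / (1.0 + hit["norm_rmsd"])
    terms = {
        "ssap": hit["ssap"],
        "overlap": hit["overlap"],
        "seq_id": hit["seq_id"],
        "rmsd": rmsd_term,
    }
    total_weight = sum(weights.values())
    return sum(weights[k] * terms[k] for k in weights) / total_weight


# Made-up hits, not real CATHEDRAL output.
hits = [
    {"name": "hit_b", "ssap": 70.0, "overlap": 60.0, "seq_id": 10.0, "norm_rmsd": 4.0},
    {"name": "query_self", "ssap": 100.0, "overlap": 100.0, "seq_id": 100.0, "norm_rmsd": 0.0},
    {"name": "hit_a", "ssap": 85.0, "overlap": 90.0, "seq_id": 30.0, "norm_rmsd": 1.5},
]

# Highest combined score first, as on the results page.
ranked = sorted(hits, key=combined_score, reverse=True)
for hit in ranked:
    print(f"{hit['name']}: {combined_score(hit):.1f}")
```

As on the real results page, the query matched against itself comes out on top because every measure is at its maximum.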
| + | |||
| + | The box below shows a 3D structural superposition between the two domains 2pmaA01 and 2rspA00 displayed using the program [[http:// | ||
| + | |||
| + | <jmol : | ||
| + | jmolButton( " | ||
| + | </ | ||
| + | |||
| + | You can move the structures around by moving the mouse when within the box while holding down the left hand button, and you can zoom in and out by moving the mouse forward and back while holding down the middle button (or track ball). If you press the right hand button on the mouse when in the box, a menu will pop up; please feel free to explore the structures further by selecting the various options. You can always reset the superposition to its initial state by refreshing your browser. | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Looking at the superposition, | ||
| + | * Look at the CATH entries for both 2pmaA01 and 2rspA00. Which CATH superfamily do they belong to? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * The superposition suggests that 2pmaA01 and 2rspA00 are very similar in structure. | ||
| + | * They both belong to the superfamily 2.40.70.10. | ||
| + | </ | ||
| + | |||
| + | |||
| + | At the bottom of the page for both entries will be a link to PDBSum ({{: | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Do the CATH entries and/or other external resources tell you anything about the possible function of our unknown protein? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * The resources suggest that the 2pma protein is an aspartic protease, due to its close structural similarity to 2rspA00. | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | | ||
| + | |||
| + | |||
| + | Most superfamilies are structurally and functionally conserved. However, in some of the most highly populated superfamilies (about 4%), there is a great deal of diversity in both structure and function. Such superfamilies allow us to explore protein evolution and, in particular, how structural changes can result in new functions amongst proteins that are evolutionarily related. In this section, you are going to explore the structure-function relations in one of these, the superfamily of the HUP domains. The CATH code of the HUP superfamily is 3.40.50.620. | ||
| + | |||
| + | |||
| + | ===== The HUP Superfamily in CATH ===== | ||
| + | |||
| + | First of all, you are going to look at the HUP superfamily as it is represented in the CATH database. Please click [[http:// | ||
| + | |||
| + | There are two ways in which you can search for a specific superfamily in CATH. At the top right hand side of the home page, there is a search box. You can input 3.40.50.620 into the box and then press the button labeled **Search** by the side of it. Alternatively, | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Type in the superfamily code into the search by **ID/ | ||
| + | |||
| + | Either method of searching will take you to a tabulated page of results. Here you will find more information on the superfamily searched for (**cathnode**), | ||
| + | |||
| + | If you click on the cathnodes tab and then click the hyperlinked cathcode displayed, you will be taken to a page that looks like this: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | At the top left hand side of the screen there is a table which gives you information on what class, architecture and fold the HUP superfamily (which is described as // | ||
| + | |||
| + | |||
| + | <box Question|Question> | ||
| + | * Which well-known fold do the HUP superfamily domains adopt? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * The Rossmann fold | ||
| + | </ | ||
| + | |||
| + | On the top right-hand side, there is an image of a domain that is representative of the superfamily. | ||
| + | |||
| + | Underneath this is a tabulated display holding different information about the HUP superfamily. By default, you will be shown the contents of the **Non-Redundant Representative** tab. Here you will see a list of domains along with hyperlinks to their individual pages and thumbnails of their structures. This list is of the s35 representatives of the superfamily, | ||
| + | |||
| + | |||
| + | <box Question|Question> | ||
| + | * How many s35 clusters make up the HUP superfamily? | ||
| + | * Does the number of s35 clusters suggest anything about the diversity of this superfamily? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * 69 | ||
| + | * Possibly. The number of s35 clusters suggests that there is a significant degree of sequence diversity within the HUP superfamily. However, this does not guarantee structural diversity, as structure is far more conserved throughout evolution than sequence. | ||
| + | </ | ||
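To make the idea of s35 clusters concrete, the sketch below groups sequences so that each member is at least 35% identical to its cluster representative. It is a simplified stand-in rather than CATH's own procedure: it assumes the sequences are already aligned to the same length, uses a greedy single pass, and runs on invented toy sequences.

```python
def percent_identity(a, b):
    """Percentage of identical residues over aligned, non-gap positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs, threshold=35.0):
    """Assign each sequence to the first representative it matches at >= threshold."""
    clusters = {}  # representative name -> list of member names
    for name, seq in seqs.items():
        for rep in clusters:
            if percent_identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Toy aligned sequences, invented for illustration only.
toy = {
    "d1": "ACDEFGHIKL",
    "d2": "ACDEFGHIKV",
    "d3": "WYWYWYWYWY",
    "d4": "ACDEWYWYWY",
}
clusters = greedy_cluster(toy)
print(clusters)
```

The more clusters a superfamily needs at this threshold, the more its members have diverged in sequence.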
| + | |||
| + | The **Alignments** tab displays groups of domains in the HUP superfamily that have been placed in the same cluster due to being very close in terms of structural similarity. | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | ===== The HUP superfamily in Gene3D ===== | ||
| + | |||
| + | Now, you are going to use Gene3D to explore the HUP superfamily. Please click [[http:// | ||
| + | From the front page go to the "Get superfamily summary" | ||
| + | Which will take you to a page showing a summary of this superfamily in Gene3D, click [[http:// | ||
| + | |||
| + | |||
| + | At the top of this superfamily summary page, there are a number of tabs; the first tab shows a brief summary of statistics for this superfamily: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Clicking on each of these items in turn provides different types of information for the superfamily. For example, clicking on the " | ||
| + | |||
| + | {{: | ||
| + | |||
| + | |||
| + | Clicking on the FunFams tab will bring up a page displaying the subdivision of the CATH superfamily into its functional families (FunFams). These FunFams provide a means of interpreting the sometimes very large and structurally diverse superfamilies at the functional level. | ||
| + | |||
| + | {{: | ||
| + | |||
| + | |||
| + | |||
| + | <box Question|Question> | ||
| + | * What is the most highly populated FunFam? (Hint: click on the column header to sort the column.) | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * FF_3.40.50.620_57917 Leucyl-tRNA synthetase -like domain | ||
| + | </ | ||
| + | |||
| + | Other tabs include the OMIM tab, which shows OMIM diseases associated with SNPs located in this superfamily. | ||
| + | |||
| + | Let's retrieve the proteins with mutations in the HUP domain associated with the inherited disorder CITRULLINEMIA, | ||
| + | by clicking the "Get Protein" | ||
| + | |||
| + | The results page from this is an individual protein sequence, with lots of tabs for different database annotations. | ||
| + | Clicking on the " | ||
| + | |||
| + | {{: | ||
| + | |||
| + | It may be interesting to consider the function of this protein and associated diseases in the context of its interaction partners. This is accessible from the " | ||
| + | where we can see the protein has multiple protein interaction partners. Clicking the number 16 goes to the protein interaction view | ||
| + | or click [[http:// | ||
| + | Clicking on an edge linking two proteins produces a pop-up with details of the source publication supporting the interaction: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | === Structural Comparison of Two HUP Domains === | ||
| + | |||
| + | You are now going to take a closer look at the structural differences that might occur within a diverse CATH superfamily. | ||
| + | |||
| + | In the 3D superposition below, 1r6uB01 is coloured light blue and 1gpmA02 is in pink. Functional residues (namely catalytic residues and ligand binding residues) are highlighted in dark blue and red respectively. | ||
| + | |||
| + | <jmol : | ||
| + | jmolButton( " | ||
| + | </ | ||
| + | |||
| + | <box Question|Question> | ||
| + | * After investigating the superposition thoroughly, are you able to see any significant differences in the two structures and, if so, what? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * There are some structural differences evident in the superposition and the functional residues are in different locations in the two structures. | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | ====== Exploring Drug Design ====== | ||
| + | |||
| + | DNA gyrase (1aj6A00) is a bacterial type II DNA topoisomerase; | ||
| + | |||
| + | The 90 kDa heat shock protein (Hsp90) 1a4hA00 belongs to the same superfamily in CATH as 1aj6. Heat shock proteins act as chaperones for a wide range of proteins (referred to as ' | ||
| + | |||
| + | You are going to look into the possibility of geldanamycin being a lead molecule for the development of a new drug to act upon DNA gyrase. | ||
| + | |||
| + | First of all, look at the CATH database records of the domains 1a4hA00 and 1aj6A00. Go to the CATH website [[http:// | ||
| + | |||
| + | <box Question|Question> | ||
| + | * What CATH superfamily do the domains 1a4hA00 and 1aj6A00 belong to? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * They both belong to the superfamily 3.30.565.10 | ||
| + | </ | ||
| + | |||
| + | CATH has an in-house structural comparison algorithm called SSAP. SSAP takes two structures and calculates how similar they are in structure, residue-by-residue. Similarity is measured by the SSAP score, which ranges from 0 to 100; a score of 100 would indicate that the two structures were effectively identical. Please click [[http:// | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Click on the link and the results should appear. If not, keep refreshing the browser every minute or so until they do. Look at the table of results at the top. | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Looking at the SSAP score, are the two domains similar in structure? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * The SSAP score is 77.77. The closer the SSAP score is to 100, the closer in structure the two domains are. A score of over 77 is indicative of a significant amount of structural similarity. | ||
| + | </ | ||
| + | |||
| + | Please find below the two structures superimposed via SSAP. If you press the gray button, you will see the superposition displayed as a cartoon. The green compound in the middle is Geldanamycin, | ||
| + | |||
| + | <jmol : | ||
| + | jmolButton( " | ||
| + | </ | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Looking at the position of the ligand binding residues and the position of Geldanamycin, | ||
| + | * What therapeutic implications could there be if DNA gyrase could bind easily to Geldanamycin? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * The superposition shows that the ligand binding residues in Hsp90 and DNA gyrase are in very similar positions and both can be seen to be in contact with Geldanamycin. It is therefore very likely that DNA gyrase will be able to bind to Geldanamycin. | ||
| + | * Geldanamycin could be used in place of Novobiocin to treat bacterial infections by inhibiting DNA gyrase. | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | ====== Sickle Cell Anaemia - How a Single Mutation Can Cause Disease ====== | ||
| + | |||
| + | Sickle cell anaemia is a common inherited genetic disorder. People who suffer from the disease have red blood cells that have an abnormal shape much like that of a sickle. These sickled red blood cells are very fragile and the result is severe anaemia. The disease causes many painful symptoms and can significantly reduce a sufferer' | ||
| + | |||
| + | The structure of normal haemoglobin is shown below. It is a tetramer, which means that it is made up of four polypeptide chains. In haemoglobin, | ||
| + | |||
| + | <jmol : | ||
| + | jmolButton( " | ||
| + | </ | ||
| + | |||
| + | Now that you have seen what the normal, or native, structure of haemoglobin looks like, you are going to identify the mutation using sequence information gathered from the CATH website. Both the native (1b86) and mutated (2hbs) forms of the protein have been classified in CATH. You will compare the two proteins using a sequence alignment. There are a number of online sequence comparison tools; the one you will be using here is ClustalW2. Please click [[http:// | ||
| + | |||
| + | You now need to retrieve the sequence information for the native and mutant forms of haemoglobin in order to perform the sequence alignment. | ||
| + | |||
| + | Go back to the CATH website by clicking [[http:// | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Copy and paste the //Domain ATOM Sequence// into the box on the ClustalW2 page. Then search for the domain 2hbsB00, find its sequence, and copy and paste it directly underneath the one for 1b86B00. Press the red button titled **Run** underneath the box and wait until the sequence alignment has been completed. When you get to the results page, scroll down until you see the alignment and have a look at it. | ||
| + | |||
| + | <box Question|Question> | ||
| + | * What residue change can you see in the mutated protein domain 2hbsB00? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * The residue glutamate 6 has been replaced by a valine. | ||
| + | </ | ||
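The same comparison can be made without an alignment viewer by walking along two aligned sequences and reporting the positions that differ. The sketch below uses short illustrative fragments from the start of the beta chain, typed in by hand; in practice you would paste in the full //Domain ATOM Sequence// of 1b86B00 and 2hbsB00 from CATH. The function name `substitutions` is hypothetical.

```python
def substitutions(native, mutant):
    """List (position, native residue, mutant residue) for each mismatch, 1-based."""
    if len(native) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(native, mutant), start=1)
        if a != b and "-" not in (a, b)  # ignore alignment gaps
    ]


# Illustrative fragments only; replace with the sequences copied from CATH.
native_fragment = "VHLTPEEKSAVTALWGKV"
mutant_fragment = "VHLTPVEKSAVTALWGKV"

for pos, a, b in substitutions(native_fragment, mutant_fragment):
    print(f"position {pos}: {a} -> {b}")  # position 6: E -> V
```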
| + | |||
| + | Now that you have discovered the mutation in haemoglobin that causes sickle cell anaemia, the next step is to find out how that mutation causes the disease. Understanding how a mutation affects a protein is an important step in developing treatments to combat a disease. Many mutations cause disease by changing the active site, and therefore a vital function, of the protein they affect. Do you remember those little white balls in the Jmol applet above? They represent the mutated residue, valine 6. | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Looking at the haemoglobin structure above, does the mutation glu6-val affect the active site of the protein? | ||
| + | * Whereabouts on the protein is the mutation located? Is it buried within the structure or on the surface? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * No, the mutation is in a different location to the active site of haemoglobin. | ||
| + | * The mutation is on the surface of the protein | ||
| + | </ | ||
| + | |||
| + | So, how does the mutation glu6-val cause sickle cell anaemia? Have a look at the Jmol below. This is the mutated haemoglobin structure 2hbsA00. The mutation, as with the native structure, is highlighted as white balls. | ||
| + | |||
| + | <jmol 2hbs 400 400> | ||
| + | jmolButton( " | ||
| + | </ | ||
| + | |||
| + | <box Question|Question> | ||
| + | * What is the major difference between the native and mutant structures of haemoglobin? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * In the mutant form of the protein, two haemoglobin tetramers are joined together. | ||
| + | </ | ||
| + | |||
| + | Valine is a hydrophobic amino acid, which means it doesn' | ||
| + | |||
| + | Long fibres of haemoglobin molecules form as the mutated valine-6 residues just keep adding on more haemoglobin molecules as they try to stabilize their structure. | ||
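The change in character from glutamate to valine can be put into numbers with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is not mentioned elsewhere in this tutorial and is brought in here purely as an illustration; more positive values mean more hydrophobic. The function name `hydropathy_change` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(native_residue, mutant_residue):
    """Mutant minus native hydropathy; positive means the site became more hydrophobic."""
    return KYTE_DOOLITTLE[mutant_residue] - KYTE_DOOLITTLE[native_residue]


change = hydropathy_change("E", "V")
print(f"E -> V hydropathy change: {change:+.1f}")
```

A large positive change on the protein surface, as here, is consistent with the mutated residue seeking out a hydrophobic partner on a neighbouring molecule.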
| + | |||
| + | < | ||
| + | <form name=" | ||
| + | <input type=" | ||
| + | <input type=" | ||
| + | </ | ||
| + | |||
| + | <script type=" | ||
| + | /* Hide the answers! */ | ||
| + | $$(' | ||
| + | </ | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||