This shows you the differences between two versions of the page.
| tutorials:workshop [2016/11/28 17:11] – [The HUP Superfamily] nataliedawson | tutorials:workshop [2019/06/19 15:02] (current) – sillitoe | ||
|---|---|---|---|
| Line 4: | Line 4: | ||
| ==== Introduction ===== | ==== Introduction ===== | ||
| - | In this practical you will be introduced to the CATH/Gene3D websites and web servers that will help you carry out an investigation into protein structure and function. | + | In this practical, you will be introduced to the CATH/Gene3D websites and web servers that will help you carry out an investigation into protein structure and function. |
| <box Info|'' | <box Info|'' | ||
| - | This tutorial refers to a number of external websites. It is highly recommended that you click the link with the right hand mouse button and select either **open link in new window** or **open link in new tab** so that you don't navigate away from this page. | + | This tutorial refers to a number of external websites. It is highly recommended that you click the link with the right-hand mouse button and select either **open link in new window** or **open link in new tab** so that you don't navigate away from this page. |
| - | There are JSmol applets embedded in this tutorial which will allow you to explore a number of different structures. Initially, they will display a simple wireframe model. Please click the gray button next to the applet with your left mouse button to display the structure as required for the tutorial. If for any reason an applet does not display correctly, please refresh your browser. | + | There are JSmol applets embedded in this tutorial which will allow you to explore a number of different structures. Initially, they will display a simple wireframe model. Please click the grey button next to the applet with your left mouse button to display the structure as required for the tutorial. If for any reason an applet does not display correctly, please refresh your browser. |
| </ | </ | ||
| Line 16: | Line 16: | ||
| ==== A Short Introduction to CATH and Gene3D ==== | ==== A Short Introduction to CATH and Gene3D ==== | ||
| - | CATH is a manually-curated hierarchical classification of protein domain structures. The name CATH derives from the initials of the top four levels of the classification - (C)lass, (A)rchitecture, | + | CATH is a manually-curated hierarchical classification of protein domain structures. The name CATH derives from the initials of the top four levels of the classification - (**C**)lass, (**A**)rchitecture, |
| - | * Class refers to the secondary structure content (e.g. mainly-alpha, | + | |
| - | * Architecture refers to the general arrangement of the secondary structures irrespective of connectivity between them (e.g. alpha/beta sandwich). | + | |
| - | * Topology, also known as the ' | + | |
| - | * Homologous Superfamily refers to domains that are believed to be related by a common ancestor. | + | |
| Each level has a **CATH code** associated with it. Have a look at the following: | Each level has a **CATH code** associated with it. Have a look at the following: | ||
| Line 28: | Line 28: | ||
| In this example, the CATH code is 3.40.50.620. | In this example, the CATH code is 3.40.50.620. | ||
| - | Domain codes (e.g. 1n3lA01) are broken up as follows: the first 4 letters/ | + | Domain codes (e.g. 1n3lA01) are broken up as follows: the first 4 letters/ |
| - | Gene3D extends the CATH superfamilies to sequenced genomes and the major protein sequence repositories (i.e. UniProt and Ensembl) through the generation of a set of statistical models (hidden Markov models or HMMs). For each superfamily, | + | **Gene3D** extends the CATH superfamilies to sequenced genomes and the major protein sequence repositories (i.e. UniProt and Ensembl) through the generation of a set of statistical models (hidden Markov models or HMMs). For each superfamily, |
| ==== Identifying the CATH Superfamily for a Query Structure ==== | ==== Identifying the CATH Superfamily for a Query Structure ==== | ||
| - | What is the number one question people always have about their protein? | + | What is the number one question people always have about their protein? |
| - | You are going to look at how the CATH database can help us in identifying the function of a particular protein structure. | + | What it does! What is the function of the protein you are investigating? |
| + | Sometimes, we do not know the answer | ||
| - | The PDB structure, 4i6g, is a X-ray crystallography-solved structure for which the function has yet to been determined. However, this can be inferred by comparing | + | **You are going to look at how the CATH database |
| - | The CATHEDRAL server can be found [[http:// | + | The PDB structure, 4i6g, is an X-ray crystallography-solved structure for which the function has yet to be determined. However, this can be inferred by comparing the protein with other proteins of known function. |
| + | |||
| + | The CATHEDRAL server can be found **[[http:// | ||
| {{ : | {{ : | ||
| - | Please download the PDB file for **4i6g** from [[http:// | + | Please download the PDB file for **4i6g** from **[[http:// |
| + | |||
| + | {{ : | ||
| + | |||
| + | Each chain of the PDB can be submitted for structural scans separately. Submit | ||
| A total of 528 matching structures in CATH v4.1 have been found, with scores ranging from very good (in green) through to very poor (in red). | A total of 528 matching structures in CATH v4.1 have been found, with scores ranging from very good (in green) through to very poor (in red). | ||
| Line 59: | Line 66: | ||
| {{ : | {{ : | ||
| - | Each domain classified in CATH has its own entry on the CATH website. To discover more about each domain in the CATHEDRAL results list (e.g. in terms of structure, sequence and function), clicking on a domain id in the list will take you to the web page for that particular domain. | + | Each domain classified in CATH has its own entry on the CATH website. To discover more about each domain in the CATHEDRAL results list (e.g. in terms of structure, sequence and function), clicking on a domain id in the list will take you to the webpage |
| Looking at the domain pages for the first four domain matches (from the PDB 4mlp) in the CATHEDRAL results list, we can see that they do not have any functional information assigned. However, if we click on the domain 1dnpA01 (for example, [[http:// | Looking at the domain pages for the first four domain matches (from the PDB 4mlp) in the CATHEDRAL results list, we can see that they do not have any functional information assigned. However, if we click on the domain 1dnpA01 (for example, [[http:// | ||
| Line 65: | Line 72: | ||
| If you wish to explore other structural domains within a given S35 cluster, clicking on 'Show related domains' | If you wish to explore other structural domains within a given S35 cluster, clicking on 'Show related domains' | ||
| - | Please also visit the link to PDBsum on the domain page to learn more about the structural and functional characteristics of these domains. PDBsum is a resource that stores information about all the protein files deposited in the PDB. | + | Please also visit the link to PDBsum on the domain page to learn more about the structural and functional characteristics of these domains. **PDBsum** is a resource that stores information about all the protein files deposited in the PDB. |
| ==== The HUP Superfamily ==== | ==== The HUP Superfamily ==== | ||
| Line 71: | Line 78: | ||
| We are now going to look more closely at the CATH superfamily in which 1dnpA01 is classified. This is the HUP domain superfamily (CATH code 3.40.50.620), | We are now going to look more closely at the CATH superfamily in which 1dnpA01 is classified. This is the HUP domain superfamily (CATH code 3.40.50.620), | ||
| - | The CATH webpage for the HUP superfamily can be accessed [[http:// | + | The CATH webpage for the HUP superfamily can be accessed |
| {{: | {{: | ||
| Line 77: | Line 84: | ||
| Section 1 is a menu that you click on to navigate the site. From here you can explore the structural and functional features of the superfamily, | Section 1 is a menu that you click on to navigate the site. From here you can explore the structural and functional features of the superfamily, | ||
| - | A concise summary for the superfamily in the form of some useful statistics can be see in section 9. It gives information on, for example, the number of: domains, structural clusters and functional terms. For the HUP superfamily, | + | A concise summary for the superfamily in the form of some useful statistics can be seen in section 9. It gives information on, for example, the number of domains, structural clusters and functional terms. For the HUP superfamily, |
| - | An indication of just how structurally diverse the HUP family | + | An indication of just how structurally diverse the HUP family is can be seen in section 6. Here, you can scroll |
| The box below shows a 3D structural superposition between the smallest domain (2pfsA01) and the largest domain (1wkbA01), displayed using the program Jmol. What you see initially is a wireframe representation of the superposition, | The box below shows a 3D structural superposition between the smallest domain (2pfsA01) and the largest domain (1wkbA01), displayed using the program Jmol. What you see initially is a wireframe representation of the superposition, | ||
| Line 98: | Line 105: | ||
| === Investigating the Structural and Functional Diversity within the HUP Superfamily using CATH === | === Investigating the Structural and Functional Diversity within the HUP Superfamily using CATH === | ||
| - | This brings us into the next part of this tutorial in which we are going to explore the structural and functional diversity of the HUP superfamily using CATH. The structure and function of a protein | + | This brings us to the next part of this tutorial in which we are going to explore the structural and functional diversity of the HUP superfamily using CATH. The structure and function of a protein |
| {{tutorials: | {{tutorials: | ||
| Line 104: | Line 111: | ||
| The HUP superfamily is known to be particularly functionally diverse. Here, we concentrate our efforts on looking at two domains | The HUP superfamily is known to be particularly functionally diverse. Here, we concentrate our efforts on looking at two domains | ||
| - | [[http:// | + | **[[http:// |
| {{tutorials: | {{tutorials: | ||
| Line 111: | Line 118: | ||
| // | // | ||
| - | There is a link at the bottom of the page to an overview | + | There is a link at the bottom of the page to an overview |
| level (i.e. different EC3 numbers), which is suggestive of changes in chemistry throughout this superfamily (click [[http:// | level (i.e. different EC3 numbers), which is suggestive of changes in chemistry throughout this superfamily (click [[http:// | ||
| - | If you then go back to the list of MACiE entries and click on the entries for our example domains (M0299, Pantothenate synthetase, EC 6.3.2.1 for 1od6A00 and M0235, Arginyl-tRNA synthetase, EC 6.1.1.19 for 1f7uA01), you can see the overall reactions for these enzymes. It can be seen that both 1od6A00 and 1f7uA01 are ligases, but they have different substrates and form different products. | + | If you then go back to the list of MACiE entries and click on the entries for our example domains (M0299, Pantothenate synthetase, EC 6.3.2.1 for 1od6A00 and M0235, Arginyl-tRNA synthetase, EC 6.1.1.19 for 1f7uA01), you can see the overall reactions for these enzymes. It can be seen that both 1od6A00 and 1f7uA01 are ligases, but they have different substrates and form different products. |
| - | It is clear from these results that the HUP superfamily is associated with a significant number of different enzyme reaction mechanisms. There are a number of possible reasons for this functional diversity. To explore how these enzymes may have evolved different functions, we can look for structural changes within the family. Here, we compare the structures of our two HUP domain examples using our in-house structural comparison algorithm called SSAP. | + | It is clear from these results that the HUP superfamily is associated with a significant number of different enzyme reaction mechanisms. There are a number of possible reasons for this functional diversity. To explore how these enzymes may have evolved different functions, we can look for structural changes within the family. Here, we compare the structures of our two HUP domain examples using our in-house structural comparison algorithm called |
| Whilst the CATHEDRAL algorithm you used at the beginning of the tutorial is fast and allows you to search all structures in CATH, SSAP is a slower but slightly more accurate method for comparing two protein structures. | Whilst the CATHEDRAL algorithm you used at the beginning of the tutorial is fast and allows you to search all structures in CATH, SSAP is a slower but slightly more accurate method for comparing two protein structures. | ||
| - | SSAP takes two structures and calculates how similar they are in structure, residue-by-residue. Similarity is measured by the SSAP score. This score ranges from 0 to 100; a score of 100 would indicate that the two structures were effectively identical. Please click [[http:// | + | SSAP takes two structures and calculates how similar they are in structure, residue-by-residue. Similarity is measured by the SSAP score. This score ranges from 0 to 100; a score of 100 would indicate that the two structures were effectively identical. Please click **[[http:// |
| From this superposition we can see that the two domains are significantly different in structure. This structural divergence is also clearly highlighted by their SSAP score of 58.77 and an RMSD of 8.15Å. | From this superposition we can see that the two domains are significantly different in structure. This structural divergence is also clearly highlighted by their SSAP score of 58.77 and an RMSD of 8.15Å. | ||
| Line 133: | Line 140: | ||
| The superposition shows that, although there is a structural core common to both structures, 1f7uA01 has some considerable structural embellishments not seen in 1od6A00. There are also noticeable shifts in the positions of the catalytic site residues. | The superposition shows that, although there is a structural core common to both structures, 1f7uA01 has some considerable structural embellishments not seen in 1od6A00. There are also noticeable shifts in the positions of the catalytic site residues. | ||
| - | 2DSEC ([[http:// | + | **2DSEC** ([[http:// |
| The 2DSEC plot for the HUP examples 1f7uA01 and 1od6A00 is shown below: | The 2DSEC plot for the HUP examples 1f7uA01 and 1od6A00 is shown below: | ||
| Line 141: | Line 148: | ||
| The 2DSEC plot confirms the findings of the SSAP superposition; | The 2DSEC plot confirms the findings of the SSAP superposition; | ||
| - | Recruitment of different domain partners can also result in changes in protein function. There is a link to a third-party application called Archschema ([[http:// | + | Recruitment of different domain partners can also result in changes in protein function. There is a link to a third-party application called |
| {{: | {{: | ||
| Line 148: | Line 155: | ||
| Now that we have an idea of the scale of the number of domain partners associated with the family as a whole, we will return to comparing our two HUP examples using a different resource. Gene3D assigns CATH domains to genes and annotates them with functional and structural information. We are going to use Gene3D to compare the multi-domain architectures (MDAs) of our examples. An MDA shows all the domains contained within a protein chain. | Now that we have an idea of the scale of the number of domain partners associated with the family as a whole, we will return to comparing our two HUP examples using a different resource. Gene3D assigns CATH domains to genes and annotates them with functional and structural information. We are going to use Gene3D to compare the multi-domain architectures (MDAs) of our examples. An MDA shows all the domains contained within a protein chain. | ||
| - | + | Next, go to the Gene3D v14 website protein search page [[http:// | |
| - | + | ||
| - | Next go to the Gene3D v14 website protein search page [[http:// | + | |
| Line 181: | Line 186: | ||
| Clicking on the ' | Clicking on the ' | ||
| - | For this tutorial, we are most interested in comparing the reaction mechanisms associated with two relatives having different functions. For example, 1h7oA00, Aminolevulinate dehydratase (EC 4.2.1.24) and 1d3gA00, Dihydroorotate oxidase (EC 1.3.3.1). Have a look for the reaction mechanisms corresponding to these ECs in the Catalytic Machinery Similarities table and draw your own conclusion. For more information on this comparison, click on the link within the table. This takes you to a page that compares the two reaction mechanisms side by side. | + | For this tutorial, we are most interested in comparing the reaction mechanisms associated with two relatives having different functions. For example, |
| So, how are these changes in mechanisms mediated? | So, how are these changes in mechanisms mediated? | ||
| Line 187: | Line 192: | ||
| Firstly, we can explore whether there are any significant structural differences between the domains associated with these functions. | Firstly, we can explore whether there are any significant structural differences between the domains associated with these functions. | ||
| - | Within a CATH superfamily, | + | Within a CATH superfamily, |
| - | If we go back to the homepage for the 3.20.20.70 superfamily, | + | If we go back to the homepage for the 3.20.20.70 superfamily, |
| {{: | {{: | ||
| Line 195: | Line 200: | ||
| Going back to our two domain examples, domain ID 1h7oA00 belongs to the functional family (ID: 119454) containing protein structures associated with EC number 4.2.1.24, and is called **Delta-aminolevulinic acid dehydratase, | Going back to our two domain examples, domain ID 1h7oA00 belongs to the functional family (ID: 119454) containing protein structures associated with EC number 4.2.1.24, and is called **Delta-aminolevulinic acid dehydratase, | ||
| - | You can search for further information on these FunFams by selecting the **Alignments** tab under the **Superfamily links** on a superfamily homepage. Entering the FunFam ID into the filter text box will bring up the FunFam of interest. If you click on each of the the functional families' | + | You can search for further information on these FunFams by selecting the **Alignments** tab under the **Superfamily links** on a superfamily homepage. Entering the FunFam ID into the filter text box will bring up the FunFam of interest. If you click on each of the functional families' |
| As on the superfamily summary pages, GO term, EC term and species information is provided for each FunFam, along with statistics including the number of domains in the family and the representative domain ID. | As on the superfamily summary pages, GO term, EC term and species information is provided for each FunFam, along with statistics including the number of domains in the family and the representative domain ID. | ||
| Line 221: | Line 226: | ||
| Substrates for the two proteins are shown as spheres and indicate the location of the active site. It can be seen that the common core between the two structures is large and there are very few structural embellishments. | Substrates for the two proteins are shown as spheres and indicate the location of the active site. It can be seen that the common core between the two structures is large and there are very few structural embellishments. | ||
| - | The next thing we can look at is whether or not there are local changes, particularly around the active site, for example residue mutations in the site and changes in catalytic residues. Taking 1h7oA00 and 1d3gA00 as our examples, we can go back to their respective functional family pages and look at the multiple | + | The next thing we can look at is whether or not there are local changes, particularly around the active site, for example, residue mutations in the site and changes in catalytic residues. Taking 1h7oA00 and 1d3gA00 as our examples, we can go back to their respective functional family pages and look at the multiple |
| - | We can also use SSAP to create a superposition of our two proteins and then compare the position of functional residues. Just type 1h7oA00 as protein 1 and 1d3gA00 as protein 2. An interactive | + | We can also use [[http:// |
| - | {{: | + | {{: |
| + | The [[http:// | ||
| - | The [[http:// | + | A Jmol view of the SSAP superposition has been provided with the catalytic residues of the domains highlighted. Here, 1h7oA00 is in pink with its catalytic residues in red, and 1d3gA00 is in light blue with its catalytic residues in blue. |
| - | + | ||
| - | Once you have your catalytic residues, highlight them on your rasmol superposition using the following commands - **select n1, n2, n3** etc (where nx denotes a catalytic residue number, for example 17) then **spacefill** and then select a color - for example type **colour purple** if you want the catalytic residues for one of the proteins to be purple. | + | |
| - | + | ||
| - | A jmol of the SSAP superposition has been provided | + | |
| <jsmol 1h7o_2 : | <jsmol 1h7o_2 : | ||
| Line 238: | Line 240: | ||
| </ | </ | ||
| - | It can clearly be seen that the catalytic residues of these two domains are in different 3D locations in the active site. An SSAP alignment of the two domains is below which highlights catalytic residues according to their properties. Aromatic residues are in red, polar residues in green and those with a positive charge are in purple. | + | It can clearly be seen that the catalytic residues of these two domains are in different 3D locations in the active site. A SSAP alignment of the two domains is below which highlights catalytic residues according to their properties. Aromatic residues are in red, polar residues in green and those with a positive charge are in purple. |
| {{: | {{: | ||
| - | In this case, unlike the HUPS, its unlikely that any global structural changes have resulted in the functional diversity observed in this family. Our analysis suggests that changes in chemistry occurring in diverse relatives in this superfamily are more likely to be associated with changes in the 3D location and nature of the catalytic residues in the active site. | + | In this case, unlike the HUP superfamily, it is unlikely that any global structural changes have resulted in the functional diversity observed in this family. Our analysis suggests that changes in chemistry occurring in diverse relatives in this superfamily are more likely to be associated with changes in the 3D location and nature of the catalytic residues in the active site. |
| ==== The HUP Superfamily in GENE3D ==== | ==== The HUP Superfamily in GENE3D ==== | ||
| Line 270: | Line 272: | ||
| === Protein Interactions === | === Protein Interactions === | ||
| - | Scrolling through the page you can see this protein has multiple physical protein interactions. Some of these are with proteins from a known disease causing bacterium, suggesting a possible role for this protein in disease progression. (NB. Instead of scrolling you can use the navigator box on the left to jump to different sections). | + | Scrolling through the page you can see this protein has multiple physical protein interactions. Some of these are with proteins from a known disease-causing bacterium, suggesting a possible role for this protein in disease progression. (NB. Instead of scrolling you can use the navigator box on the left to jump to different sections). |
| ==== Extra work ==== | ==== Extra work ==== | ||
| Line 276: | Line 278: | ||
| If there is time, explore the domains of your own favourite protein using this Gene3D [[http:// | If there is time, explore the domains of your own favourite protein using this Gene3D [[http:// | ||
| + | |||
| + | ~~DISCUSSION: | ||
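
The identifiers used throughout the tutorial (a CATH code such as 3.40.50.620 and domain codes such as 1n3lA01) can be split into their parts programmatically. The following is a minimal Python sketch, not part of the CATH or Gene3D tooling. The function and class names are hypothetical. The domain code layout it assumes (a 4-character PDB code, then one chain character, then a two-digit domain number, with 00 taken to mean the whole chain) is the usual CATH convention, but the tutorial text here is cut off before it spells this out, so treat that layout as an assumption.

```python
import re
from typing import NamedTuple


class CathCode(NamedTuple):
    """The four numbered levels of a CATH code, e.g. 3.40.50.620."""
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


class DomainId(NamedTuple):
    """Parts of a domain code such as 1n3lA01 (layout is an assumption)."""
    pdb_code: str
    chain: str
    domain_number: int


# Four dot-separated runs of ASCII digits: C.A.T.H
_CATH_CODE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)")

# Assumed layout: PDB code (digit + 3 alphanumerics), chain, 2-digit number
_DOMAIN_ID = re.compile(r"([0-9][A-Za-z0-9]{3})([A-Za-z0-9])([0-9]{2})")


def parse_cath_code(code: str) -> CathCode:
    """Split a superfamily-level CATH code into its four levels."""
    match = _CATH_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return CathCode(*(int(part) for part in match.groups()))


def parse_domain_id(domain_id: str) -> DomainId:
    """Split a domain code into PDB code, chain and domain number."""
    match = _DOMAIN_ID.fullmatch(domain_id.strip())
    if match is None:
        raise ValueError(f"not a recognised domain code: {domain_id!r}")
    pdb_code, chain, number = match.groups()
    return DomainId(pdb_code.lower(), chain, int(number))


if __name__ == "__main__":
    print(parse_cath_code("3.40.50.620"))
    for example in ("1n3lA01", "1od6A00", "1f7uA01"):
        print(parse_domain_id(example))
```

Splitting the codes this way makes it easy to group a list of CATHEDRAL hits by PDB entry or by superfamily before looking up each domain page by hand.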