This shows you the differences between two versions of the page.
| tutorials:structural_functional_analysis [2009/03/17 16:27] – jperkins | tutorials:structural_functional_analysis [2015/09/21 15:05] (current) – [The CATHEDRAL Server] hafsa | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== CATH Tutorials ====== | ||
| + | |||
| + | ===== Combining Structural and Functional Analysis ===== | ||
| + | |||
| + | < | ||
| + | < | ||
| + | .Info { | ||
| + | background: #CCCCFF !important; | ||
| + | } | ||
| + | .Question { | ||
| + | background: #FFCCCC !important; | ||
| + | } | ||
| + | .Answer { | ||
| + | background: #CCFFCC !important; | ||
| + | } | ||
| + | |||
| + | INPUT.RevealAnswers { | ||
| + | padding: 10px; | ||
| + | border: 1px solid #663333; | ||
| + | background: #996666; | ||
| + | color: #FFFFFF; | ||
| + | font-weight: | ||
| + | font-family: | ||
| + | } | ||
| + | </ | ||
| + | </ | ||
| + | |||
| + | ==== Introduction ==== | ||
| + | |||
| + | In this practical you will be introduced to the CATH/Gene3D websites and servers that will help you in carrying out an investigation into protein structure and function. | ||
| + | |||
| + | We will begin by looking at the structure of a specific protein (FtsA) and how it can be split into its component domains. We will then investigate the superfamily of one of these domains further by looking at the types of biological function it performs and how the superfamily is distributed across different biological kingdoms. From this you should gain an understanding both of how to carry out this type of investigation and, from the real example, of how related proteins can vary substantially in structure, function and multi-domain context. | ||
| + | |||
| + | FtsA is essential for bacterial cell division and is found in the hyperthermophilic bacterium // | ||
| + | |||
| + | ==== The CATHEDRAL Server ==== | ||
| + | |||
| + | First of all, you need to retrieve all the domains present in FtsA. You can use the CATHEDRAL server to do this. The CATHEDRAL server employs a structural comparison algorithm to compare the query structure against known domains in the CATH database, which means you can also use it to try to identify an unknown protein by comparing it with all known structures in CATH. You submit the protein for analysis at [[http:// | ||
| + | |||
| + | <box Info|''' | ||
| + | |||
| + | Depending on the size of the structure that you submit and the number of other users, the CATHEDRAL server can take up to an hour to return results. The link below jumps straight to the results from a search of 1e4f: | ||
| + | |||
| + | [[http:// | ||
| + | |||
| + | </ | ||
| + | |||
| + | <box Question|Question> | ||
| + | * By examining the CATHEDRAL results, how many unique domains does the server recognise in the protein structure of FtsA? | ||
| + | * Which superfamilies do they belong to? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * 4 | ||
| + | * 3.30.420.40 (2 domains), 3.30.1490.110, | ||
| + | </ | ||
| + | |||
| + | For the remainder of this investigation you are going to focus on one superfamily present in FtsA, the Nucleotidyltransferase domain family (CATH code: 3.30.420.40). | ||
| + | |||
| + | <box Question|Question> | ||
| + | |||
| + | * What are the unique CATH domain identifiers for the 3.30.420.40 domains of 1e4fT? | ||
| + | |||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | |||
| + | * 1e4fT03 and 1e4fT04 | ||
| + | |||
| + | </ | ||
| + | |||
| + | |||
| + | <box Info|A Brief Explanation of PDB Codes & CATH Domain Identifiers> | ||
| + | |||
| + | PDB codes consist of 4 characters (letters and numbers), typically of the form ' | ||
| + | |||
| + | </ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== Investigating Structural Variation ==== | ||
| + | |||
| + | The next step is to investigate one of the domains in the CATH superfamily 3.30.420.40 using both CATH and Gene3D. | ||
| + | |||
| + | <box Info|Brief Explanation of the CATH resource> | ||
| + | |||
| + | CATH is a manually-curated hierarchical classification of protein domain structures. The name CATH derives from the initials of the top four levels of the classification - Class, Architecture, | ||
| + | * Class refers to the secondary structure content (e.g. mainly-alpha, | ||
| + | * Architecture refers to the general arrangement of the secondary structures irrespective of connectivity between them (e.g. alpha/beta sandwich); | ||
| + | * Topology, also known as the ' | ||
| + | * Homologous Superfamily refers to domains that are believed to be related by a common ancestor. | ||
| + | |||
| + | The levels below this, the S-levels, are an automated clustering based on sequence identity. | ||
| + | |||
| + | </ | ||
| + | |||
| + | To find the domain in CATH, enter the domain code (i.e. 1e4f followed by the chain ID and domain ID) into the search box [[http:// | ||
| + | |||
| + | <box Question|Question> | ||
| + | * What type of fold do members of this superfamily adopt? | ||
| + | * How is the architecture described? | ||
| + | * Would you describe this as a regular architecture? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * Nucleotidyltransferase domain 5 | ||
| + | * 2-Layer Sandwich | ||
| + | * Yes | ||
| + | </ | ||
| + | |||
| + | This domain is found in a wide variety of apparently very different proteins with differing molecular and cellular functions. Clearly this needs further investigation. Clicking on the domain IDs of the structures in this superfamily brings up more information on that particular domain. The Rasmol link ({{: | ||
| + | |||
| + | As you have seen so far, structural data can be a very powerful way of viewing a protein. However, the structural data set, as represented by the PDB, is very sparse; for reasons of cost and ease, far more sequence data, with associated annotation, is available. It is possible to predict structural domains in sequences by using ' | ||
| + | |||
| + | <box Info|Brief Explanation of Profile-HMMs> | ||
| + | |||
| + | Hidden Markov Models are similar to sequence profiles (like those used in PSI-BLAST) that model the amino acid distribution of a domain superfamily. These models are generated by creating alignments of many homologues and then counting the frequency of occurrence of each amino acid in each column of the alignment (the profile). These frequencies are then used to create probabilities of occurrence against a background evolutionary model that accounts for possible substitutions. They provide a convenient and powerful way of identifying homology between sequences. | ||
| + | |||
| + | </ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== The Gene3D Server ==== | ||
| + | |||
| + | We are now going to use the Gene3D resource to explore this superfamily further. Gene3D provides access to more functional information, | ||
| + | |||
| + | We're going to start with the 1e4fT04 domain you have been investigating. You will take other examples from the superfamily later and make observations from Gene3D. | ||
| + | |||
| + | Follow this link to [[http:// | ||
| + | |||
| + | A page is returned containing a series of tabs with structural and functional information related to 1e4fT. Compare the HMM-based predictions (CATH_HMM) to the Pfam domain assignments. Pfam families are normally derived from analysis of sequences rather than structures and so can often contain multiple structural domains that commonly co-occur. | ||
| + | Click on the second 3.30.420.40 domain (the discontiguous one) and follow the links out to CATH and Gene3D. You should also see some functional sub-classification of this domain in the pop-up. | ||
| + | |||
| + | |||
| + | |||
| + | <box Question|Question> | ||
| + | * From the tabs, does this protein appear to be involved in the cell cycle? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * Yes, there is some GO annotation from UniProt supporting this association. | ||
| + | </ | ||
| + | |||
| + | |||
| + | When you're ready, we can investigate other proteins from this superfamily. | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== Heat Shock Chaperones ==== | ||
| + | |||
| + | Next try 1dkgD (or follow this shortcut [[http:// | ||
| + | |||
| + | <box Info|Brief Explanation of Discontinuous Domains> | ||
| + | |||
| + | The general concept of a domain is a continuous sequence of amino acids in a chain. However, the rules guiding folding are more complex than that. Whilst not well understood, it has been frequently observed that the sequence coding for a domain may be ' | ||
| + | |||
| + | </ | ||
| + | |||
| + | <box Question|Question> | ||
| + | * What species is the sequence corresponding to 1dkgD found in? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * From the summary tab we can see that this sequence is found in many strains of E. coli. | ||
| + | </ | ||
| + | |||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== Eukaryotic Hexokinases ==== | ||
| + | |||
| + | As an example of this group, search Gene3D with the PDB code 1bdg (or follow this shortcut [[http:// | ||
| + | |||
| + | Looking at the domain architecture in the sequence features tab, we can see that the protein has two CATH domains. Clicking on the CATH domains shows that they have different FunFam annotations. | ||
| + | FunFams are subdivisions of CATH superfamilies that provide functionally coherent groupings. We can retrieve proteins with similar functions (because they have the same FunFam assignments) | ||
| + | by clicking the "Click here for functionally similar proteins" | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Are there functionally similar proteins in Plasmodium species? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * Yes, the " | ||
| + | </ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== The Actin family ==== | ||
| + | |||
| + | |||
| + | For this family you are going to look at interactions, starting with the Actin-related protein (Arp) with the PDB code 1k8kA. Search for this term in the Gene3D | ||
| + | (or follow this shortcut [[http:// | ||
| + | |||
| + | <box Question|Question> | ||
| + | * What types of cellular process is this protein involved in? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * Cytoskeleton-related processes | ||
| + | </ | ||
| + | |||
| + | <box Question|Question> | ||
| + | * Are there any drugs for this protein? | ||
| + | </ | ||
| + | |||
| + | <box Answer|Answer> | ||
| + | * Yes, an experimental one in DrugBank | ||
| + | </ | ||
| + | |||
| + | Find the proteins that interact with this protein by clicking on the interactors tab. | ||
| + | |||
| + | You can now start to investigate the sub-processes the interactors of this protein are involved in. | ||
| + | |||
| + | |||
| + | < | ||
| + | <form name=" | ||
| + | <input type=" | ||
| + | <input type=" | ||
| + | </ | ||
| + | |||
| + | <script type=" | ||
| + | /* Hide the answers! */ | ||
| + | $$(' | ||
| + | </ | ||
| + | </ | ||
| + | |||
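
The identifiers used throughout this tutorial follow two simple conventions: a CATH domain identifier such as 1e4fT04 is a PDB code followed by a chain ID and a domain number, and a CATH code such as 3.30.420.40 gives the Class, Architecture, Topology and Homologous Superfamily levels in order. The following is a minimal Python sketch of how these could be split into their parts. The names `DomainId`, `CathCode`, `parse_domain_id` and `parse_cath_code` are hypothetical helpers written for illustration and are not part of any CATH or Gene3D API. The sketch also assumes a four-character PDB code, a one-character chain ID and a two-digit domain number, which matches the examples in this tutorial but may not cover every identifier in CATH.

```python
import re
from typing import NamedTuple


class DomainId(NamedTuple):
    """Parts of a CATH domain identifier such as '1e4fT04' (hypothetical helper)."""
    pdb_code: str   # four-character PDB code, e.g. '1e4f'
    chain: str      # one-character chain ID, e.g. 'T'
    domain: int     # domain number within the chain, e.g. 4


class CathCode(NamedTuple):
    """The four numbered levels of a CATH code such as '3.30.420.40'."""
    cath_class: int
    architecture: int
    topology: int
    superfamily: int

    def level(self, depth: int) -> str:
        """Return the code truncated to the first `depth` levels (1 to 4)."""
        return ".".join(str(n) for n in self[:depth])


# Assumed layout: 4 alphanumeric characters, 1 chain character, 2 digits.
_DOMAIN_RE = re.compile(r"^([A-Za-z0-9]{4})([A-Za-z0-9])([0-9]{2})$")


def parse_domain_id(text: str) -> DomainId:
    """Split a domain identifier into PDB code, chain ID and domain number."""
    match = _DOMAIN_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised domain identifier: {text!r}")
    pdb_code, chain, domain = match.groups()
    return DomainId(pdb_code.lower(), chain, int(domain))


def parse_cath_code(text: str) -> CathCode:
    """Split a four-level CATH code into its numeric levels."""
    parts = text.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {text!r}")
    return CathCode(*(int(p) for p in parts))


if __name__ == "__main__":
    dom = parse_domain_id("1e4fT04")
    code = parse_cath_code("3.30.420.40")
    print(dom.pdb_code, dom.chain, dom.domain)
    print("Class:", code.level(1))
    print("Architecture:", code.level(2))
    print("Topology:", code.level(3))
    print("Homologous superfamily:", code.level(4))
```

Truncating the code with `level` mirrors how the hierarchy is read: every domain in superfamily 3.30.420.40 also belongs to topology 3.30.420 and architecture 3.30.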