This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| tutorials:impact_oct_09 [2012/03/29 12:14] – jon | tutorials:impact_oct_09 [2015/09/17 10:48] (current) – [The Genome comparison page] hafsa | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== An Introduction to CATH And Gene3D ====== | ||
| + | |||
| + | Welcome to the [[http:// | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== Tutorial Map ===== | ||
| + | |||
| + | * [[#A Brief Introduction]] | ||
| + | * [[#Finding Domains in a PDB record]] | ||
| + | * [[#Going From Domains to Superfamilies]] | ||
| + | * [[#Domain Recognition in Structures - CATHEDRAL]] | ||
| + | * [[# | ||
| + | * [[#Welcome to Gene3D]] | ||
| + | * [[#Querying a protein or gene name at Gene3D]] | ||
| + | * [[#The Single Protein View]] | ||
| + | * [[#The Protein Collection View]] | ||
| + | * [[#The Superfamily summary]] | ||
| + | * [[#The Genome summary]] | ||
| + | * [[#The Genome comparison page]] | ||
| + | |||
| + | |||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== A Brief Introduction ===== | ||
| + | |||
| + | At the heart of the system is the CATH classification of protein domains, derived from integrated semi-automatic processing and manual curation of high-resolution 3D structures in the wwPDB. Protein domains are identified in these structures and compared to detect homology relationships and other structural similarities. The resulting hierarchy can be browsed, and its relationships studied, through the website. CATH also provides a set of tools for general structural comparison. | ||
| + | |||
| + | The CATH superfamilies are then extended to the major protein sequence repositories through a process of modelling sequence variation within domain superfamilies, | ||
| + | |||
| + | For more details on the construction of these resources, we recommend reading the latest NAR papers and the documentation on the sites. We are also happy to answer any direct questions about the data (cathteam@biochem.ucl.ac.uk, | ||
| + | ===== Finding Domains in a PDB record ===== | ||
| + | |||
| + | There are many different possible starting points for an investigation. Here we are going to start with a classic investigation, | ||
| + | |||
| + | Let’s say we’re interested in PDB entry 1GCQ, and in particular the domains that can be found in it. You can have a look at the PDB record here: http:// | ||
| + | |||
| + | The first step is to go to the CATH home page (http:// | ||
| + | |||
| + | [[http:// | ||
| + | |||
| + | In a CATH results page all the records that may correspond to the query term are returned. The possible return types are ‘Domain’, | ||
| + | |||
| + | How many chains and domains are associated with this PDB? How many domains per chain? And do all the domains belong to the same superfamily (“H-level”) or to different ones? | ||
| + | |||
| + | |||
| + | ===== Going From Domains to Superfamilies ===== | ||
| + | |||
| + | From the results page for 1gcq, click on the PDB record. Here you can view the structures and sequences for the chains in 1gcq, as well as a simple table of corresponding chains and domains. | ||
| + | |||
| + | Next you’ll look at the pages for the chain 1gcqA, and then finally the domain 1gcqA00. In the summary tabs below for both, you’ll see a ‘History’ tab has been added. This tab details the actions CATH curators have taken with respect to the domain assignments. If you’re looking for an explanation as to why something has been re-defined you can normally find it here. | ||
| + | |||
| + | The domain pages are where the structural definitions meet the CATH hierarchy. As well as an identifier that links to the PDB record, each domain is given a 9-part code specifying its location in the hierarchy. The first four parts, from Class to Homology, are curated while the subsequent levels are based on an automatic sequence clustering protocol. | ||
| + | |||
| + | From here, follow the links and use the child node summaries on the relevant pages to find out how many members the superfamily contains and how many superfamilies belong to the fold. Feel free to explore a little at this point. | ||
| + | |||
| + | |||
| + | ===== Domain Recognition in Structures - CATHEDRAL ===== | ||
| + | |||
| + | CATHEDRAL is an algorithm for identifying domains in structures through comparison of input structures with known domains in CATH. It can handle multi-domain as well as single domain structures. Find it by clicking on ‘Tools’ at the top, and then the CATHEDRAL server link. | ||
| + | |||
| + | For this test we are going to take a more recent structure for human vav protein that hasn’t yet been classified: 2vrwB. Normally you would enter this code in the identifier box and click ‘Continue’, then ‘Continue’ again. But CATHEDRAL takes a while to run, so instead you can go directly to the results with this link: | ||
| + | |||
| + | http:// | ||
| + | |||
| + | How many domains are reported and to what superfamilies? | ||
| + | |||
| + | |||
| + | ===== Pair-wise Structural Comparison - SSAP ===== | ||
| + | |||
| + | CATH also provides a server for the pair-wise comparison method SSAP. This returns a superposition of two structures along with a similarity score. For this tutorial we’ll move on, but feel free to try it out if you have time. | ||
| + | |||
| + | Next we’re going to look at the corresponding records for Vav protein and the superfamilies it belongs to in Gene3D. | ||
| + | |||
| + | |||
| + | |||
| + | ===== Welcome to Gene3D ===== | ||
| + | //Fusing structural annotation with genomes and functions.// | ||
| + | In this guide you can learn a few things about the types of data in Gene3D, how you can retrieve sets of interest, and what tools are built into the website. There are several ways of beginning your investigation, | ||
| + | ===== Querying a protein or gene name at Gene3D ===== | ||
| + | |||
| + | Gene3D can be queried with most recognised identifiers | ||
| + | in the taxon filter box (to restrict to VAV1 proteins in human) and click 'get proteins' | ||
| + | |||
| + | |||
| + | Looking through the list you will find two distinct records for the search; this is because Gene3D merges resources at the sequence level, so slightly differing sequences for the same protein are treated distinctly. However, by clicking the 'Get more functional annotation button' | ||
| + | |||
| + | |||
| + | |||
| + | ===== The Single Protein View ===== | ||
| + | |||
| + | |||
| + | Clicking on the 'Get protein' | ||
| + | |||
| + | The first tab has a summary page of annotations for the protein. | ||
| + | The second | ||
| + | |||
| + | <box Info|''' | ||
| + | |||
| + | Clicking on domain images will reveal extra functional information and link-outs for a domain. | ||
| + | |||
| + | </ | ||
| + | |||
| + | By looking around the various tabs and the funfam assignments, you should be able to find annotations from GO and KEGG on the role of VAV1 in the cell and its molecular function. | ||
| + | We can also inspect the functions of its interactors to help establish the roles of this protein in the cell. | ||
| + | |||
| + | ===== The Protein Collection View ===== | ||
| + | In the sequence features tab for VAV1, click on the link 'Click here for Proteins with similar CATH arrangements' | ||
| + | [[http:// | ||
| + | This displays a protein collection page of multiple proteins; further annotation can be obtained from the drop-down menu. | ||
| + | |||
| + | |||
| + | |||
| + | ===== The Superfamily summary ===== | ||
| + | We can find a summary of a superfamily | ||
| + | For example, searching for 2.40.128.20 shows information on functions, domain partners, genome distributions, etc. | ||
| + | [[http:// | ||
| + | If we click on the Domain organisation tab we can see different domain combinations and the organisms they are found in. | ||
| + | |||
| + | For example clicking on the " | ||
| + | |||
| + | |||
| + | ===== The Genome summary ===== | ||
| + | We can find a summary of a genome by searching from the "Get genome summary" | ||
| + | For example, searching for taxon id 4932 we can see information on superfamilies, | ||
| + | [[http:// | ||
| + | It is possible to retrieve individual protein sets. | ||
| + | |||
| + | |||
| + | ===== The Genome comparison page ===== | ||
| + | We can compare 2 genomes by searching from the " | ||
| + | For example, let's compare the human pathogen Plasmodium vivax and the more lethal species Plasmodium falciparum. | ||
| + | [[http:// | ||
| + | We can click on individual tabs to see superfamilies, | ||
| + | For example on the funfams tab we can see that the "Rifin -like domain" | ||
| + | The corresponding proteins can be retrieved for either genome on any of the tabs. | ||
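
The domain identifiers and hierarchy codes described under "Going From Domains to Superfamilies" can also be taken apart in a script. The following is a minimal sketch, not part of CATH's own tooling. The function names are hypothetical. The identifier layout (a 4-character PDB code, one chain character, a 2-digit domain number) is inferred from the examples 1gcq, 1gcqA and 1gcqA00 in this tutorial. The level letters C, A, T, H, S, O, L, I, D are an assumption based on the usual CATH naming, and the five trailing numbers in the usage example are made up for illustration.

```python
import re
from typing import Dict, NamedTuple

# Assumed one-letter names for the nine levels of a full CATH code.
# The first four (C, A, T, H) are the curated levels; the rest come
# from automatic sequence clustering, as described in the tutorial.
LEVELS = ("C", "A", "T", "H", "S", "O", "L", "I", "D")


class DomainId(NamedTuple):
    pdb_code: str
    chain: str
    domain_number: int


# Assumed layout: PDB code (digit + 3 alphanumerics), chain, 2-digit number.
_DOMAIN_ID = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])([0-9]{2})$")


def parse_domain_id(domain_id: str) -> DomainId:
    """Split an identifier such as '1gcqA00' into PDB code, chain and number."""
    match = _DOMAIN_ID.match(domain_id)
    if match is None:
        raise ValueError(f"not a recognised domain identifier: {domain_id!r}")
    pdb_code, chain, number = match.groups()
    return DomainId(pdb_code.lower(), chain, int(number))


def parse_cath_code(code: str) -> Dict[str, int]:
    """Map each dot-separated part of a CATH code to its level letter."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"expected 1 to {len(LEVELS)} parts, got {len(parts)}")
    return dict(zip(LEVELS, (int(part) for part in parts)))


def superfamily_of(code: str) -> str:
    """Return the first four parts (C.A.T.H), i.e. the superfamily identifier."""
    parts = code.split(".")
    if len(parts) < 4:
        raise ValueError("code does not reach the H (superfamily) level")
    return ".".join(parts[:4])


if __name__ == "__main__":
    print(parse_domain_id("1gcqA00"))
    # The trailing ".1.1.1.1.1" is a made-up suffix, for illustration only.
    print(parse_cath_code("2.40.128.20.1.1.1.1.1"))
    print(superfamily_of("2.40.128.20.1.1.1.1.1"))
```

Truncating a full code to its first four parts gives the superfamily identifier, which is the form (for example 2.40.128.20) used when searching for a superfamily summary in Gene3D.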
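
The section "Querying a protein or gene name at Gene3D" notes that Gene3D merges resources at the sequence level, so slightly different sequences for the same protein show up as distinct records. The sketch below illustrates that idea only. Keying records by a digest of the sequence, the function names, the source labels and the toy sequences are all assumptions made for this example, not a description of how Gene3D is implemented.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sequence_key(sequence: str) -> str:
    """Return a digest of the sequence, ignoring whitespace and letter case."""
    normalised = "".join(sequence.split()).upper()
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def merge_by_sequence(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source_id, sequence) pairs so identical sequences share one entry."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for source_id, sequence in records:
        merged[sequence_key(sequence)].append(source_id)
    return dict(merged)


# Toy data: three hypothetical source records for the same protein name.
# The first two carry the same sequence; the third differs by one residue.
records = [
    ("sourceA:VAV1", "MKTAYIAKQR"),
    ("sourceB:VAV1", "mktayiakqr"),
    ("sourceC:VAV1", "MKTAYIAKQRL"),
]
merged = merge_by_sequence(records)

if __name__ == "__main__":
    for key, source_ids in merged.items():
        print(key[:12], source_ids)
```

Under these assumptions the three source records collapse into two entries, which mirrors why a single protein name can return more than one record in the search results.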
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||