====== An Introduction to CATH and Gene3D ======

Welcome to the [[http://

----

===== Tutorial Map =====

  * [[#A Brief Introduction]]
  * [[#Welcome to Gene3D]]
  * [[#Querying a protein or gene name at Gene3D]]
  * [[#The Single Protein View]]
  * [[#The Protein Collection View]]
  * [[#The Superfamily Summary]]
  * [[#The Genome Summary]]
  * [[#The Genome Comparison Page]]
  * [[#Finding Domains in Sequences]]

----

===== A Brief Introduction =====

At the heart of the system is the CATH classification of protein domains. It is derived from integrated semi-automatic processing and manual curation of high-resolution 3D structures in the wwPDB. Protein domains are identified in these structures and compared with one another to detect homology relationships and other structural similarities. You can browse this hierarchy and study these relationships through the website. CATH also provides a set of tools for general structural comparison.

The CATH superfamilies are then extended to the major protein sequence repositories through a process of modelling sequence variation within domain superfamilies,

For more details on how these resources are constructed, we recommend reading the latest NAR papers and the documentation on the sites. We are also happy to answer any direct questions about the data (cathteam@biochem.ucl.ac.uk,

===== Welcome to Gene3D =====

//Fusing structural annotation with genomes and functions.//
In this guide you can learn a few things about the types of data in Gene3D, how you can retrieve sets of interest, and what tools are built into the website. There are several ways of beginning your investigation,

===== Querying a protein or gene name at Gene3D =====

Gene3D can be queried with most recognised identifiers
in the taxon filter box (to restrict to VAV1 proteins in human) and click 'get proteins'
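The taxon filter narrows a name search down to one organism. The Python sketch below is a rough illustration of that logic, not Gene3D's code. It filters hypothetical search results by gene name and NCBI taxon ID. The accessions are placeholders, and the sketch assumes an exact taxon match, whereas a real filter might also include sub-taxa.

```python
# Hypothetical search results: (accession, gene name, NCBI taxon ID).
# Accessions are placeholders; 9606 is Homo sapiens, 10090 is Mus musculus.
results = [
    ("ACC1", "VAV1", 9606),
    ("ACC2", "VAV1", 10090),
    ("ACC3", "VAV1", 9606),
    ("ACC4", "VAV2", 9606),
]

def filter_hits(rows, gene, taxon_id):
    """Keep accessions whose gene name matches (case-insensitively)
    and whose taxon ID equals taxon_id exactly (sub-taxa are not included)."""
    return [acc for acc, name, tax in rows
            if name.upper() == gene.upper() and tax == taxon_id]

print(filter_hits(results, "vav1", 9606))  # ['ACC1', 'ACC3']
```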

Looking through the list, you will find two distinct records for the search. This is because Gene3D merges resources at the sequence level, so slightly differing sequences for the same protein are kept as separate records. However, by clicking the 'Get more functional annotation' button
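Sequence-level merging explains why near-identical sequences become separate records. The minimal Python sketch below illustrates the idea. It assumes that records are keyed by an MD5 digest of the exact sequence, which is an illustrative choice and not necessarily what Gene3D does. The sources, accessions and short sequences are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical records: (source database, accession, sequence).
records = [
    ("SourceA", "ACC1", "MELWRQCTHWLIQ"),
    ("SourceB", "ACC2", "MELWRQCTHWLIQ"),  # identical sequence: merged with ACC1
    ("SourceA", "ACC3", "MELWRQCTHWLIR"),  # one residue differs: kept separate
]

def sequence_key(seq: str) -> str:
    """Key a sequence by the MD5 digest of its upper-cased residues."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()

def merge_by_sequence(recs):
    """Group 'source:accession' labels that share an identical sequence."""
    groups = defaultdict(list)
    for source, acc, seq in recs:
        groups[sequence_key(seq)].append(f"{source}:{acc}")
    return dict(groups)

merged = merge_by_sequence(records)
for members in merged.values():
    print(members)
```

Even a single substitution changes the key, so the two VAV1-like entries above end up as two records.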

===== The Single Protein View =====

Clicking on the 'Get protein'

The first tab shows a summary of annotations for the protein.
The second

<box Info|'''

Clicking on a domain image reveals extra functional information and link-outs for that domain.

</box>

By looking through the various tabs and the FunFam assignments, you should be able to find GO and KEGG annotations describing the role of VAV1 in the cell and its molecular function.
We can also inspect the functions of its interactors to help establish the roles of this protein in the cell.

===== The Protein Collection View =====
In the sequence features tab for VAV1, click on the link 'Click here for Proteins with similar CATH arrangements'
[[http://
This displays a protein collection page listing multiple proteins. Further annotation can be obtained from the drop-down menu.
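One simple way to define "proteins with similar CATH arrangements" is to treat proteins as equivalent when they contain the same superfamilies in the same N- to C-terminal order. The Python sketch below groups proteins on that basis. The protein IDs and superfamily labels are hypothetical, and Gene3D's own notion of similarity may be looser than this exact match.

```python
from collections import defaultdict

# Hypothetical proteins with their superfamily assignments in N- to C-terminal order.
# "SF1".."SF3" stand in for CATH superfamily codes.
proteins = {
    "protA": ["SF1", "SF2", "SF3"],
    "protB": ["SF1", "SF2", "SF3"],
    "protC": ["SF2", "SF3"],
}

def architecture(domains):
    """Represent an arrangement as its ordered superfamily labels joined by '-'."""
    return "-".join(domains)

def group_by_architecture(prots):
    """Map each distinct arrangement to the proteins that share it exactly."""
    groups = defaultdict(list)
    for pid, doms in prots.items():
        groups[architecture(doms)].append(pid)
    return dict(groups)

groups = group_by_architecture(proteins)
query = architecture(proteins["protA"])
print(groups[query])  # ['protA', 'protB']
```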

===== The Superfamily Summary =====
We can find a summary of a superfamily
For example, searching for 2.40.128.20 shows information on functions, domain partners, genome distributions, etc.
[[http://
Clicking on the Domain organisation tab shows the different domain combinations and the organisms in which they are found.

For example, clicking on the "
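A CATH superfamily code such as 2.40.128.20 encodes four levels of the hierarchy: Class, Architecture, Topology and Homologous superfamily. The Python sketch below splits such a code into its levels. The class names listed are those of the four main CATH classes. The helper names (''CathCode'', ''parse_cath_code'', ''level'') are hypothetical and do not belong to any CATH library.

```python
from typing import NamedTuple

# Names of the four main CATH classes.
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

class CathCode(NamedTuple):
    klass: int         # C: class
    architecture: int  # A: architecture
    topology: int      # T: topology
    homology: int      # H: homologous superfamily

    def level(self, depth: int) -> str:
        """Return the code truncated to `depth` levels (1 = C ... 4 = H)."""
        return ".".join(str(n) for n in self[:depth])

def parse_cath_code(code: str) -> CathCode:
    """Split a four-level CATH superfamily code such as '2.40.128.20'."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))

sf = parse_cath_code("2.40.128.20")
print(CLASS_NAMES.get(sf.klass, "unknown class"))  # Mainly Beta
print(sf.level(2), sf.level(3))                    # 2.40 2.40.128
```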

===== The Genome Summary =====
We can find a summary of a genome by searching from the "Get genome summary"
For example, searching for taxon ID 4932 (//Saccharomyces cerevisiae//) shows information on superfamilies,
[[http://
it's possible to retrieve individual protein sets.
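At its core, a genome summary is a tally of domain assignments across a proteome. The Python sketch below works on a hypothetical list of (protein, superfamily) assignments. It shows the difference between counting domains and counting proteins, and how the protein set for one superfamily could be extracted. It is an illustration only, not how Gene3D computes its summaries.

```python
from collections import Counter

# Hypothetical domain assignments for one genome: (protein ID, superfamily).
# p1 carries two SF1 domains, so domain and protein counts differ for SF1.
assignments = [
    ("p1", "SF1"), ("p1", "SF1"), ("p1", "SF2"),
    ("p2", "SF1"),
    ("p3", "SF3"),
]

# Number of domains assigned to each superfamily.
domain_counts = Counter(sf for _, sf in assignments)

# Number of distinct proteins containing each superfamily.
protein_counts = Counter(sf for _, sf in set(assignments))

# The protein set for one superfamily.
proteins_with_sf1 = sorted({pid for pid, sf in assignments if sf == "SF1"})

print(domain_counts.most_common(1))  # [('SF1', 3)]
print(protein_counts["SF1"])         # 2
print(proteins_with_sf1)             # ['p1', 'p2']
```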

===== The Genome Comparison Page =====
We can compare two genomes by searching from the "
For example, let's compare the human pathogen //Plasmodium vivax// with the more lethal species //Plasmodium falciparum//.
[[http://
we can click on individual tabs to see superfamilies,
For example, on the funfams tab we can see that the "Rifin -like domain"
The corresponding proteins can be retrieved for either genome on any of the tabs.
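A genome comparison comes down to two questions: which families occur in one genome but not the other, and how do counts differ for the families they share? The minimal Python sketch below assumes that per-genome FunFam counts have already been retrieved. The labels and numbers are placeholders, not real //Plasmodium// data.

```python
# Hypothetical per-genome FunFam counts (FunFam label -> number of proteins).
genome_a = {"FF1": 12, "FF2": 3, "FF3": 40}
genome_b = {"FF1": 10, "FF3": 2}

def compare_genomes(a: dict, b: dict):
    """Return families unique to each genome, and the count difference (a - b)
    for families present in both."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    shared = {fam: a[fam] - b[fam] for fam in sorted(a.keys() & b.keys())}
    return only_a, only_b, shared

only_a, only_b, shared = compare_genomes(genome_a, genome_b)
print(only_a)  # ['FF2']
print(only_b)  # []
print(shared)  # {'FF1': 2, 'FF3': 38}
```

A large positive difference, such as FF3 here, is the kind of signal that points to a family expanded in one genome.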

===== Finding Domains in Sequences =====

Gene3D also provides [[http://

An example sequence is provided by clicking on the '

Enter this sequence in the search box and hit the green 'Scan Sequence'

<code>
MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQLLNNLLPHAINLREVNLRPQMSQFLCLKNIRTFLSTCCEKFGLKRSELFEAFDLFDVQDFGKVIYTLSALSWTPIAQNRGIMPFPTEEESVGDEDIYSGLSDQIDDTVEEDEDLYDCVENEEAEGDEIYEDLMRSEPVSMPPKMTEYDKRCCCLREIQQTEEKYTDTLGSIQQHFLKPLQRFLKPQDIEIIFINIEDLLRVHTHFLKEMKEALGTPGAANLYQVFIKYKERFLVYGRYCSQVESASKHLDRVAAAREDVQMKLEECSQRANNGRFTLRDLLMVPMQRVLKYHLLLQELVKHTQEAMEKENLRLALDAMRDLAQCVNEVKRDNETLRQITNFQLSIENLDQSLAHYGRPKIDGELKITSVERRSKMDRYAFLLDKALLICKRRGDSYDLKDFVNLHSFQVRDDSSGDRDNKKWSHMFLLIEDQGAQGYELFFKTRELKKKWMEQFEMAISNIYPENATANGHDFQMFSFEETTSCKACQMLLRGTFYQGYRCHRCRASAHKECLGRVPPCGRHGQDFPGTMKKDKLHRRAQDKKRNELGLPKMEVFQEYYGLPPPPGAIGPFLRLNPGDIVELTKAEAEQNWWEGRNTSTNEIGWFPCNRVKPYVHGPPQDLSVHLWYAGPMERAGAESILANRSDGTFLVRQRVKDAAEFAISIKYNVEVKHIKIMTAEGLYRITEKKAFRGLTELVEFYQQNSLKDCFKSLDTTLQFPFKEPEKRTISRPAVGSTKYFGTAKARYDFCARDRSELSLKEGDIIKILNKKGQQGWWRGEIYGRVGWFPANYVEEDYSEYC
</code>

The top track is the main one. It displays the resolved multi-domain architecture (MDA; the coloured blobs) together with all the matches from the various HMM profiles (dotted brackets). Matches from the same superfamily share a colour, and you can see the E-value of a match by hovering over it. This image should illustrate two things: (1) the complexity involved in precisely defining domain boundaries, and (2) the robustness of DomainFinder3, the in-house algorithm for match selection (paper under review).
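Boundary resolution is harder than it looks, even with a very simple rule. The Python sketch below applies a deliberately simple greedy heuristic. It accepts HMM matches in order of increasing E-value and rejects any match that overlaps an already accepted one by more than a tolerance. This is **not** DomainFinder3, whose method is not described here. The ''Match'' type, the 10-residue tolerance and the example hits are all hypothetical.

```python
from typing import NamedTuple

class Match(NamedTuple):
    superfamily: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    evalue: float

def overlap(a: Match, b: Match) -> int:
    """Number of residues shared by two matches (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)

def resolve_mda(matches, max_overlap=10):
    """Greedy selection (illustrative, not DomainFinder3): take matches from the
    lowest E-value upwards, skipping any that overlap an accepted match by more
    than max_overlap residues, then report them in sequence order."""
    chosen = []
    for m in sorted(matches, key=lambda m: m.evalue):
        if all(overlap(m, c) <= max_overlap for c in chosen):
            chosen.append(m)
    return sorted(chosen, key=lambda m: m.start)

hits = [
    Match("SF1", 1, 120, 1e-30),
    Match("SF1", 5, 130, 1e-12),    # weaker hit to the same region: rejected
    Match("SF2", 115, 300, 1e-20),  # 6-residue overlap with the first hit: kept
    Match("SF3", 200, 260, 1e-5),   # lies inside the SF2 region: rejected
]
print([m.superfamily for m in resolve_mda(hits)])  # ['SF1', 'SF2']
```

Changing ''max_overlap'' alone can change the resulting architecture, which is one reason precise boundary definition is difficult.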

Feel free to try your own sequence