Differences

This shows you the differences between two versions of the page.

data_curation:domain_chopping_documentation:index [2023/09/28 16:32]
vwaman
data_curation:domain_chopping_documentation:index [2023/09/29 14:40] (current)
vwaman
Line 1: Line 1:
-[[data_curation:domain_chopping_documentation:index| Data curation : Domain Chopping (DomChop) tutorial]]+==== Data curation : Domain Chopping (DomChop) tutorial (last updated August, 2023)==== 
 +(Tutorial documented by: Dr. Natalie Dawson and Dr. Vaishali Waman; DomChop webpage created by Dr. Ian Sillitoe)
  
- +==== What is domain chopping? ====
- +
-(last updated 2022) +
- +
-==== **Guide What is domain chopping?** ====+
  
 **Domain chopping (DomChopping) is the process of 'chopping' polypeptide chains from the Protein Data Bank (PDB) into one or more protein structure domains.** **Domain chopping (DomChopping) is the process of 'chopping' polypeptide chains from the Protein Data Bank (PDB) into one or more protein structure domains.**
  
-Logging in+**Logging in**
  
 http://update.cathdb.info/cgi-bin/index.pl http://update.cathdb.info/cgi-bin/index.pl
-• Select the 'DomChop' link+• Select the '**DomChop**' link
  
 • When prompted to log in, make sure you select the 'Current CATH database (production)' option from the Database list • When prompted to log in, make sure you select the 'Current CATH database (production)' option from the Database list
- 
-Once you've logged in, you will get a confirmation screen. Check that the login details table details the correct: 
-• database (cathdb_current) 
- 
-• host (rodan) 
- 
-• username 
  
 Select 'Continue', then select the 'DomChop' link. Select 'Continue', then select the 'DomChop' link.
  
-General information+=== General information ===
  
 Please note that pages can take a few seconds to load due to database read/write processes. Please avoid clicking buttons multiple times while a page is still loading, as this can cause page errors. Please note that pages can take a few seconds to load due to database read/write processes. Please avoid clicking buttons multiple times while a page is still loading, as this can cause page errors.
  
-RasMol setup +**Checking the literature**
- +
-To view the putative domains suggested by each algorithm, RasMol needs to interpret the .rasscript file downloaded from the CATH DomChop pages. If you are using Windows, please create a .bat file containing the following: +
- +
-cd "c:\Program Files\RasWin" # <change to directory containing your RasMol program> +
- +
-raswin.exe -script %1 +
- +
-Please then use this .bat file to open any RasMol files downloaded from the DomChop pages. +
- +
-If you are using Linux, please create a file called rasscript.com, containing the following (assuming that the RasMol application is in your $PATH): +
- +
-#!/bin/sh +
- +
-rasmol -script "$1" +
- +
-Please then use this rasscript.com file to open any RasMol files downloaded from the DomChop pages. +
-  +
- +
- +
- +
- +
- +
-  +
-RasMol colouring issues for lowercase chain ids +
- +
-Please note that there is a known issue with RasMol where the domain colouring is not accurate for PDB chains whose ids are lowercase. In these cases, please use either the 3D View tool or download the Pymol script to view with Pymol. +
- +
- +
-Checking the literature+
  
 Please keep in mind throughout that it can be very helpful to consult the publication associated with the PDB id when deciding on a chopping. This could be, for example, for confirmation or when the algorithm results do not provide a reasonable solution. Please keep in mind throughout that it can be very helpful to consult the publication associated with the PDB id when deciding on a chopping. This could be, for example, for confirmation or when the algorithm results do not provide a reasonable solution.
Line 64: Line 25:
  
  
-How to identify domain boundaries, AKA choosing a chopping (quick overview)+**How to identify domain boundaries, AKA choosing a chopping (quick overview)**
  
-On the DomChop home page, select 'Get New Chain'. This will load a new chain for you to process.+On the DomChop home page, select '**Get New Chain**'. This will load a new chain for you to process.
  
 First, load the image of the chain to get an idea of how the structure looks. To view an illustration of the chain, select the RasMol icon in the top right-hand box that contains the chain's image. If the 3D structure is very unpacked and does not have a compact, globular structure, add the comment (under the 'Comments' tab) "Unpacked chain" and move onto the next chain. If the 3D structure consists of a fragment, for example a single helix (e.g. as in 5lv6A), add the comment "Fragment" and move onto the next chain. First, load the image of the chain to get an idea of how the structure looks. To view an illustration of the chain, select the RasMol icon in the top right-hand box that contains the chain's image. If the 3D structure is very unpacked and does not have a compact, globular structure, add the comment (under the 'Comments' tab) "Unpacked chain" and move onto the next chain. If the 3D structure consists of a fragment, for example a single helix (e.g. as in 5lv6A), add the comment "Fragment" and move onto the next chain.
Line 74: Line 35:
 Please note that the values and scores provided below are only guidelines. For example, even if the ChopClose result has a bad SSAP score, it could still be the case that it provides an accurate chopping for your chain. Please always view the 3D structure before making a decision on which result to choose. Please note that the values and scores provided below are only guidelines. For example, even if the ChopClose result has a bad SSAP score, it could still be the case that it provides an accurate chopping for your chain. Please always view the 3D structure before making a decision on which result to choose.
  
-1. ChopClose (CC)+1. **ChopClose (CC)**
 If there is a CC result available, we would first look at the superposition of our query chain with the matching chopped chain from CATH. Typically we would expect a good superposition if the "NW sequence identity" field is at least 30%, if the SSAP score is >= 70 (preferably >= 80), and if the RMSD is <= 5 Angstroms. If there is a CC result available, we would first look at the superposition of our query chain with the matching chopped chain from CATH. Typically we would expect a good superposition if the "NW sequence identity" field is at least 30%, if the SSAP score is >= 70 (preferably >= 80), and if the RMSD is <= 5 Angstroms.
  
Line 84: Line 45:
 The CC superpositions comprise the new query chain aligned with the best-matching chain that has already been chopped in CATH. The darker colours represent the new query and the lighter colours represent the best-match in CATH. The CC superpositions comprise the new query chain aligned with the best-matching chain that has already been chopped in CATH. The darker colours represent the new query and the lighter colours represent the best-match in CATH.
    
- +2. **CATHEDRAL**
- +
- +
-  +
-2. CATHEDRAL+
 This is the next result to check after CC. Any putative domains that match CATH domains with a SSAP score >= 70 (preferably >= 80) indicate a good match. This is the next result to check after CC. Any putative domains that match CATH domains with a SSAP score >= 70 (preferably >= 80) indicate a good match.
  
Line 94: Line 51:
 The CATHEDRAL superpositions comprise the new query chain aligned with the best-matching domains that have already been chopped in CATH. The darker colours represent the new query and the lighter colours represent the best-matching domains in CATH. The CATHEDRAL superpositions comprise the new query chain aligned with the best-matching domains that have already been chopped in CATH. The darker colours represent the new query and the lighter colours represent the best-matching domains in CATH.
  
-3. HMM+3. **HMM**
 Any putative domain that matches a CATH domain with an E-value below 1 x 10^-5 represents a good match. Any putative domain that matches a CATH domain with an E-value below 1 x 10^-5 represents a good match.
  
-4. PUU, Detective, Domak+4. **PUU, Detective, Domak**
 These are ab initio-based algorithms and do not produce scores. These algorithms are very useful in providing results when the query PDB chains do not have any closely-related matches in CATH. If you don't find any chopping you are happy with in the previous steps, have a look at these results. Sometimes, these three algorithms can help to confirm the above-mentioned results. These are ab initio-based algorithms and do not produce scores. These algorithms are very useful in providing results when the query PDB chains do not have any closely-related matches in CATH. If you don't find any chopping you are happy with in the previous steps, have a look at these results. Sometimes, these three algorithms can help to confirm the above-mentioned results.
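The guideline values listed for ChopClose, CATHEDRAL and HMM above can be written down as a small checklist. The following is only a sketch: the function and constant names are hypothetical and are not part of DomChop, and the numbers are the guideline values quoted in this tutorial. As noted above, a result that fails these checks may still give an accurate chopping, so always view the 3D structure before deciding.

```python
from typing import Optional, Tuple

# Guideline values quoted in the tutorial text; hints only, not hard rules.
MIN_SEQ_IDENTITY = 30.0   # percent, the "NW sequence identity" field
MIN_SSAP = 70.0           # minimum SSAP score for a good match
PREFERRED_SSAP = 80.0     # preferred SSAP score
MAX_RMSD = 5.0            # Angstroms
MAX_HMM_EVALUE = 1e-5     # HMM E-value should be below this


def ssap_quality(ssap: float) -> str:
    """Label a SSAP score against the two guideline levels."""
    if ssap >= PREFERRED_SSAP:
        return "preferred"
    if ssap >= MIN_SSAP:
        return "acceptable"
    return "weak"


def chopclose_ok(seq_identity: float, ssap: float, rmsd: float) -> bool:
    """True if a ChopClose result meets all three guideline values."""
    return (seq_identity >= MIN_SEQ_IDENTITY
            and ssap >= MIN_SSAP
            and rmsd <= MAX_RMSD)


def cathedral_ok(ssap: float) -> bool:
    """True if a CATHEDRAL match meets the SSAP guideline."""
    return ssap >= MIN_SSAP


def hmm_ok(evalue: float) -> bool:
    """True if an HMM match has an E-value below the guideline."""
    return evalue < MAX_HMM_EVALUE


def first_result_to_inspect(
    cc: Optional[Tuple[float, float, float]] = None,
    cathedral_ssap: Optional[float] = None,
    hmm_evalue: Optional[float] = None,
) -> str:
    """Follow the order of the list above: CC, CATHEDRAL, HMM, then ab initio."""
    if cc is not None and chopclose_ok(*cc):
        return "ChopClose"
    if cathedral_ssap is not None and cathedral_ok(cathedral_ssap):
        return "CATHEDRAL"
    if hmm_evalue is not None and hmm_ok(hmm_evalue):
        return "HMM"
    return "PUU/Detective/Domak"


if __name__ == "__main__":
    # A weak ChopClose result (identity %, SSAP, RMSD) but a strong CATHEDRAL match.
    print(first_result_to_inspect(cc=(20.0, 60.0, 8.0), cathedral_ssap=85.0))
```

The helper only suggests which result to open first; it does not replace visual inspection of the superposition.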
  
  
-Submitting a chopping to the curator for review+**Submitting a chopping to the curator for review**
  
 If you are completely satisfied with a chopping proposed by one of the above algorithms, please select the 'Send for review' button next to the appropriate chopping. This chain will then be sent to the curator for reviewing. If you are completely satisfied with a chopping proposed by one of the above algorithms, please select the 'Send for review' button next to the appropriate chopping. This chain will then be sent to the curator for reviewing.
Line 112: Line 69:
  
  
-Manual adjustment of choppings+**Manual adjustment of choppings**
  
 It may be necessary at times to manually adjust proposed domain boundaries, for example when a domain boundary splits a secondary structure element (e.g. a beta strand or alpha helix) in two. In such cases, choose the chopping that most closely represents your solution and select the 'Inherit Chopping' button (top right-hand corner). It may be necessary at times to manually adjust proposed domain boundaries, for example when a domain boundary splits a secondary structure element (e.g. a beta strand or alpha helix) in two. In such cases, choose the chopping that most closely represents your solution and select the 'Inherit Chopping' button (top right-hand corner).
Line 138: Line 95:
  
  
- +**Some (25) examples of chopped chains that have undergone manual curation**
-  +
- +
- +
- +
- +
- +
- +
-Some (25) examples of chopped chains that have undergone manual curation +
 Substituting these chain ids into the following URL will load the relevant web page, which will show you examples of chopped chains that have been reviewed by the CATH curator. Select the 'Chopping' tab and then load the RasMol for the 'Chopped' result. This page will also inform you of the reasons behind the chosen result. Substituting these chain ids into the following URL will load the relevant web page, which will show you examples of chopped chains that have been reviewed by the CATH curator. Select the 'Chopping' tab and then load the RasMol for the 'Chopped' result. This page will also inform you of the reasons behind the chosen result.
  
 http://update.cathdb.info/cgi-bin/DomChop.pl?chain_id=4uj8B http://update.cathdb.info/cgi-bin/DomChop.pl?chain_id=4uj8B
  
-4uj8B +  - 4uj8B 
-4y25A +  - 4y25A 
-4znoB +  - 4znoB 
-5a57A +  - 5a57A 
-5a8jA +  - 5a8jA 
-5aoqA +  - 5aoqA 
-5axgA +  - 5axgA 
-5b04I +  - 5b04I 
-5c0xK +  - 5c0xK 
-5c14A +  - 5c14A 
-5c1fA +  - 5c1fA 
-5c1sA +  - 5c1sA 
-5c22C +  - 5c22C 
-5c2wD +  - 5c2wD 
-5c4nD +  - 5c4nD 
-5c6tA +  - 5c6tA 
-5cwwB +  - 5cwwB 
-5cylF +  - 5cylF 
-5cyxA +  - 5cyxA 
-5cz3A +  - 5cz3A 
-5dcpA +  - 5dcpA 
-5dcqF +  - 5dcqF 
-5dqrA +  - 5dqrA 
-5du3A +  - 5du3A 
-5fx0A+  - 5fx0A
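The URL substitution described above can also be scripted. This is a minimal sketch: the base URL and the `chain_id` parameter are taken from the example link above, while the function name `domchop_url` and the short list of ids are just for illustration. It only builds the links; you still need to be logged in to view the pages.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in this tutorial.
DOMCHOP_URL = "http://update.cathdb.info/cgi-bin/DomChop.pl"


def domchop_url(chain_id: str) -> str:
    """Build the DomChop page URL for one chain id (e.g. '4uj8B')."""
    return f"{DOMCHOP_URL}?{urlencode({'chain_id': chain_id})}"


if __name__ == "__main__":
    # The first few chain ids from the list above.
    for chain_id in ["4uj8B", "4y25A", "4znoB"]:
        print(domchop_url(chain_id))
```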
  
 Chopping summary acronyms Chopping summary acronyms
Line 227: Line 176:
  
  
-Happy  domain chopping !!! +**Happy  domain chopping !!!!** 
  
  