Differences
This shows you the differences between two versions of the page.
data_curation:superfamily_naming_tutorial:index [2023/09/28 16:02] vwaman |
data_curation:superfamily_naming_tutorial:index [2023/09/28 16:24] (current) vwaman |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | [[data_curation:superfamily_naming_tutorial:index|Superfamily Naming exercise (Last updated in Sept 2023)]] | ||
=== Superfamily Naming exercise (Last updated in Sept 2023) === | === Superfamily Naming exercise (Last updated in Sept 2023) === | ||
- | Useful websites: | + | **Useful websites:** |
https://www.cathdb.info/ | https://www.cathdb.info/ | ||
http://sfam.cathdb.info/ | http://sfam.cathdb.info/ | ||
**Part I: Steps followed for naming a superfamily** | **Part I: Steps followed for naming a superfamily** | ||
- | - Look through representative domains as: ‘domain only’ to understand common secondary structures; as ‘domain in chain’ to observe the location of the domain in the chain; as ‘domain in PDB’ to understand the domain’s function and location in the protein. | + | * Look through representative domains as: ‘domain only’ to understand common secondary structures; as ‘domain in chain’ to observe the location of the domain in the chain; as ‘domain in PDB’ to understand the domain’s function and location in the protein. |
- | - Check through FunFams/SwissProt/Keywords and refer to the most abundant name when naming. | + | * Check through FunFams/SwissProt/Keywords and refer to the most abundant name when naming. |
- | - Check through enzymes (EC number if available), GO terms and species to get a rough idea of domain function. | + | * Check through enzymes (EC number if available), GO terms and species to get a rough idea of domain function. |
- | - Refer to Pfam and InterPro entries for a general idea of protein domain function and/or structure. | + | * Refer to Pfam and InterPro entries for a general idea of protein domain function and/or structure. |
- | - Check through papers associated with the PDB entry for a better understanding of protein and protein domain structure and/or function. | + | * Check through papers associated with the PDB entry for a better understanding of protein and protein domain structure and/or function. |
- | - In the ‘Description’ section, provide an overview of structure and function. In larger superfamilies, you may have to refer to specific PDB IDs. | + | * In the ‘Description’ section, provide an overview of structure and function. In larger superfamilies, you may have to refer to specific PDB IDs. |
- | - Check references are correct: [InterPro:] [Pfam:] [PMID:] [DOI:] | + | * Check references are correct: [InterPro:] [Pfam:] [PMID:] [DOI:] |
- | - Check other names in the database, either to avoid duplicate names or to identify potential cross-hits | + | * Check other names in the database, either to avoid duplicate names or to identify potential cross-hits |
- | - Check names of other domains in the same chain to keep the name similar. | + | * Check names of other domains in the same chain to keep the name similar. |
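The reference check in the steps above can be partly automated. The following is a minimal sketch, not part of the CATH tooling: it assumes references appear in a description as bracketed tags such as `[PMID:12345678]`, and the function name `find_bad_references` and the accession patterns are illustrative assumptions (InterPro as `IPR` plus six digits, Pfam as `PF` plus five digits, PMID as digits, DOI starting with `10.`).

```python
import re

# Assumed accession shapes for each tag prefix (illustrative, not CATH's own rules).
ID_PATTERNS = {
    "InterPro": re.compile(r"IPR\d{6}"),
    "Pfam": re.compile(r"PF\d{5}"),
    "PMID": re.compile(r"\d+"),
    "DOI": re.compile(r"10\.\d{4,9}/\S+"),
}

# Assumed tag syntax: [Prefix:value]
TAG = re.compile(r"\[(\w+):([^\]]*)\]")


def find_bad_references(description):
    """Return (prefix, value) pairs with an unknown prefix or a malformed value."""
    bad = []
    for prefix, value in TAG.findall(description):
        value = value.strip()
        pattern = ID_PATTERNS.get(prefix)
        if pattern is None or not pattern.fullmatch(value):
            bad.append((prefix, value))
    return bad


if __name__ == "__main__":
    text = "Kinase domain [Pfam:PF00069] [PMID:12345678] [InterPro:IPR00071] [DOI:]"
    for prefix, value in find_bad_references(text):
        print(f"Check reference: [{prefix}:{value}]")
```

A check like this only catches empty or malformed tags; whether a well-formed identifier points to the right entry still has to be confirmed by hand.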
- | Part II: General observations and tips | + | **Part II: General observations and tips** |
Dos | Dos | ||
- | • Check other names in CATH to not make duplicates (i.e. make sure the assigned name is unique) | + | * Check other names in CATH to not make duplicates (i.e. make sure the assigned name is unique) |
- | • Make superfamily names consistent with other domains of same protein | + | * Make superfamily names consistent with other domains of same protein |
- | • Start with smaller families until you get the hang of it | + | * Start with smaller families until you get the hang of it |
- | • For larger superfamilies, it is a good idea to check FunFams | + | * For larger superfamilies, it is a good idea to check FunFams |
- | • When looking at a protein on InterPro, see if there are other domains that don’t have a name yet on the same protein - it will be easy to name that one | + | * When looking at a protein on InterPro, see if there are other domains that don’t have a name yet on the same protein - it will be easy to name that one |
- | • Work in groups for larger superfamilies | + | * Work in groups for larger superfamilies |
- | • Choose superfamily entries with FunFams, Pfams, or InterPro associated | + | * Choose superfamily entries with FunFams, Pfams, or InterPro associated |
Don’ts | Don’ts | ||
- | • Make description without sourcing references | + | * Make description without sourcing references |
- | • Make description without really understanding it | + | * Make description without really understanding it |
- | • Spend 3 hours on a very small superfamily | + | * Spend 3 hours on a very small superfamily |
- | • Look at every single PDB for big superfamilies | + | * Look at every single PDB for big superfamilies |
- | • For smaller representative domains, don’t put too much confidence in InterPro/Pfam - it may be better to look at the PDB paper for the specific domain | + | * For smaller representative domains, don’t put too much confidence in InterPro/Pfam - it may be better to look at the PDB paper for the specific domain |
- | • Assume it is the exact same domain if it has good mapping to Pfam | + | * Assume it is the exact same domain if it has good mapping to Pfam |
- | • Choose a superfamily entry with no annotations or too many annotations | + | * Choose a superfamily entry with no annotations or too many annotations |
+ | |||
+ | (Last updated in September 2023. Written by summer interns between 2020 and 2023 (Barbara, Oliver, Natalie, Charling, Ruiqi, Lorna, Katie, Charlotte, Hazuki) and CATH curators (Vaishali Waman, Ian Sillitoe)) | ||