This shows you the differences between two versions of the page.
| data_curation:superfamily_naming_tutorial:index [2023/09/28 16:02] – vwaman | data_curation:superfamily_naming_tutorial:index [2023/09/28 16:24] (current) – vwaman | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | [[data_curation: | ||
| === Superfamily Naming exercise (Last updated in Sept 2023) === | === Superfamily Naming exercise (Last updated in Sept 2023) === | ||
| - | Useful websites: | + | **Useful websites:** |
| https:// | https:// | ||
| http:// | http:// | ||
| **Part I: Steps followed for naming a superfamily** | **Part I: Steps followed for naming a superfamily** | ||
| - | - Look through representative domains as: ‘domain only’ to understand common secondary structures; as ‘domain in chain’ to observe the location of the domain in the chain; as ‘domain in PDB’ to understand the domain’s function and location in the protein. | + | * Look through representative domains as: ‘domain only’ to understand common secondary structures; as ‘domain in chain’ to observe the location of the domain in the chain; as ‘domain in PDB’ to understand the domain’s function and location in the protein. |
| - | - Check through FunFams/ | + | |
| - | - Check through enzymes (EC number if available), GO terms and species | + | |
| - | - Refer to Pfam and InterPro | + | |
| - | - Check through papers associated with the PDB entry for a better understanding of the protein and of the protein domain structure and/or function. | + | |
| - | - In ‘Description section’, | + | |
| - | - Check references are correct: [InterPro:] [Pfam:] [PMID:] [DOI:] | + | |
| - | - Check other names in the database, either to avoid duplicate names or to identify potential cross-hits | + | |
| - | - Check names of other domains in the same chain to keep the name similar. | + | |
| - | Part II: General observations and tips | + | **Part II: General observations and tips** |
| Dos | Dos | ||
| - | • Check other names in CATH to avoid duplicates (i.e. make sure the assigned name is unique) | + | * Check other names in CATH to avoid duplicates (i.e. make sure the assigned name is unique) |
| - | • Make superfamily names consistent with those of other domains of the same protein | + | |
| - | • Start with smaller families until you get the hang of it | + | |
| - | • For larger superfamilies, it is a good idea to check FunFam | + | |
| - | • When looking at a protein on InterPro, see if there are other domains that don’t have a name yet on the same protein - it will be easy to name that one | + | |
| - | • Work in groups for larger superfamilies | + | |
| - | • Choose superfamily entries with FunFams, Pfams, or InterPro associated | + | |
| Don’ts | Don’ts | ||
| - | • Write a description without citing references | + | * Write a description without citing references |
| - | • Write a description without really understanding it | + | |
| - | • Spend 3 hours on a very small superfamily | + | |
| - | • Look at every single PDB for big superfamilies | + | |
| - | • For smaller representative domains, don’t put too much confidence in InterPro/ | + | |
| - | • Assume it is the exact same domain if it has a good mapping to Pfam | + | |
| - | • Choose a superfamily entry with no annotation or too many annotations | + | |
| + | |||
| + | (Last updated in September 2023. Written by summer interns from 2020 to 2023 (Barbara, Oliver, Natalie, Charling, Ruiqi, Lorna, Katie, Charlotte, Hazuki) and CATH curators (Vaishali Waman, Ian Sillitoe).) | ||
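
Two of the Part I checks, that the references are well formed and that the name is not a duplicate, are mechanical enough to sketch in code. The Python below is only an illustration and is not part of the CATH curation tools. The function names `check_reference_tags` and `is_duplicate_name` are hypothetical. It also assumes that tags are written as `[Label:value]`, and that accessions follow the usual public formats (InterPro as `IPR` plus six digits, Pfam as `PF` plus five digits, PMID as digits only, DOI starting with `10.`).

```python
import re

# Assumed accession formats (the tutorial does not specify them):
#   InterPro: "IPR" + 6 digits, Pfam: "PF" + 5 digits,
#   PMID: digits only, DOI: "10.<registrant>/<suffix>".
TAG_PATTERNS = {
    "InterPro": re.compile(r"IPR\d{6}"),
    "Pfam": re.compile(r"PF\d{5}"),
    "PMID": re.compile(r"\d+"),
    "DOI": re.compile(r"10\.\d{4,9}/\S+"),
}

# Matches any "[Label:value]" tag, including empty values such as "[PMID:]".
TAG_RE = re.compile(r"\[(\w+):([^\]]*)\]")


def check_reference_tags(description):
    """Return a list of problems found in the reference tags of a description."""
    problems = []
    for label, value in TAG_RE.findall(description):
        value = value.strip()
        pattern = TAG_PATTERNS.get(label)
        if pattern is None:
            problems.append(f"unknown tag label: {label}")
        elif not pattern.fullmatch(value):
            problems.append(f"malformed {label} reference: {value!r}")
    return problems


def is_duplicate_name(candidate, existing_names):
    """Case- and whitespace-insensitive check against names already in use."""
    def normalise(name):
        return " ".join(name.lower().split())

    return normalise(candidate) in {normalise(n) for n in existing_names}
```

A description that still contains an unfilled tag such as `[PMID:]` would be reported by `check_reference_tags`, and `is_duplicate_name` treats names that differ only in case or spacing as the same name. Neither check replaces reading the cited sources: a well-formed accession can still point to the wrong entry.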