Useful websites:
https://www.cathdb.info/
http://sfam.cathdb.info/
Part I: Steps followed for naming a superfamily
Look through representative domains in three views: ‘domain only’, to understand the common secondary structures; ‘domain in chain’, to see where the domain sits in the chain; and ‘domain in PDB’, to understand the domain’s function and location in the protein.
Check through FunFams/SwissProt/Keywords and refer to the most abundant name when naming.
Check through enzymes (EC number if available), GO terms and species to get a rough idea of domain function.
Refer to Pfam and InterPro entries for a general idea of the protein domain's function and/or structure.
Check through the papers associated with the PDB entry for a better understanding of the protein and of the protein domain's structure and/or function.
In the ‘Description’ section, provide an overview of structure and function. In larger superfamilies, you may have to refer to specific PDB IDs.
Check that the references are correct: [InterPro:] [Pfam:] [PMID:] [DOI:]
Check other names in the database, either to avoid duplicate names or to identify potential cross-hits.
Check the names of other domains in the same chain to keep the names consistent.
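The reference check in the steps above can be partly automated. The following is a minimal sketch, not part of the CATH tooling. It assumes each reference is written as a tag of the form [Source:identifier], and that identifiers follow the usual public formats (IPR plus six digits for InterPro, PF plus five digits for Pfam, a number for PMID, and a string starting with "10." for DOI). The function name check_reference_tags is hypothetical. It only checks the format of a tag, so whether the reference is the right one for the domain still has to be confirmed by hand.

```python
import re

# Assumed identifier formats for each reference source (illustrative only).
TAG_PATTERNS = {
    "InterPro": re.compile(r"IPR\d{6}"),
    "Pfam": re.compile(r"PF\d{5}"),
    "PMID": re.compile(r"\d+"),
    "DOI": re.compile(r"10\.\d{4,9}/\S+"),
}

# Matches "[Source:identifier]"; the identifier may be empty so that
# unfilled tags such as "[PMID:]" are caught as well.
TAG_RE = re.compile(r"\[([^\[\]:]+):([^\[\]]*)\]")


def check_reference_tags(description):
    """Return a list of problems found in the reference tags of a description."""
    problems = []
    for source, identifier in TAG_RE.findall(description):
        pattern = TAG_PATTERNS.get(source)
        if pattern is None:
            # Source name is not one of the four expected tags (case-sensitive).
            problems.append(f"unknown source: [{source}:{identifier}]")
        elif not pattern.fullmatch(identifier.strip()):
            # Source is known but the identifier is empty or badly formed.
            problems.append(f"malformed identifier: [{source}:{identifier}]")
    return problems
```

An empty list means every tag found in the text has a known source and a well-formed identifier. Text with no tags at all also returns an empty list, so this does not replace the rule that descriptions must be referenced.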
Part II: General observations and tips
Dos
Check other names in CATH to avoid duplicates (i.e. make sure the assigned name is unique)
Make superfamily names consistent with other domains of the same protein
Start with smaller families until you get the hang of it
For larger superfamilies, it is a good idea to check the FunFams
When looking at a protein on InterPro, see if there are other domains on the same protein that don’t have a name yet - those will be easy to name
Work in groups for larger superfamilies
Choose superfamily entries that have associated FunFams, Pfam or InterPro entries
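Two of the checks above, picking the most abundant name and making sure the chosen name is unique, can be sketched in a few lines. This is only an illustration under stated assumptions: the names are supplied as plain Python lists (for example copied by hand from the FunFam or superfamily pages), and the helper names most_abundant_name, normalise and find_duplicates are hypothetical, not part of any CATH tool.

```python
from collections import Counter


def most_abundant_name(names):
    """Return (name, count) for the most frequent non-empty name, or None."""
    cleaned = [n.strip() for n in names if n and n.strip()]
    if not cleaned:
        return None
    return Counter(cleaned).most_common(1)[0]


def normalise(name):
    """Lower-case and collapse hyphens/whitespace so near-identical names match."""
    return " ".join(name.lower().replace("-", " ").split())


def find_duplicates(candidate, existing_names):
    """Return the existing names that equal the candidate after normalisation."""
    target = normalise(candidate)
    return [n for n in existing_names if normalise(n) == target]
```

If find_duplicates returns anything, the candidate name is already in use in some spelling and should be changed or made more specific. The comparison is deliberately simple, so names that differ by more than case, hyphens or spacing will not be flagged and still need a manual look.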
Don’ts
Write a description without citing references
Write a description without really understanding it
Spend 3 hours on a very small superfamily
Look at every single PDB entry for big superfamilies
Put too much confidence in InterPro/Pfam for smaller representative domains - it may be better to look at the PDB paper for the specific domain
Assume it is the exact same domain just because it maps well to Pfam
Choose a superfamily entry with no annotation or too many annotations
(Last updated in September 2023. Written by summer interns from 2020 to 2023 (Barbara, Oliver, Natalie, Charling, Ruiqi, Lorna, Katie, Charlotte, Hazuki) and CATH curators (Vaishali Waman, Ian Sillitoe).)