This page contains answers to the questions we receive most frequently at CATH and is the best place to start looking if you have a question about anything to do with the CATH resource.
Please note, these documentation pages are currently in their infancy so there may be some questions that don't yet have answers. This means that we know the question is important and we will document the answer as soon as we can.
The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures. There are four major levels in this hierarchy: Class, Architecture, Topology and Homologous superfamily.
For any given structure classified in the database, CATH gives you information on the structure and function of that protein. The evolutionary relationships involving the structure of interest and other proteins in the database can also be determined.
CATH also gives an overall view of the known protein structure universe to date. You can find which folds and superfamilies are the most populated, for example, and which structures are rare in nature.
Maintaining the CATH database is very much a team effort. Most of the members of the Orengo group have helped with the manual curation of the database and some have developed algorithms to aid with the automated aspects of maintaining and updating it.
Alison Cuff is the present CATH Curator and Manager; Ian Sillitoe is the present CATH Technical Manager. Tony Lewis was a Research Assistant in the group and was heavily involved in the development of the new CATH update protocol as part of CATH v3.0. He is still involved in maintaining and updating CATH in an ongoing consultancy capacity. Andrew Clegg is developing web services for CATH.
CATH is a tree-like, hierarchical classification that starts off at the tree “trunk” by clustering protein domains into broad categories (e.g. C, or class, where domains are clustered solely based on their general secondary structure content). As the hierarchy moves away from the “trunk” to the “branches”, more stringent clustering criteria are applied to provide clusters of domains with finer granularity of similarity.
| Depth | Letter | Name | Clustering criteria |
|---|---|---|---|
| 1 | C | Class | Secondary structure content |
| 2 | A | Architecture | General spatial arrangement of secondary structures |
| 3 | T | Topology | Spatial arrangement and connectivity of secondary structures (fold) |
| 4 | H | Homologous Superfamily | Manual curation of evidence of evolutionary relationship (at least two criteria from sequence/structure/function must be observed) |
| 5 | S | Sequence Family (S35) | >= 35% sequence similarity |
| 6 | O | Orthologous Family (S60) * | >= 60% sequence similarity |
| 7 | L | “Like” domain (S95) * | >= 95% sequence similarity |
| 8 | I | Identical domain (S100) | 100% sequence similarity |
| 9 | D | Domain counter | Unique domains |
* We are aware that the names “Orthologous” and “Like” are by no means perfect descriptions of the clustering criteria that they represent. However, we find it useful to provide some kind of label for these clusters and (quite frankly) these are the best we could come up with.
[from Ivan Kon on 20/10/2008]
CATH is a hierarchical classification that clusters protein structures at differing levels of similarity. The first level, Class, clusters proteins based on their general secondary structure content and is represented by the first number in the CATH code (the 'C' column in the table below).
| Domain | CATH code | C | A | T | H | S | O | L | I | D |
|---|---|---|---|---|---|---|---|---|---|---|
| 1nr3A00 | 3.30.1190.10.1.1.1.1.1 | 3 | 30 | 1190 | 10 | 1 | 1 | 1 | 1 | 1 |
A more detailed explanation of the numbering used for the sequence clusters (SOLID levels) can be found in this blog entry.
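As an illustration of how the CATH code in the table above breaks down into its levels, here is a minimal Python sketch. The function name `split_cath_code` is hypothetical and is not part of any CATH software; the sketch assumes only that the numbers are separated by dots and appear in the order C, A, T, H, S, O, L, I, D.

```python
from typing import Dict

# Level letters in the order used by the table above.
LEVELS = "CATHSOLID"


def split_cath_code(code: str) -> Dict[str, int]:
    """Map each level letter to its number in a dot-separated CATH code.

    Shorter codes (for example a four-level code such as '3.30.1190.10')
    simply produce fewer entries. This is an illustrative helper, not an
    official CATH function.
    """
    parts = code.split(".")
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"expected 1 to {len(LEVELS)} levels, got {len(parts)}")
    # int() raises ValueError if any part is not a whole number.
    return {letter: int(part) for letter, part in zip(LEVELS, parts)}


levels = split_cath_code("3.30.1190.10.1.1.1.1.1")
print(levels["C"], levels["A"], levels["T"], levels["H"])
```

For the domain 1nr3A00 shown above, this gives Class 3, Architecture 30, Topology 1190 and Homologous Superfamily 10, matching the columns of the table.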
For a particular CATH version, for example 3.2.0, the first number indicates the most recent major CATH database release (i.e. version 3.0.0), whilst the second number indicates a minor release. Version 3.2.0 is therefore the second update of the major CATH release 3.0.0. The third number is used for internal purposes.
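The version scheme can be read off mechanically. The following Python sketch is only an illustration: `CathVersion` and `parse_cath_version` are hypothetical names, and the sketch assumes the version is written as three dot-separated integers, optionally prefixed with "v".

```python
from typing import NamedTuple


class CathVersion(NamedTuple):
    major: int     # most recent major release, e.g. 3 for v3.0.0
    minor: int     # minor release (update number) within that major release
    internal: int  # used for internal purposes


def parse_cath_version(version: str) -> CathVersion:
    """Split a version string such as '3.2.0' or 'v3.2.0' into its parts."""
    fields = version.lstrip("vV").split(".")
    if len(fields) != 3:
        raise ValueError(f"expected three numbers, got {version!r}")
    major, minor, internal = (int(field) for field in fields)
    return CathVersion(major, minor, internal)


v = parse_cath_version("3.2.0")
print(f"update {v.minor} of major release {v.major}.0.0")
```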
A domain identifier is assigned to every classified domain in the CATH database. It consists of a 4-character PDB code, for example 1kcm, followed by the chain name, denoted by a letter, and a two-digit domain number. If there is only one chain, it will be assigned the letter A in the same way as the first chain in a multi-chain structure. If there is only one domain in the chain then 00 is used for the domain number. The structure 1kcm has only a single domain in a single chain; the domain identifier will therefore be 1kcmA00.
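The layout described above can be checked with a short Python sketch. The names `DomainId` and `parse_domain_id` are hypothetical, and the pattern is an assumption based only on this description: four characters of PDB code, one chain character, and a two-digit domain number. The sketch accepts any single alphanumeric chain character, which is slightly broader than the "letter" described above.

```python
import re
from typing import NamedTuple

# Assumed layout: 4-character PDB code + 1 chain character + 2-digit domain number.
DOMAIN_ID_PATTERN = re.compile(r"^([0-9A-Za-z]{4})([0-9A-Za-z])(\d{2})$")


class DomainId(NamedTuple):
    pdb_code: str
    chain: str
    domain_number: int  # 0 means the chain contains a single domain


def parse_domain_id(domain_id: str) -> DomainId:
    """Split a CATH domain identifier such as '1kcmA00' into its parts."""
    match = DOMAIN_ID_PATTERN.match(domain_id)
    if match is None:
        raise ValueError(f"not a recognised domain identifier: {domain_id!r}")
    pdb_code, chain, number = match.groups()
    return DomainId(pdb_code, chain, int(number))


parsed = parse_domain_id("1kcmA00")
print(parsed.pdb_code, parsed.chain, parsed.domain_number)
```

For 1kcmA00 this yields PDB code 1kcm, chain A and domain number 0, i.e. the only domain in that chain.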
This was implemented due to the emergence of protein structures with more than nine domains. As experimental techniques for solving crystal structures have improved, the determination of protein structures with a large number of separate domains has increased.
This was due to the wwPDB remediation project. Please click here for further information.
A tutorial on how to search CATH can be found here.
The answer is to use the CATH webservices. However, the webservices are undergoing a major revamp and are still in testing. We will update this section when we move them to production.
We have moved to using clearer accession URLs for entities in CATH that generally follow the form:
http://{VERSION}.cathdb.info/{ENTITY}/{ID}
The following table provides a brief overview of the possible accessors for each CATH entity and is intended as a placeholder until a more complete discussion is added.
| ENTITY | Path | Example |
|---|---|---|
| Domain | /domain/{ID} | http://www.cathdb.info/domain/1cukA01 |
| Chain | /chain/{ID} | http://www.cathdb.info/chain/1cukA |
| Pdb | /pdb/{ID} | http://www.cathdb.info/pdb/1cuk |
| Classification | /cathnode/{ID} | http://www.cathdb.info/cathnode/1.10.10.10 |
We are working towards making these revised URLs compatible with previous versions of CATH. However, due to the significant changes between CATH v3.1 and v3.2, this isn't a particularly trivial job so please bear with us.
| VERSION | Example |
|---|---|
| v3-1 | http://v3-1.cathdb.info |
| v3-2 | http://v3-2.cathdb.info |
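The URL form and the two tables above can be combined in a small Python sketch. The helper `cath_url` and the `ENTITY_PATHS` mapping are hypothetical names introduced for illustration; the paths and host names are taken from the tables, and the sketch only builds strings without checking that a URL actually resolves (as noted above, version-specific URLs may not yet work for every entity).

```python
# Path segment for each CATH entity, as listed in the ENTITY table above.
ENTITY_PATHS = {
    "domain": "domain",
    "chain": "chain",
    "pdb": "pdb",
    "classification": "cathnode",
}


def cath_url(entity: str, identifier: str, version: str = "www") -> str:
    """Build an accession URL of the form http://{VERSION}.cathdb.info/{ENTITY}/{ID}.

    `version` is the host prefix, e.g. 'www', 'v3-1' or 'v3-2'.
    """
    try:
        path = ENTITY_PATHS[entity.lower()]
    except KeyError:
        raise ValueError(f"unknown CATH entity: {entity!r}") from None
    return f"http://{version}.cathdb.info/{path}/{identifier}"


print(cath_url("domain", "1cukA01"))
print(cath_url("classification", "1.10.10.10", version="v3-2"))
```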
The following links provide PDB coordinates for the PDB (1cuk), PDB Chain (1cukA) and CATH domain (1cukA01):
http://www.cathdb.info/search/by_text?q=1cuk
http://www.cathdb.info/search/by_text?q=1cuka
http://www.cathdb.info/search/by_text?q=1cuka01
If you would like us to link to your resource and there is a natural mapping from one of the CATH entities (PDB, PDB Chain, Domain, Classification, etc.), please get in touch.