


What is CATH?

The CATH database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues, and continues to be developed by the Orengo group at University College London.

How is CATH-Gene3D created?

Experimentally determined three-dimensional protein structures are obtained from the Protein Data Bank and split into their constituent polypeptide chains, where applicable. Protein domains are identified within these chains using a mixture of automatic methods and manual curation. The domains are then classified within the CATH structural hierarchy:

  • Class (C): domains are assigned according to their secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure.
  • Architecture (A): domains are assigned using information on how the secondary structures are arranged in three-dimensional space.
  • Topology/fold (T): domains are assigned using information on how the secondary structure elements are connected and arranged.
  • Homologous superfamily (H): domains are assigned to a superfamily if there is good evidence that they are related by evolution, i.e. they are homologous.
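The four levels of the hierarchy are commonly written as a dotted four-number code, one number per level (C.A.T.H). The following Python sketch shows one way such a code could be split into its levels and compared with another. The `CathCode` class, the `deepest_shared_level` helper and the example codes are illustrative assumptions, not part of any CATH software, and the numeric class labels are not stated in the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Class-level labels keyed by class number.
# Assumption: this numbering is not given in the text above.
CLASS_LABELS = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha beta",
    4: "few secondary structures",
}

# Level names in hierarchy order, from broadest (C) to most specific (H).
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


@dataclass(frozen=True)
class CathCode:
    """Hypothetical container for a dotted C.A.T.H code such as '1.10.8.10'."""

    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        parts = code.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {code!r}")
        # int() raises ValueError for non-numeric parts.
        return cls(*(int(p) for p in parts))

    def values(self) -> tuple:
        return (self.class_, self.architecture, self.topology,
                self.homologous_superfamily)

    def prefix(self, depth: int) -> str:
        """Return the code truncated to the first `depth` levels (1 = C, 4 = H)."""
        if not 1 <= depth <= 4:
            raise ValueError("depth must be between 1 and 4")
        return ".".join(str(v) for v in self.values()[:depth])


def deepest_shared_level(a: CathCode, b: CathCode) -> Optional[str]:
    """Name the most specific level at which two codes still agree, or None."""
    shared = None
    for name, x, y in zip(LEVELS, a.values(), b.values()):
        if x != y:
            break
        shared = name
    return shared
```

For example, `CathCode.parse("1.10.8.10").prefix(3)` gives the topology-level code `"1.10.8"`, and two domains whose codes differ only in the last number share a fold but sit in different homologous superfamilies.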

Additional sequence data for domains with no experimentally determined structure are provided by our sister resource, Gene3D, and are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and make homologous superfamily assignments.

CATH Releases

CATH (daily snapshot)

We provide a daily snapshot of the very latest classifications and annotations as they happen in our pipeline. This enables users to find the most up-to-date information about their particular structure of interest. The amount of data we provide at this stage is limited mainly to domain boundaries and superfamily classification.

CATH-Plus (full release)

We aim to provide full releases of CATH (CATH-Plus) every 12 months. CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may be missing information on the most recent structures.

See the release notes for statistics on specific releases.

The latest release of CATH-Plus (v4.1) was made in July 2016 and consists of:

  • 308,999 structural protein domain entries
  • 53,479,436 non-structural protein domain entries
  • 2,737 homologous superfamily entries
  • 92,882 functional family entries

CATH and CATH-Plus data for all releases can be downloaded from Data Downloads.

Open Source Software

CATH is proud to be a member of the open source software community. Our developers use and contribute towards the development and maintenance of a number of open source tools. For a full list of the open source software used in the making of this resource (both in the pipeline and our web pages), please visit the CATH tools page.

Contact us

If you have any comments, suggestions or criticisms, please let us know:

http://www.cathdb.info/support/contact
