
Differences

This shows you the differences between two versions of the page.

index [2016/08/02 11:24]
sillitoe
index [2017/10/16 15:36] (current)
sayoni
Line 3: Line 3:
 The CATH database is a free, publicly available online resource that provides  The CATH database is a free, publicly available online resource that provides 
 information on the evolutionary relationships of protein domains. It was  information on the evolutionary relationships of protein domains. It was 
-created in the mid-1990s by Professor Christine Orengo and colleagues, and  +created in the mid-1990s by [[https://www.ucl.ac.uk/orengo-group/lab-members/christine-orengo|Professor Christine Orengo]] and colleagues, and continues to be developed by the [[cathteam:index|Orengo group]] at University College London.
-continues to be developed by the Orengo group at University College London.+
  
 ===== How is CATH-Gene3D created? ===== ===== How is CATH-Gene3D created? =====
Line 12: Line 11:
 applicable. Protein domains are identified within these chains using a mixture applicable. Protein domains are identified within these chains using a mixture
 of automatic methods and manual curation. The domains are then classified within of automatic methods and manual curation. The domains are then classified within
-the CATH structural hierarchy: at the Class (C) level, domains are assigned+the CATH structural hierarchy: at the [[glossary:class|Class]] (C) level, domains are assigned
 according to their secondary structure content, i.e. all alpha, all beta, a according to their secondary structure content, i.e. all alpha, all beta, a
-mixture of alpha and beta, or little secondary structure; at the Architecture+mixture of alpha and beta, or little secondary structure; at the [[glossary:architecture|Architecture]]
 (A) level, information on the secondary structure arrangement in (A) level, information on the secondary structure arrangement in
-three-dimensional space is used for assignment; at the Topology/fold (T) level,+three-dimensional space is used for assignment; at the [[glossary:topology|Topology/fold]] (T) level,
 information on how the secondary structure elements are connected and arranged information on how the secondary structure elements are connected and arranged
-is used; assignments are made to the Homologous superfamily (H) level if there +is used; assignments are made to the [[glossary:homologous_superfamily|Homologous superfamily]] (H) level if there is good evidence that the domains are related by evolution, i.e. they are 
-is good evidence that the domains are related by evolution, i.e. they are +homologous. To browse the classification hierarchy, see [[http://cathdb.info/browse/tree|CATH hierarchy]].
-homologous.+
  
 Additional sequence data for domains with no experimentally determined Additional sequence data for domains with no experimentally determined
-structures are provided by our sister resource, Gene3D, which are used to +structures are provided by our sister resource, [[http://gene3d.biochem.ucl.ac.uk/Gene3D|Gene3D]], which are used to populate the homologous superfamilies. Protein sequences from UniProtKB and
-populate the homologous superfamilies. Protein sequences from UniProtKB and+
 Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and
 make homologous superfamily assignments. make homologous superfamily assignments.
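The four levels described above (Class, Architecture, Topology, Homologous superfamily) can be illustrated with a small Python sketch. It assumes the dotted four-number identifier convention used for CATH superfamilies (for example `3.40.50.300`), which this page does not itself spell out. The names `CathCode`, `parse_cath_code` and `lineage` are hypothetical helpers for illustration, not part of any CATH software.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """A superfamily position in the hierarchy, one integer per level.

    Assumption: identifiers are written as four dot-separated numbers,
    C.A.T.H, e.g. "3.40.50.300".
    """

    cath_class: int              # C: secondary structure content
    architecture: int            # A: arrangement in 3D space
    topology: int                # T: connectivity of secondary structure elements
    homologous_superfamily: int  # H: evidence of common ancestry

    def lineage(self) -> list[str]:
        """Return the identifiers of each level, from Class down to H."""
        parts = [str(p) for p in self]
        return [".".join(parts[:depth]) for depth in range(1, 5)]


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted C.A.T.H string (hypothetical helper)."""
    fields = code.strip().split(".")
    if len(fields) != 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(f) for f in fields))


if __name__ == "__main__":
    code = parse_cath_code("3.40.50.300")
    print(code.lineage())
```

Keeping each level as a separate field makes it straightforward to group domains at any depth of the hierarchy, for instance by comparing only the first three fields to find domains that share a fold.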
Line 30: Line 27:
 ===== CATH Releases ===== ===== CATH Releases =====
  
-We aim to provide official releases of the CATH classification every 12 months. +==== CATH (daily snapshot) ====
-This release process is important because it allows us to provide internal +
-validation, extra annotations and analysis. However, it can mean that there is a +
-time delay between new structures appearing in the PDB and the latest official  +
-CATH release,+
  
-In order to address this issue: CATH-B provides limited amount of information +We provide a daily snapshot of the very latest classifications and annotations as they happen in our pipeline. This enables users to find the most up-to-date information about their particular structure of interest. The amount of data we provide at this stage is limited mainly to domain boundaries and superfamily classification.
-to the very latest domain annotations (e.g. domain boundaries and superfamily +
-classifications).+
  
-The latest release of CATH-Gene3D (v4.1) was released in July 2016 and  +==== CATH-Plus (full release) ====
-consists of:+
  
-  * 308,999    structural protein domain entries +We aim to provide full releases of CATH (CATH-Plus) every 12 months. CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may be missing information on the most recent structures. 
-  * 53,479,436 non-structural protein domain entries + 
-  * 2,737       homologous superfamily entries +CATH-Plus data includes: 
-  * 92,882      functional family entries+ 
 +=== FunFams (Functional Families) === 
 + 
 +The homologous superfamilies in CATH-Gene3D can often be functionally and structurally diverse even though they share a conserved structural core. Therefore, the superfamilies have been sub-classified into functional families (FunFams) using a subclassification protocol purely based on sequence patterns. Relatives within these FunFams are likely to share highly similar structures and functions. The FunFams are useful in function prediction and in providing information on the evolution of function. 
 + 
 +=== Structural clusters === 
 + 
 +The structures within a homologous superfamily have been clustered at < 9 Å RMSD to form structural clusters, also known as structurally-similar groups (SSGs). These structural clusters are useful for understanding the structural diversity of a superfamily. 
 + 
 +=== Structural superpositions === 
 + 
 +The conserved structural core in the homologous superfamilies can be observed from the structural superpositions generated from their representative domains by [[cath_tools#cath_tools|CATH Tools]]. This is an effective way of observing the structural conservation and diversity across a superfamily. 
 + 
 +See [[release_notes|release notes]] for information on the statistics for specific releases. 
 + 
 +CATH and CATH-Plus data for all releases can be downloaded from [[data:index|Data Downloads]].
  
 ===== Open Source Software ===== ===== Open Source Software =====