This shows you the differences between two versions of the page.
| index [2017/10/16 14:33] – sayoni | index [2024/11/26 14:54] (current) – [Expansion in CATH structural data from AlphaFold Database] sillitoe | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| ===== How is CATH-Gene3D created? ===== | ===== How is CATH-Gene3D created? ===== | ||
| - | |||
| - | ==== CATH ==== | ||
| Experimentally-determined protein three-dimensional structures are obtained from | Experimentally-determined protein three-dimensional structures are obtained from | ||
| Line 20: | Line 18: | ||
| information on how the secondary structure elements are connected and arranged | information on how the secondary structure elements are connected and arranged | ||
| is used; assignments are made to the [[glossary: | is used; assignments are made to the [[glossary: | ||
| - | homologous. | + | homologous. |
| - | + | ||
| - | ==== Gene3D ==== | + | |
| Additional sequence data for domains with no experimentally determined | Additional sequence data for domains with no experimentally determined | ||
| Line 29: | Line 25: | ||
| make homologous superfamily assignments. | make homologous superfamily assignments. | ||
| - | ==== FunFams ==== | + | == Recognition as a Global Core BioData Resource == |
| + | CATH has been recognized as a Global Core BioData Resource | ||
| + | |||
| + | |||
| + | ===== Expansion in CATH structural data from AlphaFold Database ===== | ||
| + | |||
| + | We are pleased to announce the release of CATH v4.4 (October 2024; https:// | ||
| + | |||
| + | == Integration of domains from The Encyclopedia of Domains (TED) == | ||
| + | CATH v4.4 incorporates approximately 600,000 newly classified domain structures from the Protein Data Bank (PDB) and maps over 90 million predicted domain structures from the Encyclopedia of Domains (TED) resource into CATH superfamilies, a joint effort between the Jones group (UCL Computer Science) and the Orengo group (UCL Structural and Molecular Biology). This integration has resulted in a 180-fold increase in structural information for CATH superfamilies. | ||
| + | |||
| + | The inclusion of TED data has expanded the number of superfamilies from 5,841 to 6,573, folds from 1,349 to 2,081, and architectures from 41 to 77. It is important to note that the TED data comprises predicted structures, and these new folds and architectures remain hypothetical until experimentally confirmed. | ||
| + | |||
| + | Advancements in Domain Segmentation and Classification: | ||
| + | To manage the substantial volume of data from the AlphaFold Protein Structure Database, our automated domain segmentation workflow has been enhanced. We have integrated a faster and more accurate in-house deep-learning approach called Chainsaw, along with the publicly available methods Merizo and UniDoc. For homologue detection and verification, | ||
| + | |||
| + | Expansion of Functional Families (FunFams): | ||
| + | Within superfamilies, | ||
| + | This expansion enhances our ability to analyze conserved residues within protein families and to identify putative functional sites, contributing to a deeper understanding of protein function and evolution. | ||
| + | |||
| + | Identification of Novel Folds and Architectures: | ||
| + | Analysis of TED data has led to the identification of 479 new folds and 34 new architectures, | ||
| + | |||
| + | Future Directions: | ||
| + | The extensive data integrated into CATH v4.4 presents opportunities for further exploration of protein structures and evolutionary relationships. Ongoing efforts will focus on refining algorithms and workflows to improve domain boundary assignments, | ||
| - | The homologous superfamilies in CATH-Gene3D can often be functionally and structurally diverse even though they share a conserved structural core. However, when predicting function, it is important to identify domain relatives within a superfamily that share the same function. To this end, the homologous superfamilies in CATH-Gene3D have been sub-classified into functional families (FunFams) using a subclassification protocol purely based on sequence patterns. Relatives within these FunFams are likely to share highly similar structures and functions. | ||
| ===== CATH Releases ===== | ===== CATH Releases ===== | ||
| Line 42: | Line 62: | ||
| We aim to provide full releases of CATH (CATH-Plus) every 12 months. CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may be missing information on the most recent structures. | We aim to provide full releases of CATH (CATH-Plus) every 12 months. CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may be missing information on the most recent structures. | ||
| + | |||
| + | CATH-Plus data includes: | ||
| + | |||
| + | === FunFams (Functional Families) === | ||
| + | |||
| + | The homologous superfamilies in CATH-Gene3D can often be functionally and structurally diverse even though they share a conserved structural core. Therefore, the superfamilies have been sub-classified into functional families (FunFams) using a sub-classification protocol based purely on sequence patterns. Relatives within these FunFams are likely to share highly similar structures and functions. The FunFams are useful in function prediction and in providing information on the evolution of function. | ||
| + | |||
| + | === Structural clusters === | ||
| + | |||
| + | The structures within a homologous superfamily have been clustered at < 9 Å RMSD to form structural clusters, also known as structurally-similar groups (SSGs). These structural clusters are useful for understanding the structural diversity of a superfamily. | ||
| + | |||
| + | === Structural superpositions === | ||
| + | |||
| + | The conserved structural core in the homologous superfamilies can be observed from the structural superpositions generated from their representative domains by [[cath_tools# | ||
| See [[release_notes|release notes]] for information on the statistics for specific releases. | See [[release_notes|release notes]] for information on the statistics for specific releases. | ||
| Line 55: | Line 89: | ||
| If you have any comments/ | If you have any comments/ | ||
| - | http:// | + | https:// |
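
The "Structural clusters" entry above describes grouping the structures of a superfamily at < 9 Å RMSD. The sketch below illustrates one way such a threshold grouping could work. It is not the CATH implementation: the single-linkage criterion, the function name `cluster_by_rmsd`, the domain identifiers and the RMSD values are all assumptions made for illustration, and the page does not state which linkage method is actually used.

```python
from itertools import combinations


def cluster_by_rmsd(domains, rmsd, threshold=9.0):
    """Group domains by single linkage (an assumed criterion): two domains
    share a cluster if a chain of pairs with RMSD < threshold connects them."""
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in combinations(domains, 2):
        # Pairs are keyed by frozenset so the order of a and b does not matter.
        if rmsd[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical domain identifiers and pairwise RMSD values in Å.
domains = ["dom_a", "dom_b", "dom_c", "dom_d"]
rmsd = {
    frozenset(("dom_a", "dom_b")): 2.1,
    frozenset(("dom_a", "dom_c")): 10.5,
    frozenset(("dom_a", "dom_d")): 14.0,
    frozenset(("dom_b", "dom_c")): 8.4,
    frozenset(("dom_b", "dom_d")): 12.3,
    frozenset(("dom_c", "dom_d")): 11.7,
}

clusters = cluster_by_rmsd(domains, rmsd)
print(clusters)
```

With single linkage, `dom_a` and `dom_c` end up in the same cluster through `dom_b` even though their direct RMSD exceeds the threshold. A complete-linkage variant would keep them apart, so the choice of criterion matters when interpreting structural diversity.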