CATH Data Downloads
This page describes the data files that are available to download from the CATH FTP site.
CATH (daily)
We provide a daily snapshot of our very latest classifications and annotations as they happen in our pipeline. This enables users to find the most up-to-date information about their particular structure of interest. The amount of data we provide at this stage is limited mainly to domain boundaries and superfamily classification.
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/
File name | Description |
---|---|
cath-b-newest-all.gz | Lists the latest domain boundaries and superfamily (C.A.T.H) annotations for all CATH domains |
cath-b-newest-names.gz | Provides the names for each node in the CATH hierarchy |
cath-b-newest-latest-release.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains in the most recent release of CATH-Plus |
cath-b-newest-putative.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains released since the most recent release of CATH-Plus |
cath-b-s35-newest.gz | Lists the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives |
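
The daily files are gzipped text, so they can be read line by line once downloaded. The following is a minimal sketch rather than an official parser. It assumes each line of `cath-b-newest-all.gz` holds four whitespace-separated columns (domain ID, release label, C.A.T.H code, domain boundaries); that layout is not stated on this page, so check it against the downloaded file. The names `DailyEntry`, `parse_daily_lines` and `read_daily_file` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, NamedTuple


class DailyEntry(NamedTuple):
    domain_id: str   # assumed column 1: CATH domain ID
    release: str     # assumed column 2: release label
    cath_code: str   # assumed column 3: C.A.T.H superfamily code
    boundaries: str  # assumed column 4: segment boundaries


def parse_daily_lines(lines: Iterable[str]) -> Iterator[DailyEntry]:
    """Yield one DailyEntry per non-empty, non-comment line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: exactly four whitespace-separated columns per line.
        domain_id, release, cath_code, boundaries = line.split(maxsplit=3)
        yield DailyEntry(domain_id, release, cath_code, boundaries)


def read_daily_file(path: str) -> List[DailyEntry]:
    """Read a locally downloaded, gzipped daily snapshot file."""
    with gzip.open(path, "rt") as handle:
        return list(parse_daily_lines(handle))


# Illustrative line only; not copied from a real snapshot.
sample = ["1oaiA00 v4_1_0 1.10.8.10 561-619:A"]
entries = list(parse_daily_lines(sample))
print(entries[0].domain_id, entries[0].cath_code)
```

After downloading a snapshot, `read_daily_file("cath-b-newest-all.gz")` would return the same kind of records for the whole file.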
CATH-Plus
CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks (e.g. looking for evidence that would support merging superfamilies, checking for errors, etc.) and adds a large amount of information combining protein structure, sequence and function. As a result, CATH-Plus offers a greater depth of information, though it may not cover the most recent structures.
For statistics on specific releases, see the release notes.
Data related to the CATH classification
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/
File name | Description |
---|---|
cath-chain-list-<version>.txt | Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not. |
cath-domain-boundaries-*-<version>.txt | Description of domain and segment boundaries for domains classified into CATH. |
cath-domain-description-file-<version>.txt | Description of each protein domain in CATH |
cath-domain-list-<S35\|S60\|S95\|S100\|all>-<version>.txt | Lists of domains classified into CATH |
cath-domain-pdb-*-<version>.txt | Description of each domain PDB classified into CATH |
cath-names-<version>.txt | Name description of each node in the CATH hierarchy, along with an example domain |
cath-superfamily-list-<version>.txt | List of all the superfamilies in the CATH hierarchy |
cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed |
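
As an illustration of working with the domain list files, the sketch below groups domain IDs by superfamily. It assumes a whitespace-separated layout in which the first column is the domain ID and the next four columns are the class, architecture, topology and homologous superfamily numbers, with `#` marking comment lines. This layout is an assumption not stated on this page, and `parse_domain_list` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def parse_domain_list(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group domain IDs by their C.A.T.H superfamily code."""
    by_superfamily: Dict[str, List[str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Assumption: column 1 is the domain ID, columns 2-5 are C, A, T, H.
        domain_id = fields[0]
        superfamily = ".".join(fields[1:5])
        by_superfamily.setdefault(superfamily, []).append(domain_id)
    return by_superfamily


# Illustrative rows only; not copied from a real release file.
sample = [
    "# comment line",
    "1oaiA00     1    10     8    10     1     1     1     1     1    59 1.000",
    "1go5A00     1    10     8    10     1     1     1     1     2    69 999.000",
]
groups = parse_domain_list(sample)
print(groups)
```

The same function can be given an open file handle for a downloaded `cath-domain-list-*-<version>.txt` file.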
Data related to non-redundant data sets
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/
File name | Description |
---|---|
cath-dataset-nonredundant-S[20\|40]-v4_1_0.atom.fa | The ATOM sequences of the domains in the dataset (containing only residues that have ATOM records in the PDB file) |
cath-dataset-nonredundant-S[20\|40]-v4_1_0.fa | The sequences of the domains in the dataset |
cath-dataset-nonredundant-S[20\|40]-v4_1_0.list | A list of the domains in the dataset; one domain ID per line |
cath-dataset-nonredundant-S[20\|40]-v4_1_0.pdb.tgz | A gzipped tar file containing the PDB files of the domains in the dataset |
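
The file names in the table follow a regular pattern, and the `.list` file holds one domain ID per line. The sketch below expands the pattern for a chosen identity level and reads such a list. The helper names `dataset_filenames` and `read_domain_ids` are hypothetical, and the version string defaults to the `v4_1_0` shown in the table.

```python
from typing import Iterable, List

# Suffixes taken from the table of non-redundant data set files.
SUFFIXES = ("atom.fa", "fa", "list", "pdb.tgz")


def dataset_filenames(level: int, version: str = "v4_1_0") -> List[str]:
    """Return the four file names for the S20 or S40 data set."""
    if level not in (20, 40):
        raise ValueError("only S20 and S40 are listed in the table")
    stem = f"cath-dataset-nonredundant-S{level}-{version}"
    return [f"{stem}.{suffix}" for suffix in SUFFIXES]


def read_domain_ids(lines: Iterable[str]) -> List[str]:
    """Read one domain ID per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


names = dataset_filenames(40)
# Illustrative IDs only; not copied from a real .list file.
ids = read_domain_ids(["1oaiA00\n", "\n", "1go5A00\n"])
print(names[2], ids)
```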
Data related to sequence data
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/
File name | Description |
---|---|
cath-domain-seqs-*-<version>.fa | Sequences for each CATH domain |
cath-S35-<version>-hmm3.lib.gz | HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity |
funfam-hmm3-<version>.lib.gz | HMMs for each functional family (FunFam) |
cath-superfamily-seqs-<superfamily>-<version>.fa | Sequences for each CATH superfamily in FASTA format |
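
The `.fa` sequence files are in FASTA format, so a small reader is enough to load them without extra dependencies. The following is a minimal sketch; `parse_fasta` is a hypothetical helper, and the header strings in the sample are made up for illustration, so the real header layout should be checked in the downloaded file.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its full sequence."""
    sequences: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines; join them.
            sequences[header] += line
    return sequences


# Illustrative records only; headers and sequences are invented.
sample = [
    ">example_domain_1",
    "MKTAYIAK",
    "QRQISFVK",
    ">example_domain_2",
    "GSHMLEDP",
]
seqs = parse_fasta(sample)
print(len(seqs), seqs["example_domain_1"])
```

For a downloaded file, pass an open text handle, for example `parse_fasta(open(path))` with `path` pointing at one of the `.fa` files above.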