====== CATH Data Downloads ======

This page describes the data files that are available to download from the [[ftp://orengoftp.biochem.ucl.ac.uk/cath | CATH FTP site]]. See [[:index#cath_releases|CATH Releases]] for more information on CATH and CATH-Plus.

===== CATH (daily snapshot) =====

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/

^ File name ^ Description ^
| cath-b-newest-all.gz | Lists the latest domain boundaries and superfamily (C.A.T.H) annotations for all CATH domains |
| cath-b-newest-names.gz | Provides the names for each node in the CATH hierarchy |
| cath-b-newest-latest-release.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains in the most recent release of CATH-Plus |
| cath-b-newest-putative.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains released since the most recent release of CATH-Plus |
| cath-b-s35-newest.gz | Lists the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives |

===== CATH-Plus (full release) =====

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/

For statistics on specific releases, see the [[../release_notes|release notes]].

==== CATH classification data ====

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/

^ File name ^ Description ^
| cath-chain-list-.txt | Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not |
| cath-domain-boundaries-*-.txt | Description of domain and segment boundaries for domains classified into CATH |
| cath-domain-description-file-.txt | Description of each protein domain in CATH |
| cath-domain-list--.txt | Lists of domains classified into CATH |
| cath-domain-pdb-*-.txt | Description of each domain PDB classified into CATH |
| cath-names-.txt | Name of each node in the CATH hierarchy, along with an example domain |
| cath-superfamily-list-.txt | List of all the superfamilies in the CATH hierarchy |
| cath-unclassified-list-.txt | List of all unclassified protein chains and domains that are still being processed |

==== Non-redundant data sets ====

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/

^ File name ^ Description ^
| cath-dataset-nonredundant-S[20%%|%%40].atom.fa | The ATOM sequences of the domains in the dataset (these contain only residues that have ATOM records in the PDB file) |
| cath-dataset-nonredundant-S[20%%|%%40].fa | The sequences of the domains in the dataset |
| cath-dataset-nonredundant-S[20%%|%%40].list | A list of the domains in the dataset, one domain ID per line |
| cath-dataset-nonredundant-S[20%%|%%40].pdb.tgz | A gzipped tar file containing the PDB files of the domains in the dataset |

==== Sequence data ====

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/

^ File name ^ Description ^
| cath-domain-seqs-*-.fa | Sequences for each CATH domain |
| cath-S35--hmm3.lib.gz | HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity |
| funfam-hmm3-.lib.gz | HMMs for each functional family (FunFam) |
| cath-superfamily-seqs--.fa | Sequences for each CATH superfamily in FASTA format |
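
As an illustration of how the non-redundant ''.list'' and ''.fa'' files described above might be used together once downloaded, the following minimal Python sketch reads a list of domain IDs (one per line, as stated in the table) and keeps only the matching FASTA records. The function names are hypothetical and not part of any CATH tooling. The sketch assumes that the domain ID can be extracted from each FASTA header; the default simply takes the first whitespace-delimited token, and the headers and IDs in the demo are made up for illustration, so the ''id_from_header'' callable may need adapting to the real header layout.

```python
import io
from typing import Callable, Iterable, Iterator, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_domain_list(handle: Iterable[str]) -> Set[str]:
    """Read a .list file: one domain ID per line, blank lines ignored."""
    return {line.strip() for line in handle if line.strip()}


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_token(header: str) -> str:
    """Assumed ID extractor: the first whitespace-delimited token."""
    parts = header.split()
    return parts[0] if parts else ""


def filter_records(
    records: Iterable[Record],
    domain_ids: Set[str],
    id_from_header: Callable[[str], str] = first_token,
) -> Iterator[Record]:
    """Keep only records whose extracted ID is in domain_ids."""
    for header, seq in records:
        if id_from_header(header) in domain_ids:
            yield header, seq


if __name__ == "__main__":
    # In-memory stand-ins for downloaded files; contents are illustrative only.
    list_file = io.StringIO("1oaiA00\n3cx5A01\n")
    fasta_file = io.StringIO(">1oaiA00\nMKV\nLLA\n>2xyzB01\nGGS\n")

    wanted = read_domain_list(list_file)
    for header, seq in filter_records(read_fasta(fasta_file), wanted):
        print(header, len(seq))
```

For real files, the ''io.StringIO'' objects would be replaced by ''open(path)'', or by ''gzip.open(path, "rt")'' for the ''.gz'' files listed on this page.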