This is an old revision of the document!


CATH Data

This page provides information on the data files that are available to download from the CATH FTP site:

ftp://ftp.biochem.ucl.ac.uk/cath/

For statistics on specific releases, see the release notes.

File name                                              Description
cath-chain-list-<version>.txt                          Lists all of the PDB chain IDs in CATH, whether or not they are chopped into domains
cath-domain-boundaries-*-<version>.txt                 Description of domain and segment boundaries for domains classified into CATH
cath-domain-description-file-<version>.txt             Description of each protein domain in CATH
cath-domain-list-<S35|S60|S95|S100|all>-<version>.txt  Lists of domains classified into CATH
cath-domain-pdb-*-<version>.txt                        Description of each domain PDB classified into CATH
cath-names-<version>.txt                               Name and description of each node in the CATH hierarchy, along with an example domain
cath-superfamily-list-<version>.txt                    List of all the superfamilies in the CATH hierarchy
cath-unclassified-list-<version>.txt                   List of all unclassified protein chains and domains that are still being processed
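
The angle-bracket placeholders in the file names above can be expanded mechanically. The following is a minimal sketch, not part of the CATH distribution: the helper name `domain_list_filename` is hypothetical, the version string is assumed to take the form `v4_1_0` (as seen in the non-redundant dataset file names), and the first clustering level is assumed to be spelled `S35`.

```python
# Hypothetical helper: expands the cath-domain-list file-name template.
# The allowed levels are taken from the template; the "v4_1_0" version
# format is an assumption based on the dataset file names on this page.
DOMAIN_LIST_LEVELS = ("S35", "S60", "S95", "S100", "all")


def domain_list_filename(level: str, version: str) -> str:
    """Return the domain-list file name for one clustering level."""
    if level not in DOMAIN_LIST_LEVELS:
        raise ValueError(
            f"unknown level {level!r}; expected one of {DOMAIN_LIST_LEVELS}"
        )
    return f"cath-domain-list-{level}-{version}.txt"


if __name__ == "__main__":
    print(domain_list_filename("S35", "v4_1_0"))
    # prints: cath-domain-list-S35-v4_1_0.txt
```

Rejecting unknown levels up front avoids requesting a file name that the listing does not describe.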
File name                                          Description
cath-dataset-nonredundant-S[20|40]-v4_1_0.atom.fa  The ATOM sequences of the domains in the dataset (containing only residues that have ATOM records in the PDB file)
cath-dataset-nonredundant-S[20|40]-v4_1_0.fa       The sequences of the domains in the dataset
cath-dataset-nonredundant-S[20|40]-v4_1_0.list     A list of the domains in the dataset; one domain ID per line
cath-dataset-nonredundant-S[20|40]-v4_1_0.pdb.tgz  A gzipped tar file containing the PDB files of the domains in the dataset
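
Because the `.list` file holds one domain ID per line, it can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the function names `parse_domain_ids` and `read_domain_ids` are hypothetical, and skipping blank lines is a defensive choice rather than a documented property of the file.

```python
from typing import Iterable, List


def parse_domain_ids(lines: Iterable[str]) -> List[str]:
    """Collect one domain ID per line, ignoring surrounding whitespace.

    Blank lines are skipped (an assumption; the page only states
    "one domain ID per line").
    """
    return [line.strip() for line in lines if line.strip()]


def read_domain_ids(path: str) -> List[str]:
    """Read domain IDs from a downloaded .list file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return parse_domain_ids(handle)
```

For example, `read_domain_ids("cath-dataset-nonredundant-S40-v4_1_0.list")` would return the IDs of the S40 dataset, assuming that file has been downloaded into the working directory.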
File name                                         Description
cath-domain-seqs-*-<version>.fa                   Sequences for each CATH domain
cath-S35-<version>-hmm3.lib.gz                    HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity
funfam-hmm3-<version>.lib.gz                      HMMs for each functional family (FunFam)
cath-superfamily-seqs-<superfamily>-<version>.fa  Sequences for each CATH superfamily in FASTA format
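
The `.fa` files can be read with a generic FASTA parser. The sketch below assumes the standard FASTA layout (a header line beginning with `>` followed by one or more sequence lines). The content of the header lines in the CATH files is not described on this page, so headers are returned unparsed. The name `read_fasta` is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    `lines` can be an open text file or any iterable of strings.
    The header is returned without the leading '>' and is not
    interpreted further.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)
```

Passing an open file handle, for example `with open(path) as fh: records = list(read_fasta(fh))`, keeps the parser independent of how the file was obtained or decompressed.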
data/index.1468339023.txt.gz · Last modified: by nataliedawson