data:index

Differences

This shows you the differences between two versions of the page.

data:index [2010/02/08 17:20] sillitoe
data:index [2017/10/14 15:28] (current) sillitoe
Line 1: Line 1:
 +
 +====== CATH Data Downloads ======
 +
 +This page describes the data files available for download from the [[ftp://orengoftp.biochem.ucl.ac.uk/cath | CATH FTP site]].
 +
 +See [[:index#cath_releases|CATH Releases]] for more information on CATH and CATH-Plus.
 +
 +===== CATH (daily snapshot) =====
 +
 +ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/
 +
 +^ File name ^ Description ^
 +| cath-b-newest-all.gz | Lists the latest domain boundaries and superfamily (C.A.T.H) annotations for all CATH domains |
 +| cath-b-newest-names.gz | Provides the names for each node in the CATH hierarchy |
 +| cath-b-newest-latest-release.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains in the most recent release of CATH-Plus |
 +| cath-b-newest-putative.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains released since the most recent release of CATH-Plus |
 +| cath-b-s35-newest.gz | Lists the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives |
 +
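The file names in the table above can be combined with the daily snapshot directory to form full download URLs. The following is a minimal sketch of that step only; the function name ''daily_snapshot_urls'' is hypothetical, and the code builds strings without contacting the FTP server.

```python
# Base directory and file names are copied from the table above.
DAILY_BASE = "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/"

DAILY_FILES = [
    "cath-b-newest-all.gz",
    "cath-b-newest-names.gz",
    "cath-b-newest-latest-release.gz",
    "cath-b-newest-putative.gz",
    "cath-b-s35-newest.gz",
]


def daily_snapshot_urls(base=DAILY_BASE, names=DAILY_FILES):
    """Map each daily snapshot file name to its full URL (hypothetical helper)."""
    # Make sure the base ends with exactly one slash before appending the name.
    prefix = base.rstrip("/") + "/"
    return {name: prefix + name for name in names}


if __name__ == "__main__":
    for name, url in daily_snapshot_urls().items():
        print(name, url)
```

The resulting URLs can be passed to any FTP-capable client; the files are gzip-compressed, so they need decompressing before use.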
 +===== CATH-Plus (full release) =====
 +
 +ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/
 +
 +For statistics on specific releases, see the [[../release_notes|release notes]].
 +
 +==== CATH classification data ====
 +
 +ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/
 +
 +^ File name ^ Description ^
 +| cath-chain-list-<version>.txt          | Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not. |
 +| cath-domain-boundaries-*-<version>.txt | Description of domain and segment boundaries for domains classified into CATH. |
 +| cath-domain-description-file-<version>.txt | Description of each protein domain in CATH |
 +| cath-domain-list-<S35%%|%%S60%%|%%S95%%|%%S100%%|%%all>-<version>.txt | Lists of domains classified into CATH |
 +| cath-domain-pdb-*-<version>.txt | Description of each domain PDB classified into CATH |
 +| cath-names-<version>.txt             | Name and description of each node in the CATH hierarchy, along with an example domain |
 +| cath-superfamily-list-<version>.txt  | List of all the superfamilies in the CATH hierarchy |
 +| cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed |
 +
 +==== Non-redundant data sets ====
 +
 +ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/
 +
 +^ File name ^ Description ^
 +| cath-dataset-nonredundant-S[20%%|%%40].atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) |
 +| cath-dataset-nonredundant-S[20%%|%%40].fa | The sequences of the domains in the dataset |
 +| cath-dataset-nonredundant-S[20%%|%%40].list | A list of the domains in the dataset; one domain ID per line |
 +| cath-dataset-nonredundant-S[20%%|%%40].pdb.tgz | A gzipped tar file containing the PDB files of the domains in the dataset |
 +
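The ''S[20|40]'' notation in the table above stands for two variants of each file, one for S20 and one for S40. As a rough sketch, the names can be expanded and a ''.list'' file (one domain ID per line, as described above) can be read as follows. The function names are hypothetical, and the domain IDs in the usage example are placeholders rather than real file content.

```python
import io


def nonredundant_file_names(levels=("S20", "S40"),
                            suffixes=(".atom.fa", ".fa", ".list", ".pdb.tgz")):
    """Expand the S[20|40] pattern into concrete file names (hypothetical helper)."""
    return [
        f"cath-dataset-nonredundant-{level}{suffix}"
        for level in levels
        for suffix in suffixes
    ]


def read_domain_list(lines):
    """Read one domain ID per line, ignoring blank lines and surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    print(nonredundant_file_names())
    # Placeholder IDs for illustration only; a real .list file would be opened instead.
    sample = io.StringIO("exampleA01\nexampleB02\n\n")
    print(read_domain_list(sample))
```

In practice, ''read_domain_list'' would be given an open file handle for a downloaded ''.list'' file instead of the in-memory sample.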
 +==== Sequence data ====
 +
 +ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/
 +
 +^ File name ^ Description ^
 +| cath-domain-seqs-*-<version>.fa | Sequences for each CATH domain |
 +| cath-S35-<version>-hmm3.lib.gz  | HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity |
 +| funfam-hmm3-<version>.lib.gz    | HMMs for each functional family (FunFam) |
 +| cath-superfamily-seqs-<superfamily>-<version>.fa | Sequences for each CATH superfamily in FASTA format |
 +
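The ''.fa'' files listed above are FASTA files. Here is a minimal sketch of reading such a file into a dictionary, using only the standard library. This page does not describe the layout of the header lines, so the sketch makes no assumption about it and uses the whole header (without the leading ''>'') as the key. The function name ''read_fasta'' and the sample records are hypothetical.

```python
import io


def read_fasta(lines):
    """Parse FASTA text into {header: sequence} (hypothetical helper).

    The full header line, minus the leading '>', is used as the key because
    the header layout of the CATH files is not specified on this page.
    """
    records = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Placeholder records for illustration only.
    sample = io.StringIO(">example_1\nMKV\nLAA\n>example_2\nGGS\n")
    print(read_fasta(sample))
```

For the gzip-compressed files on the FTP site, the same function could be given a handle from ''gzip.open(path, "rt")''.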