Differences
This shows you the differences between two versions of the page.
data:index [2016/07/12 15:27] nataliedawson [CATH Data] |
data:index [2017/10/14 15:28] (current) sillitoe |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== CATH Data ====== | + | ====== CATH Data Downloads ====== |
- | This page provides information on the data files that are available to download from CATH. | + | This page provides information on the data files that are available to download from the [[ftp://orengoftp.biochem.ucl.ac.uk/cath | CATH FTP site]]. |
- | [[ftp://ftp.biochem.ucl.ac.uk/cath/]] | + | See [[:index#cath_releases|CATH Releases]] for more information on CATH and CATH-Plus. |
- | You can also look at the [[../release_notes|release notes]] for more information on what happened in each release. | + | ===== CATH (daily snapshot) ===== |
+ | |||
+ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/ | ||
+ | |||
+ | ^ File name ^ Description ^ | ||
+ | | cath-b-newest-all.gz | Lists the latest domain boundaries and superfamily (C.A.T.H) annotations for all CATH domains | | ||
+ | | cath-b-newest-names.gz | Provides the names for each node in the CATH hierarchy | | ||
+ | | cath-b-newest-latest-release.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains in the most recent release of CATH-Plus | | ||
+ | | cath-b-newest-putative.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains released since the most recent release of CATH-Plus | | ||
+ | | cath-b-s35-newest.gz | Lists the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives | | ||
+ | |||
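As a rough illustration of how the file names in the table above combine with the daily-snapshot directory, the following Python sketch builds a full URL for each listed file. The helper name `daily_snapshot_urls` and the two constants are hypothetical; the directory and file names are copied from this page, and the sketch does not download anything.

```python
# Directory and file names are taken from the table above; the helper
# name and constants are illustrative, not part of any CATH tooling.
DAILY_BASE = "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/"

DAILY_FILES = [
    "cath-b-newest-all.gz",
    "cath-b-newest-names.gz",
    "cath-b-newest-latest-release.gz",
    "cath-b-newest-putative.gz",
    "cath-b-s35-newest.gz",
]


def daily_snapshot_urls(base=DAILY_BASE, names=DAILY_FILES):
    """Map each file name to its full URL under the daily-snapshot directory."""
    root = base.rstrip("/") + "/"  # tolerate a base given without a trailing slash
    return {name: root + name for name in names}


if __name__ == "__main__":
    for name, url in daily_snapshot_urls().items():
        print(name, "->", url)
```

The files are gzip-compressed, so once fetched they could be opened with Python's standard `gzip.open`. The column layout inside each file is not described on this page, so it is not parsed here.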
+ | ===== CATH-Plus (full release) ===== | ||
+ | |||
+ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/ | ||
+ | |||
+ | For information on the statistics for specific releases, see [[../release_notes|release notes]]. | ||
+ | |||
+ | ==== CATH classification data ==== | ||
+ | |||
+ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/ | ||
+ | |||
+ | ^ File name ^ Description ^ | ||
+ | | cath-chain-list-<version>.txt | Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not. | | ||
+ | | cath-domain-boundaries-*-<version>.txt | Description of domain and segment boundaries for domains classified into CATH. | | ||
+ | | cath-domain-description-file-<version>.txt | Description of each protein domain in CATH | | ||
+ | | cath-domain-list-<S35%%|%%S60%%|%%S95%%|%%S100%%|%%all>-<version>.txt | Lists of domains classified into CATH | | ||
+ | | cath-domain-pdb-*-<version>.txt | Description of each domain PDB classified into CATH | | ||
+ | | cath-names-<version>.txt | Name and description of each node in the CATH hierarchy, along with an example domain | | ||
+ | | cath-superfamily-list-<version>.txt | List of all the superfamilies in the CATH hierarchy | | ||
+ | | cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed | | ||
+ | |||
+ | ==== Non-redundant data sets ==== | ||
+ | |||
+ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/ | ||
+ | |||
+ | ^ File name ^ Description ^ | ||
+ | | cath-dataset-nonredundant-S[20%%|%%40].atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) | | ||
+ | | cath-dataset-nonredundant-S[20%%|%%40].fa | The sequences of the domains in the dataset | | ||
+ | | cath-dataset-nonredundant-S[20%%|%%40].list | A list of the domains in the dataset; one domain ID per line | | ||
+ | | cath-dataset-nonredundant-S[20%%|%%40].pdb.tgz | A gzipped tar file containing the PDB files of the domains in the dataset | | ||
+ | |||
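The `S[20|40]` notation in the table above stands for two variants of each file. A minimal Python sketch, assuming only what the table states, expands the pattern into concrete file names and reads a `.list` file (one domain ID per line). The function names are hypothetical, and skipping blank lines is an assumption rather than a documented property of the files.

```python
import itertools

# Cut-offs and suffixes as they appear in the table above.
CUTOFFS = ("S20", "S40")
SUFFIXES = ("atom.fa", "fa", "list", "pdb.tgz")


def nonredundant_file_names():
    """Expand cath-dataset-nonredundant-S[20|40].<suffix> into concrete names."""
    return [
        f"cath-dataset-nonredundant-{cutoff}.{suffix}"
        for cutoff, suffix in itertools.product(CUTOFFS, SUFFIXES)
    ]


def read_domain_list(lines):
    """Return the domain IDs from a .list file: one ID per line, blanks skipped."""
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    for name in nonredundant_file_names():
        print(name)
    # The IDs below are made-up placeholders, not real entries.
    print(read_domain_list(["1abcA01\n", "\n", "2xyzB02\n"]))
```

Because `read_domain_list` accepts any iterable of lines, an open file handle for a downloaded `.list` file can be passed to it directly.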
+ | ==== Sequence data ==== | ||
+ | |||
+ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/ | ||
+ | |||
+ | ^ File name ^ Description ^ | ||
+ | | cath-domain-seqs-*-<version>.fa | Sequences for each CATH domain | | ||
+ | | cath-S35-<version>-hmm3.lib.gz | HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity | | ||
+ | | funfam-hmm3-<version>.lib.gz | HMMs for each functional family (FunFam) | | ||
+ | | cath-superfamily-seqs-<superfamily>-<version>.fa | Sequences for each CATH superfamily in FASTA format | | ||
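Several of the files above are FASTA sequence files. The sketch below is a generic FASTA reader in Python, not a CATH-specific parser: the layout of the header lines in these files is not described on this page, so each header is kept as an opaque string. The function name `read_fasta` is hypothetical.

```python
def read_fasta(lines):
    """Return {header: sequence} from FASTA text; headers exclude the leading '>'."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []  # a repeated header would overwrite the earlier record
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    # Placeholder records for illustration only.
    sample = [">example_1\n", "ACDEFG\n", "HIKL\n", ">example_2\n", "MNPQ\n"]
    for name, seq in read_fasta(sample).items():
        print(name, len(seq))
```

Holding every record in a dictionary is convenient for small files. For the larger sequence libraries, yielding one record at a time would likely use less memory.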
- | ^ Type ^ Description ^ | ||
- | | [[cath-domain-pdb-S35-{VERSION}.tgz ]] | Library of chopped PDB files for representative CATH domains | | ||
- | | [[cath-domain-pdb-{VERSION}.tgz ]] | Library of chopped PDB files for all CATH domains | | ||
- | | CathCathedral library | Library of domain secondary structure graphs for CATHEDRAL for representative CATH domains | | ||
- | | CathHmm | HMM library file for CATH domains | | ||
- | | CathHmm (+unclassified) | HMM library file for CATH domains (including those that are waiting to be classified) | | ||
- | | [[CathDomainDescriptionFile]] | Full description of CATH domains | | ||
- | | [[CathDomall]] | Domain boundaries for each PDB Chain in "domall" format | | ||
- | | CathUnclassifiedList | List of CATH domains that are waiting to be assigned | | ||
- | | Chains | List of PDB chains in CATH | | ||
- | | [[glossary:atom | Domain Sequences (ATOM) ]] |FASTA sequence database for all CATH domains (based on ATOM records in PDB) | | ||
- | |[[glossary:combs | Domain Sequences (COMBS) ]] |FASTA sequence database for all CATH domains (based on COMBS sequence data) | | ||
- | | [[CathDomainList]] |List of assigned CATH domains | | ||
- | | [[CathNames]] |List of manually assigned names of CATH classification nodes | | ||
- | |Representative Domain Sequences (ATOM) |FASTA sequence database for CATH domains (clustered at different levels of sequence identity) | | ||
- | |Representative Domain Sequences (COMBS) |FASTA sequence database for CATH domains (clustered at different levels of sequence identity) | | ||
- | |Representatives |List of CATH representative domains (representing clusters at different levels of sequence identity) | | ||