Differences
This shows you the differences between two versions of the page.
data:index [2017/10/11 12:37] sillitoe |
data:index [2017/10/14 15:28] (current) sillitoe |
||
---|---|---|---|
Line 2: | Line 2: | ||
====== CATH Data Downloads ====== | ====== CATH Data Downloads ====== | ||
- | This page provides information on the data files that are available to download from the CATH FTP site: | + | This page provides information on the data files that are available to download from the [[ftp://orengoftp.biochem.ucl.ac.uk/cath | CATH FTP site]]. |
- | [[ftp://orengoftp.biochem.ucl.ac.uk/cath]] | + | See [[:index#cath_releases|CATH Releases]] for more information on CATH and CATH-Plus. |
- | + | ===== CATH (daily snapshot) ===== | |
- | ===== CATH (daily) ===== | + | |
- | + | ||
- | We provide a daily snapshot of our very latest classifications and annotations as they happen in our pipeline. This enables users to find the most up-to-date information about their particular structure of interest. The amount of data we provide at this stage is limited mainly to domain boundaries and superfamily classification. | + | |
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/ | ||
Line 20: | Line 17: | ||
| cath-b-s35-newest.gz | List the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives | | | cath-b-s35-newest.gz | List the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives | | ||
- | ===== CATH-Plus ===== | + | ===== CATH-Plus (full release) ===== |
- | CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks (e.g. looking for evidence that would support merging superfamilies, checking for errors, etc) in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may not contain information on the most recent structures. | + | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/ |
- | For information on the statistics from specific releases, see [[../release_notes|release notes]]. | + | For information on the statistics for specific releases, see [[../release_notes|release notes]]. |
- | ==== Data related to the CATH classification ==== | + | ==== CATH classification data ==== |
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/ | ||
Line 40: | Line 37: | ||
| cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed | | | cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed | | ||
- | ===== Data related to non-redundant data sets ===== | + | ==== Non-redundant data sets ==== |
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/ | ||
^ File name ^ Description ^ | ^ File name ^ Description ^ | ||
- | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) | | + | | cath-dataset-nonredundant-S[20%%|%%40].atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) | |
- | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.fa | The sequences of the domains in the dataset | | + | | cath-dataset-nonredundant-S[20%%|%%40].fa | The sequences of the domains in the dataset | |
- | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.list | A list of the domains in the dataset; one domain ID per line | | + | | cath-dataset-nonredundant-S[20%%|%%40].list | A list of the domains in the dataset; one domain ID per line | |
- | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.pdb.tgz | (A gzipped tar file containing) the PDB files of the domains in the data set | | + | | cath-dataset-nonredundant-S[20%%|%%40].pdb.tgz | (A gzipped tar file containing) the PDB files of the domains in the data set | |
- | ===== Data related to sequence data ===== | + | ==== Sequence data ==== |
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/ | ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/ |
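The bracketed file-name pattern used for the non-redundant data sets (current revision, without the version suffix) can be expanded into concrete download locations. The following is a minimal sketch, assuming that `S[20|40]` means "either S20 or S40" and that the files sit directly under the non-redundant-data-sets directory listed on this page. The function name `nonredundant_urls` is hypothetical and only serves the illustration; the script builds the URLs and does not check that the files exist on the server.

```python
from itertools import product

# Directory given on the page for the non-redundant data sets.
BASE_URL = (
    "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/"
    "latest-release/non-redundant-data-sets/"
)

# Assumption: "S[20|40]" in the file-name table means "S20 or S40".
CUTOFFS = ("S20", "S40")

# File suffixes taken from the file-name table (unversioned names).
SUFFIXES = ("atom.fa", "fa", "list", "pdb.tgz")


def nonredundant_urls(base_url=BASE_URL):
    """Return one URL per cutoff/suffix combination (hypothetical helper)."""
    return [
        f"{base_url}cath-dataset-nonredundant-{cutoff}.{suffix}"
        for cutoff, suffix in product(CUTOFFS, SUFFIXES)
    ]


if __name__ == "__main__":
    # Print the expanded URLs; nothing is downloaded here.
    for url in nonredundant_urls():
        print(url)
```

Keeping the cutoffs and suffixes in separate tuples means a further cutoff or file type would only need one extra entry, should the naming scheme follow the same pattern.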