This shows you the differences between two versions of the page.
| data:index [2017/10/11 12:44] – sillitoe | data:index [2017/10/14 15:28] (current) – sillitoe | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| ====== CATH Data Downloads ====== | ====== CATH Data Downloads ====== | ||
| - | This page provides information on the data files that are available to download from the CATH FTP site. | + | This page provides information on the data files that are available to download from the [[ftp:// |
| - | [[ftp:// | + | See [[:index# |
| ===== CATH (daily snapshot) ===== | ===== CATH (daily snapshot) ===== | ||
| - | |||
| - | We provide a daily snapshot of the very latest classifications and annotations as they happen in our pipeline. This enables users to find the most up-to-date information about their particular structure of interest. The amount of data we provide at this stage is limited mainly to domain boundaries and superfamily classification. | ||
| ftp:// | ftp:// | ||
| Line 22: | Line 20: | ||
| ftp:// | ftp:// | ||
| - | |||
| - | CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may be missing information on the most recent structures. | ||
| For information on the statistics for specific releases, see [[../ | For information on the statistics for specific releases, see [[../ | ||
| Line 46: | Line 42: | ||
| ^ File name ^ Description ^ | ^ File name ^ Description ^ | ||
| - | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) | | + | | cath-dataset-nonredundant-S[20%%|%%40].atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) | |
| - | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.fa | The sequences of the domains in the dataset | | + | | cath-dataset-nonredundant-S[20%%|%%40].fa | The sequences of the domains in the dataset | |
| - | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.list | A list of the domains in the dataset; one domain ID per line | | + | | cath-dataset-nonredundant-S[20%%|%%40].list | A list of the domains in the dataset; one domain ID per line | |
| - | | cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.pdb.tgz | (A gzipped tar file containing) the PDB files of the domains in the data set | | + | | cath-dataset-nonredundant-S[20%%|%%40].pdb.tgz | (A gzipped tar file containing) the PDB files of the domains in the data set | |
| ==== Sequence data ==== | ==== Sequence data ==== | ||