Differences

This shows you the differences between two versions of the page.

data:index [2017/10/11 12:40]
sillitoe
data:index [2017/10/14 15:28] (current)
sillitoe
Line 2: Line 2:
 ====== CATH Data Downloads ======
  
-This page provides information on the data files that are available to download from the CATH FTP site.
+This page provides information on the data files that are available to download from the [[ftp://orengoftp.biochem.ucl.ac.uk/cath | CATH FTP site]].
  
-[[ftp://orengoftp.biochem.ucl.ac.uk/cath]]
+See [[:index#cath_releases|CATH Releases]] for more information on CATH and CATH-Plus.
  
 ===== CATH (daily snapshot) =====
- 
-We provide a daily snapshot of the very latest classifications and annotations as they happen in our pipeline. This enables users to find the most up-to-date information about their particular structure of interest. The amount of data we provide at this stage is limited mainly to domain boundaries and superfamily classification. 
  
 ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/
Line 19: Line 17:
 | cath-b-s35-newest.gz | List the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives |
  
-===== CATH-Plus Release =====
+===== CATH-Plus (full release) =====
  
 ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/
  
-CATH-Plus adds a significant amount of data on top of the core classification information available in CATH. The CATH-Plus release process includes a number of manual annotation checks in addition to adding a huge amount of information combining protein structure, sequence and function. As a result, there is a greater depth of information available in CATH-Plus, though it may be missing information on the most recent structures.
-
-For information on the statistics from specific releases, see [[../release_notes|release notes]].
+For information on the statistics for specific releases, see [[../release_notes|release notes]].
  
-==== Data related to the CATH classification ====
+==== CATH classification data ====
  
 ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/
Line 41: Line 37:
 | cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed |
  
-===== Data related to non-redundant data sets =====
+==== Non-redundant data sets ====
  
 ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/
  
 ^ File name ^ Description ^
-| cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) |
-| cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.fa | The sequences of the domains in the dataset |
-| cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.list | A list of the domains in the dataset; one domain ID per line |
-| cath-dataset-nonredundant-S[20%%|%%40]-v4_1_0.pdb.tgz | (A gzipped tar file containing) the PDB files of the domains in the data set |
+| cath-dataset-nonredundant-S[20%%|%%40].atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) |
+| cath-dataset-nonredundant-S[20%%|%%40].fa | The sequences of the domains in the dataset |
+| cath-dataset-nonredundant-S[20%%|%%40].list | A list of the domains in the dataset; one domain ID per line |
+| cath-dataset-nonredundant-S[20%%|%%40].pdb.tgz | (A gzipped tar file containing) the PDB files of the domains in the data set |
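
As an illustration of the naming scheme in the table above, the following minimal sketch builds the URLs for the unversioned file names used in the current revision and parses the contents of a `.list` file. It assumes that `S[20|40]` stands for the two variants `S20` and `S40`; the helper names `dataset_urls` and `parse_domain_list` are hypothetical and not part of any CATH tooling.

```python
# Directory URL as given on the page; file-name stems follow the table above.
BASE_URL = (
    "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/"
    "latest-release/non-redundant-data-sets/"
)
SUFFIXES = ("atom.fa", "fa", "list", "pdb.tgz")


def dataset_urls(level):
    """Return a {suffix: URL} mapping for the S20 or S40 data set.

    Assumption: S[20|40] in the table means exactly the values 20 and 40.
    """
    if level not in (20, 40):
        raise ValueError("level must be 20 or 40")
    stem = f"cath-dataset-nonredundant-S{level}"
    return {suffix: f"{BASE_URL}{stem}.{suffix}" for suffix in SUFFIXES}


def parse_domain_list(text):
    """Parse the text of a .list file: one domain ID per line, blanks skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The text passed to `parse_domain_list` could come from any download tool; the function only relies on the "one domain ID per line" layout described in the table.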
  
-===== Data related to sequence data =====
+==== Sequence data ====
  
 ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/