The CathDomainList file format has one entry (one line) for each structural domain in CATH. Each entry has the following columns:
Column | Description |
---|---|
1 | CATH domain name (seven characters) |
2 | Class number |
3 | Architecture number |
4 | Topology number |
5 | Homologous superfamily number |
6 | S35 sequence cluster number |
7 | S60 sequence cluster number |
8 | S95 sequence cluster number |
9 | S100 sequence cluster number |
10 | S100 sequence count number |
11 | Domain length |
12 | Structure resolution (Angstroms) (999.000 for NMR structures and 1000.000 for obsolete PDB entries) |
Comment lines start with a '#' character.

Example entries:

    1oaiA00 1 10 8 10 1 1 1 1 1 59 1.000
    1go5A00 1 10 8 10 1 1 1 1 2 69 999.000
    1oksA00 1 10 8 10 2 1 1 1 1 51 1.800
    1t6oA00 1 10 8 10 2 1 2 1 1 49 2.000
    1cuk003 1 10 8 10 3 1 1 1 1 48 1.900
    1hjp003 1 10 8 10 3 1 1 2 1 44 2.500
    1c7yA03 1 10 8 10 3 1 1 2 2 48 3.100
    1p3qQ00 1 10 8 10 4 1 1 1 1 43 1.700
    1mn3A00 1 10 8 10 4 1 2 1 1 52 2.300
    1nv8B01 1 10 8 10 5 1 1 1 1 71 2.200

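
The twelve-column layout described above can be read with a few lines of Python. This is a minimal sketch, not an official CATH tool: the `DomainEntry` type, its field names and the `parse_domain_list` function are hypothetical names chosen for illustration, and the code assumes that columns are separated by whitespace, as in the example entries.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainEntry(NamedTuple):
    """One line of the domain list (field names are illustrative)."""
    domain: str          # column 1: seven-character domain name
    cath_class: int      # column 2
    architecture: int    # column 3
    topology: int        # column 4
    superfamily: int     # column 5
    s35: int             # column 6
    s60: int             # column 7
    s95: int             # column 8
    s100: int            # column 9
    s100_count: int      # column 10
    length: int          # column 11
    resolution: float    # column 12, in Angstroms


def parse_domain_list(lines: Iterable[str]) -> Iterator[DomainEntry]:
    """Yield one DomainEntry per data line, skipping blanks and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        yield DomainEntry(fields[0], *map(int, fields[1:11]), float(fields[11]))


sample = """\
# a comment line, ignored by the parser
1oaiA00 1 10 8 10 1 1 1 1 1 59 1.000
1go5A00 1 10 8 10 1 1 1 1 2 69 999.000
"""
entries = list(parse_domain_list(sample.splitlines()))
print(entries[1].domain, entries[1].resolution)  # 1go5A00 999.0
```

A resolution of 999.000 or 1000.000 is a marker value (NMR structure or obsolete PDB entry), so callers will usually want to treat those two values separately from real resolutions.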
The domain names have seven characters (e.g. 1oaiA00).
Characters | Description |
---|---|
1-4 | PDB code: the first four characters give the PDB code, e.g. 1oai |
5 | Chain character: identifies which PDB chain is represented |
6-7 | Domain number: a two-digit, zero-padded number (e.g. '01', '02' … '10', '11', '12'). A domain number of '00' indicates that the domain is a whole PDB chain with no domain chopping. |
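
The character positions in the table above translate directly into string slices. The following is a small sketch; `split_domain_name` is a hypothetical helper name, not part of any CATH software.

```python
from typing import Tuple


def split_domain_name(name: str) -> Tuple[str, str, int]:
    """Split a seven-character domain name into (PDB code, chain, domain number)."""
    if len(name) != 7:
        raise ValueError(f"domain name must have seven characters: {name!r}")
    pdb_code = name[:4]             # characters 1-4
    chain = name[4]                 # character 5
    domain_number = int(name[5:7])  # characters 6-7, zero-padded
    return pdb_code, chain, domain_number


pdb_code, chain, domain_number = split_domain_name("1nv8B01")
print(pdb_code, chain, domain_number)  # 1nv8 B 1

# A domain number of 0 (written '00') means the domain is a whole chain.
is_whole_chain = split_domain_name("1oaiA00")[2] == 0
print(is_whole_chain)  # True
```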
Representative structural domains are selected from the CathDomainList based on the numbering scheme. For example, the S35 sequence family representatives for superfamily 1.10.8.10 in the example above are 1oaiA00, 1oksA00, 1cuk003, 1p3qQ00 and 1nv8B01. All five share the superfamily number 1.10.8.10, and each is the first entry in the file with its S35 cluster number (1 to 5).
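
The selection rule described here (take the first entry in file order for each distinct combination of superfamily and S35 number) can be sketched as follows. The function name `s35_representatives` is hypothetical, and the sketch assumes whitespace-separated columns in the order listed in the column table.

```python
from typing import Iterable, List


def s35_representatives(lines: Iterable[str]) -> List[str]:
    """Return the first domain seen for each (C, A, T, H, S35) combination."""
    seen = set()
    representatives = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        key = tuple(fields[1:6])  # class, architecture, topology, superfamily, S35
        if key not in seen:
            seen.add(key)
            representatives.append(fields[0])
    return representatives


SAMPLE = """\
1oaiA00 1 10 8 10 1 1 1 1 1 59 1.000
1go5A00 1 10 8 10 1 1 1 1 2 69 999.000
1oksA00 1 10 8 10 2 1 1 1 1 51 1.800
1t6oA00 1 10 8 10 2 1 2 1 1 49 2.000
1cuk003 1 10 8 10 3 1 1 1 1 48 1.900
1hjp003 1 10 8 10 3 1 1 2 1 44 2.500
1c7yA03 1 10 8 10 3 1 1 2 2 48 3.100
1p3qQ00 1 10 8 10 4 1 1 1 1 43 1.700
1mn3A00 1 10 8 10 4 1 2 1 1 52 2.300
1nv8B01 1 10 8 10 5 1 1 1 1 71 2.200
"""
print(s35_representatives(SAMPLE.splitlines()))
# ['1oaiA00', '1oksA00', '1cuk003', '1p3qQ00', '1nv8B01']
```

Extending the key to `fields[1:7]` or `fields[1:8]` would give the first entry per S60 or S95 cluster in the same way, assuming the same first-in-file rule applies at those levels.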