data:cathdomainlist

data:cathdomainlist [2008/09/08 16:46] – created sillitoe
data:cathdomainlist [2008/09/08 16:50] (current) sillitoe
Line 1: Line 1:
 +
 +====== CATH List File (CLF) ======
 +
 +===== Format 2.0 =====
 +
 +This file has one line (entry) for each structural domain in CATH.
 +
 +^ Column  ^ Description  ^
 +| 1  | CATH domain name (seven characters)  |
 +| 2  | Class number  |
 +| 3  | Architecture number  |
 +| 4  | Topology number  |
 +| 5  | Homologous superfamily number  |
 +| 6  | S35 sequence cluster number  |
 +| 7  | S60 sequence cluster number  |
 +| 8  | S95 sequence cluster number  |
 +| 9  | S100 sequence cluster number  |
 +| 10  | S100 sequence count number  |
 +| 11  | Domain length  |
 +| 12  | Structure resolution (Angstroms) \\ (999.000 for NMR structures and 1000.000 for obsolete PDB entries)   |
 +
 +Comment lines start with a '#' character.
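
As a minimal sketch of how the twelve columns above might be read, the Python below parses CLF lines into named fields and skips comment lines. It assumes that columns are separated by whitespace, which the format description does not state explicitly. The names `ClfEntry`, `parse_clf` and `structure_kind` are hypothetical, and the sample line uses an invented domain name rather than a real CATH entry.

```python
from typing import Iterable, Iterator, NamedTuple


class ClfEntry(NamedTuple):
    """One CLF line; field order follows columns 1-12 of the format table."""
    domain: str
    class_number: int
    architecture: int
    topology: int
    superfamily: int
    s35: int
    s60: int
    s95: int
    s100: int
    s100_count: int
    length: int
    resolution: float


def parse_clf(lines: Iterable[str]) -> Iterator[ClfEntry]:
    """Yield one ClfEntry per data line, skipping blank and '#' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        yield ClfEntry(fields[0], *(int(x) for x in fields[1:11]), float(fields[11]))


def structure_kind(entry: ClfEntry) -> str:
    """Interpret the sentinel resolution values described for column 12."""
    if entry.resolution == 999.0:
        return "NMR"
    if entry.resolution == 1000.0:
        return "obsolete"
    return "resolved"


# Invented line for illustration only; not taken from a real CATH release.
sample = [
    "# comment lines are ignored",
    "9xyzA01 1 10 8 10 1 1 1 1 1 59 999.000",
]
entries = list(parse_clf(sample))
```

Raising an error on a line with the wrong number of columns is a design choice of this sketch; a more lenient reader could log and skip such lines instead.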
 +
 +
 +==== Example ====
 +
 +<code>
 +1oaiA00        10        10                        59 1.000
 +1go5A00        10        10                        69 999.000
 +1oksA00        10        10                        51 1.800
 +1t6oA00        10        10                        49 2.000
 +1cuk003        10        10                        48 1.900
 +1hjp003        10        10                        44 2.500
 +1c7yA03        10        10                        48 3.100
 +1p3qQ00        10        10                        43 1.700
 +1mn3A00        10        10                        52 2.300
 +1nv8B01        10        10                        71 2.200
 +</code>
 +
 +==== CATH Domain Names ====
 +
 +The domain names have seven characters (e.g. 1oaiA00).
 +
 +^ Characters  ^ Description  ^
 +| 1-4  | PDB Code  \\ The first four characters give the PDB code (e.g. 1oai).  |
 +| 5    | Chain Character \\ This identifies which PDB chain is represented.   |
 +| 6-7  | Domain Number \\ The domain number is a two-digit, zero-padded number (e.g. '01', '02' ... '10', '11', '12'). A domain number of '00' indicates that the domain is a whole PDB chain with no domain chopping.  |
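
The character layout above can be sketched as a small Python helper that splits a seven-character domain name into its three parts. The names `DomainName`, `split_domain_name` and `is_whole_chain` are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple


class DomainName(NamedTuple):
    pdb_code: str       # characters 1-4
    chain: str          # character 5
    domain_number: int  # characters 6-7, zero-padded in the name


def split_domain_name(name: str) -> DomainName:
    """Split a seven-character CATH domain name such as '1oaiA00'."""
    if len(name) != 7:
        raise ValueError(f"domain name must have seven characters: {name!r}")
    number = name[5:7]
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"domain number must be two digits: {name!r}")
    return DomainName(name[:4], name[4], int(number))


def is_whole_chain(name: str) -> bool:
    """True when the domain number is '00', i.e. no domain chopping."""
    return split_domain_name(name).domain_number == 0
```

For example, `split_domain_name("1nv8B01")` gives PDB code `1nv8`, chain `B` and domain number 1, while `is_whole_chain("1oaiA00")` is true.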
 +
 +
 +==== Hierarchy Node Representatives ====
 +
 +Representative structural domains are selected from the CathDomainList based on
 +the numbering scheme: the representative of a node is the first domain in the file
 +with that node's numbers. For example, the S35 sequence family representatives
 +for superfamily 1.10.8.10 in the above example are 1oaiA00, 1oksA00, 1cuk003,
 +1p3qQ00 and 1nv8B01, as these are the first instances in the file that share the
 +superfamily number 1.10.8.10 but have different S35 numbers (1 to 5).
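
The "first instance in the file" rule can be sketched in Python as follows. The function name `node_representatives` and the `rows` data are hypothetical. The S35 numbers of the five representatives come from the text above; the S35 numbers of the other five domains are an assumption (each is taken to share the S35 number of the representative listed before it), and only columns 2 to 6 are included.

```python
from typing import Dict, Iterable, Sequence, Tuple


def node_representatives(
    entries: Iterable[Tuple[str, Sequence[int]]], depth: int
) -> Dict[Tuple[int, ...], str]:
    """Map each node (first `depth` numbers) to the first domain listed for it.

    `entries` must be in file order. depth=4 selects superfamily
    representatives (C, A, T, H); depth=5 adds the S35 level.
    """
    reps: Dict[Tuple[int, ...], str] = {}
    for name, numbers in entries:
        key = tuple(numbers[:depth])
        reps.setdefault(key, name)  # keep only the first domain seen per node
    return reps


# (domain, (C, A, T, H, S35)) in file order; S35 values of
# non-representative domains are assumed, see the note above.
rows = [
    ("1oaiA00", (1, 10, 8, 10, 1)),
    ("1go5A00", (1, 10, 8, 10, 1)),
    ("1oksA00", (1, 10, 8, 10, 2)),
    ("1t6oA00", (1, 10, 8, 10, 2)),
    ("1cuk003", (1, 10, 8, 10, 3)),
    ("1hjp003", (1, 10, 8, 10, 3)),
    ("1c7yA03", (1, 10, 8, 10, 3)),
    ("1p3qQ00", (1, 10, 8, 10, 4)),
    ("1mn3A00", (1, 10, 8, 10, 4)),
    ("1nv8B01", (1, 10, 8, 10, 5)),
]

s35_reps = list(node_representatives(rows, depth=5).values())
superfamily_reps = list(node_representatives(rows, depth=4).values())
```

Because Python dictionaries keep insertion order, `s35_reps` lists the representatives in file order. The same function covers deeper levels (S60, S95, S100) if the corresponding columns are supplied and `depth` is raised.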
 +