Cath Domain Description File (CDDF)
Format 1.0
Each entry corresponds to a CATH domain for a given release of the CATH database. Note: Different releases of CATH may have different domain definitions. See below for CATH domain and segment naming conventions.
Information is compiled from the following files:
- CathDomain Fasta Sequences
- CathSegment Fasta Sequences
- PdbSum file (in CDDF format)
- ChainLimits file (in CDDF format)
Comment lines start with a '#' character.
Each line is at most 80 characters long: a tag of at most 10 characters, followed by content of at most 70 characters.
Tags | Description |
---|---|
FORMAT | Format definition (CDDF1.0); always the first line of each entry |
DOMAIN | CATH domain identifier - six character code (e.g. 1abc01) |
PDBID | PDB identifier - four character code (currently only used in PdbSumData files) |
VERSION | CATH version number |
VERDATE | CATH version release date |
NAME | PDB entry description |
SOURCE | PDB entry organism/source |
CATHCODE | CATH superfamily code C.A.T.H e.g. 1.10.10.10 |
CLASS | Text description of class level (default: 'void') |
ARCH | Text description of architecture level (default: 'void') |
TOPOL | Text description of topology level (default: 'void') |
HOMOL | Text description of homologous superfamily level (default: 'void') |
DLENGTH | Length of the domain sequence |
DSEQH | Domain sequence header in FASTA format (e.g. '>pdb|1abc01') |
DSEQS | Domain sequence string in FASTA format |
NSEGMENTS | Number of segments that comprise the domain (integer) |
SEGMENT | Segment identifier (e.g. 1abc01:1:2) |
SRANGE | Start and stop PDB residue identifiers that define the range of segment (e.g. START=159 STOP=202) |
SLENGTH | Length of the segment sequence |
SSEQH | Segment sequence header in FASTA format (e.g. '>pdb|1abc01:1:2') |
SSEQS | Segment sequence string in FASTA format |
ENDSEG | Signifies end of segment entry |
COMMENTS | Text |
// | Signifies end of entry |
Example
FORMAT CDDF1.0
DOMAIN 9lprA1
VERSION 2.4
VERDATE 14-Jan-2002
NAME Alpha-lytic protease complex with methoxysuccinyl- Ala- Ala- Pro- Leuc
NAME ine boronic acid
SOURCE (Lysobacter enzymogenes 495) cloned and expressed in (escherichia coli
SOURCE )
CATHCODE 2.40.10.10
CLASS Mainly Beta
ARCH Barrel
TOPOL Thrombin, subunit H
HOMOL Trypsin-like serine proteases
DLENGTH 87
DSEQH >pdb|9lprA1
DSEQS IVGGIEYSINNASLCSVGFSVTRGATKGFVTAGHCGTVNATARIGGAVVGTFAARVFPGNDRAWVSLTSA
DSEQS QTLLLQPILSQYGLSLV
NSEGMENTS 2
SEGMENT 9lprA1:1:2
SRANGE START=16 STOP=115
SLENGTH 74
SSEQH >pdb|9lprA1:1:2
SSEQS IVGGIEYSINNASLCSVGFSVTRGATKGFVTAGHCGTVNATARIGGAVVGTFAARVFPGNDRAWVSLTSA
SSEQS QTLL
ENDSEG
SEGMENT 9lprA1:2:2
SRANGE START=231 STOP=242
SLENGTH 13
SSEQH >pdb|9lprA1:2:2
SSEQS LQPILSQYGLSLV
ENDSEG
COMMENTS Blah Blah
//
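As an illustration of how the tags and the example entry fit together, here is a minimal Python sketch of a reader. It is not part of the CATH distribution. The function name `parse_cddf` is hypothetical, and the sketch assumes that each record sits on its own line, that the tag is separated from its content by whitespace, and that a repeated tag (such as NAME) continues the previous value.

```python
from collections import defaultdict


def parse_cddf(text):
    """Split CDDF text into entries; each entry maps a tag to its list of values."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comment lines carry no data
        if line.startswith("//"):
            # '//' closes the current entry
            if current is not None:
                entries.append(dict(current))
            current = None
            continue
        parts = line.split(None, 1)  # tag, then the rest of the line
        tag = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if tag == "FORMAT":
            current = defaultdict(list)  # FORMAT opens a new entry
        if current is None:
            continue  # assumption: ignore stray lines outside an entry
        current[tag].append(value)
    return entries


sample = """\
# hypothetical comment line
FORMAT CDDF1.0
DOMAIN 9lprA1
NAME Alpha-lytic protease complex with methoxysuccinyl- Ala- Ala- Pro- Leuc
NAME ine boronic acid
DLENGTH 87
//
"""

entries = parse_cddf(sample)
# Wrapped values are joined without a separator because the example wraps mid-word.
full_name = "".join(entries[0]["NAME"])
```

Because trailing whitespace is stripped, a space that happened to fall exactly at a wrap boundary would be lost when the pieces are joined; a stricter reader would slice fixed-width columns instead.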
NOTE: The following CATH hierarchy description lines are typically found together (derived from CathList and CathNames files)
CATHCODE 2.40.10.10
CLASS Mainly Beta
ARCH Barrel
TOPOL Thrombin, subunit H
HOMOL Trypsin-like serine proteases
The following domain sequence lines are typically found together (derived from CathDomain Fasta Sequence File)
DLENGTH 87
DSEQH >pdb|9lprA1
DSEQS IVGGIEYSINNASLCSVGFSVTRGATKGFVTAGHCGTVNATARIGGAVVGTFAARVFPGNDRAWVSLTSA
DSEQS QTLLLQPILSQYGLSLV
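One consistency check that follows from these lines is that the concatenated DSEQS fragments should have the length declared by DLENGTH. The Python sketch below assumes the records have already been split into (tag, value) pairs; `domain_sequence` is a hypothetical helper name, not part of any CATH tool.

```python
def domain_sequence(records):
    """Join DSEQS fragments in order and compare the result against DLENGTH."""
    seq = "".join(value for tag, value in records if tag == "DSEQS")
    declared = next(int(value) for tag, value in records if tag == "DLENGTH")
    return seq, declared == len(seq)


# Records taken from the example entry in this document.
records = [
    ("DLENGTH", "87"),
    ("DSEQH", ">pdb|9lprA1"),
    ("DSEQS", "IVGGIEYSINNASLCSVGFSVTRGATKGFVTAGHCGTVNATARIGGAVVGTFAARVFPGNDRAWVSLTSA"),
    ("DSEQS", "QTLLLQPILSQYGLSLV"),
]

seq, length_ok = domain_sequence(records)
```

For the example entry the two fragments hold 70 and 17 residues, which matches the declared length of 87.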
Segment sequence lines are always initiated with a 'SEGMENT' tag and terminated with an 'ENDSEG' tag. The number of segments in the domain always precedes the first segment using the 'NSEGMENTS' tag.
The following segment sequence lines are typically found together (derived from CathSegments Fasta Sequence File)
SEGMENT 9lprA1:1:2
SRANGE START=16 STOP=115
SLENGTH 74
SSEQH >pdb|9lprA1:1:2
SSEQS IVGGIEYSINNASLCSVGFSVTRGATKGFVTAGHCGTVNATARIGGAVVGTFAARVFPGNDRAWVSLTSA
SSEQS QTLL
ENDSEG
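The SEGMENT/ENDSEG bracketing and the NSEGMENTS count can be checked mechanically. The following Python sketch groups records into segments under the assumption that they arrive as (tag, value) pairs in file order; `collect_segments` and the dictionary keys are illustrative names only.

```python
def collect_segments(records):
    """Group records between SEGMENT and ENDSEG; return (declared_count, segments)."""
    declared = None
    segments = []
    current = None
    for tag, value in records:
        if tag == "NSEGMENTS":
            declared = int(value)
        elif tag == "SEGMENT":
            current = {"id": value, "seq": ""}
        elif tag == "ENDSEG":
            if current is None:
                raise ValueError("ENDSEG without a preceding SEGMENT")
            segments.append(current)
            current = None
        elif current is not None:
            if tag == "SSEQS":
                current["seq"] += value  # wrapped sequence lines are concatenated
            elif tag == "SLENGTH":
                current["length"] = int(value)
            elif tag == "SRANGE":
                # START/STOP are kept as strings: they are PDB residue identifiers
                current["range"] = dict(p.split("=", 1) for p in value.split())
    return declared, segments


# Records taken from the example entry in this document.
records = [
    ("NSEGMENTS", "2"),
    ("SEGMENT", "9lprA1:1:2"),
    ("SRANGE", "START=16 STOP=115"),
    ("SLENGTH", "74"),
    ("SSEQH", ">pdb|9lprA1:1:2"),
    ("SSEQS", "IVGGIEYSINNASLCSVGFSVTRGATKGFVTAGHCGTVNATARIGGAVVGTFAARVFPGNDRAWVSLTSA"),
    ("SSEQS", "QTLL"),
    ("ENDSEG", ""),
    ("SEGMENT", "9lprA1:2:2"),
    ("SRANGE", "START=231 STOP=242"),
    ("SLENGTH", "13"),
    ("SSEQH", ">pdb|9lprA1:2:2"),
    ("SSEQS", "LQPILSQYGLSLV"),
    ("ENDSEG", ""),
]

declared, segments = collect_segments(records)
```

In the example, SLENGTH (74) is smaller than STOP - START + 1 (100) for the first segment, so the sketch validates SLENGTH against the sequence itself and does not derive a length from SRANGE.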
Cath Domain and Segment Naming Conventions
CATH Domain Names
The domain names have seven characters (e.g. 1oaiA00).
- CHARACTERS 1-4: PDB Code. The first 4 characters determine the PDB code, e.g. 1oai.
- CHARACTER 5: Chain Character. This determines which PDB chain is represented. Chain characters of zero ('0') indicate that the PDB file has no chain field.
- CHARACTERS 6-7: Domain Number. The domain number is a 2-figure, zero-padded number (e.g. '01', '02' ... '10', '11', '12'). Where the domain number is a double ZERO ('00') this indicates that the domain is a whole PDB chain with no domain chopping.
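The seven-character convention can be expressed as a small parser. This Python sketch is illustrative only: `DomainName` and `parse_domain_name` are hypothetical names, and the regular expression assumes that PDB codes and chain characters are alphanumeric.

```python
import re
from typing import NamedTuple, Optional


class DomainName(NamedTuple):
    pdb_code: str
    chain: Optional[str]   # None when the PDB file has no chain field ('0')
    domain_number: int
    whole_chain: bool      # True when the domain number is '00'


# Assumption: 4 alphanumeric characters, 1 chain character, 2 digits.
_DOMAIN_RE = re.compile(r"^([0-9A-Za-z]{4})([0-9A-Za-z])([0-9]{2})$")


def parse_domain_name(name):
    """Split a seven-character CATH domain name into its three fields."""
    match = _DOMAIN_RE.match(name)
    if match is None:
        raise ValueError(f"not a seven-character CATH domain name: {name!r}")
    pdb_code, chain, number = match.groups()
    return DomainName(
        pdb_code=pdb_code,
        chain=None if chain == "0" else chain,
        domain_number=int(number),
        whole_chain=(number == "00"),
    )


parsed = parse_domain_name("1oaiA00")
```

Here `parsed` holds PDB code `1oai`, chain `A`, and a domain number of 0, which marks a whole-chain domain.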
CATH Segment Names
CATH segments (continuous regions of sequence within a domain) are described by appending colon-separated numbers to the end of the domain name.
The first number is the sequential number of the segment.
The second number is the total number of segments in this domain.
1abcA01:1:2
xxxxxxxoooo
x = standard CATH seven character domain name
o = segment information (:ThisSegment:TotalSegments)
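A segment name can be taken apart by splitting on the last two colons. The Python sketch below uses a hypothetical helper, `parse_segment_name`, and assumes that segment numbering starts at 1, as the examples in this document suggest.

```python
def parse_segment_name(name):
    """Return (domain_name, this_segment, total_segments) for a segment name."""
    # rsplit keeps the domain name intact and peels off the two trailing numbers;
    # a name with fewer than two colons fails to unpack and raises ValueError.
    domain, this_segment, total_segments = name.rsplit(":", 2)
    this_segment, total_segments = int(this_segment), int(total_segments)
    if not 1 <= this_segment <= total_segments:
        raise ValueError(f"segment index out of range in {name!r}")
    return domain, this_segment, total_segments


domain, this_segment, total_segments = parse_segment_name("1abcA01:1:2")
```

The same function also accepts the six-character identifiers used in the example entry, such as `9lprA1:2:2`, because it does not inspect the domain part.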