This shows you the differences between two versions of the page.
| data:cathdomaindescriptionfile [2008/09/16 07:24] – garner | data:cathdomaindescriptionfile [2014/11/07 14:56] (current) – sillitoe | ||
|---|---|---|---|
| + | ====== Cath Domain Description File (CDDF) ====== | ||
| + | |||
| + | ===== Format 1.0 ===== | ||
| + | |||
| + | Each entry corresponds to a CATH domain for a given release of the CATH database. | ||
| + | Note: Different releases of CATH may have different domain definitions. | ||
| + | See below for CATH domain and segment naming conventions. | ||
| + | |||
| + | Information is compiled from the following files: | ||
| + | * [[CathDomainList]] | ||
| + | * [[CathDomall]] | ||
| + | * CathDomain Fasta Sequences | ||
| + | * CathSegment Fasta Sequences | ||
| + | * PdbSum file (in CDDF format) | ||
| + | * ChainLimits file (in CDDF format) | ||
| + | |||
| + | Comment lines start with a '#' | ||
| + | |||
| + | Each line has a MAXIMUM of 80 characters: a tag that is never longer than | ||
| + | 10 characters, followed by a rest-of-line value of no more than 70 characters. | ||
| + | |||
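The line rules above (comment marker, 80-character limit, tag of at most 10 characters) can be sketched as a small line splitter. This is a minimal illustration, not the CATH tooling: the function name `split_cddf_line` is hypothetical, and it assumes the tag is separated from its value by whitespace.

```python
from typing import Optional, Tuple

MAX_LINE = 80  # whole line, as stated in the format description
MAX_TAG = 10   # tag portion of the line


def split_cddf_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (tag, value) for a data line, or None for comment/blank lines.

    Assumption: the tag and the value are separated by whitespace.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None  # comment lines start with '#'
    if len(line) > MAX_LINE:
        raise ValueError(f"line is longer than {MAX_LINE} characters")
    parts = line.split(None, 1)
    tag = parts[0]
    value = parts[1] if len(parts) > 1 else ""
    if len(tag) > MAX_TAG:
        raise ValueError(f"tag {tag!r} is longer than {MAX_TAG} characters")
    return tag, value
```

A tag with no value, such as the `//` terminator, comes back with an empty string as its value.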
| + | ^ Tags ^ Description | ||
| + | | FORMAT | ||
| + | | DOMAIN | ||
| + | | PDBID | ||
| + | | VERSION | ||
| + | | VERDATE | ||
| + | | NAME | PDB entry description | ||
| + | | SOURCE | ||
| + | | CATHCODE | CATH superfamily code C.A.T.H e.g. 1.10.10.10 | ||
| + | | CLASS | ||
| + | | ARCH | Text description of architecture level (default: ' | ||
| + | | TOPOL | ||
| + | | HOMOL | ||
| + | | DLENGTH | ||
| + | | DSEQH | ||
| + | | DSEQS | ||
| + | | NSEGMENTS | Number of segments that comprise the domain (integer) | ||
| + | | SEGMENT | ||
| + | | SRANGE | ||
| + | | SLENGTH | ||
| + | | SSEQH | ||
| + | | SSEQS | ||
| + | | ENDSEG | ||
| + | | COMMENTS | ||
| + | | %%// | ||
| + | |||
| + | |||
| + | ==== Example ==== | ||
| + | |||
| + | < | ||
| + | FORMAT | ||
| + | DOMAIN | ||
| + | VERSION | ||
| + | VERDATE | ||
| + | NAME Alpha-lytic protease complex with methoxysuccinyl- Ala- Ala- Pro- Leuc | ||
| + | NAME ine boronic acid | ||
| + | SOURCE | ||
| + | SOURCE | ||
| + | CATHCODE | ||
| + | CLASS | ||
| + | ARCH Barrel | ||
| + | TOPOL | ||
| + | HOMOL | ||
| + | DLENGTH | ||
| + | DSEQH > | ||
| + | DSEQS | ||
| + | DSEQS | ||
| + | NSEGMENTS 2 | ||
| + | SEGMENT | ||
| + | SRANGE | ||
| + | SLENGTH | ||
| + | SSEQH > | ||
| + | SSEQS | ||
| + | SSEQS QTLL | ||
| + | ENDSEG | ||
| + | SEGMENT | ||
| + | SRANGE | ||
| + | SLENGTH | ||
| + | SSEQH > | ||
| + | SSEQS | ||
| + | ENDSEG | ||
| + | COMMENTS | ||
| + | // | ||
| + | </ | ||
| + | |||
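As a rough sketch of how a file of such entries might be read, the function below groups lines into one record per entry, ending each record at the `//` line. The names `read_entries` and `joined` are hypothetical, and whitespace between tag and value is an assumption. Repeated tags are kept in file order; joining them without a separator follows the NAME lines in the example, where a word is split across two lines ("Leuc" / "ine").

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def read_entries(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per entry, mapping each tag to its values in file order."""
    entry: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        parts = line.split(None, 1)
        tag = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if tag == "//":
            # '//' closes the current entry
            yield dict(entry)
            entry = defaultdict(list)
        else:
            entry[tag].append(value)
    if entry:
        # Tolerate a final entry that lacks its '//' terminator
        yield dict(entry)


def joined(entry: Dict[str, List[str]], tag: str) -> str:
    """Concatenate continuation lines of one tag (e.g. NAME or DSEQS)."""
    return "".join(entry.get(tag, []))
```

Because the segment tags repeat once per segment, this flat mapping mixes the values of different segments together; segment blocks need to be grouped separately.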
| + | |||
| + | NOTE: | ||
| + | The following CATH hierarchy description lines are typically found together | ||
| + | (derived from CathList and CathNames files) | ||
| + | |||
| + | < | ||
| + | CATHCODE | ||
| + | CLASS | ||
| + | ARCH Barrel | ||
| + | TOPOL | ||
| + | HOMOL | ||
| + | </ | ||
| + | |||
| + | |||
| + | The following domain sequence lines are typically found together | ||
| + | (derived from CathDomain Fasta Sequence File) | ||
| + | |||
| + | < | ||
| + | DLENGTH | ||
| + | DSEQH > | ||
| + | DSEQS | ||
| + | DSEQS | ||
| + | </ | ||
| + | |||
| + | Segment sequence lines are always initiated with a 'SEGMENT' tag | ||
| + | and terminated with an 'ENDSEG' tag. The number of segments | ||
| + | always precedes the first segment using the 'NSEGMENTS' tag. | ||
| + | |||
| + | The following segment sequence lines are typically found together | ||
| + | (derived from CathSegments Fasta Sequence File) | ||
| + | |||
| + | |||
| + | < | ||
| + | SEGMENT | ||
| + | SRANGE | ||
| + | SLENGTH | ||
| + | SSEQH > | ||
| + | SSEQS | ||
| + | SSEQS QTLL | ||
| + | ENDSEG | ||
| + | </ | ||
| + | |||
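The segment block structure shown above can be checked with a small grouping routine. This is a sketch under assumptions: `collect_segments` is a hypothetical helper, its input is a list of already split `(tag, value)` pairs for a single entry, and the sequences in the usage note are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def collect_segments(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Group SEGMENT ... ENDSEG blocks and compare the count with NSEGMENTS."""
    declared: Optional[int] = None
    segments: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for tag, value in pairs:
        if tag == "NSEGMENTS":
            declared = int(value)
        elif tag == "SEGMENT":
            if current is not None:
                raise ValueError("SEGMENT found before the previous ENDSEG")
            current = {"SEGMENT": value, "SSEQS": ""}
        elif tag == "ENDSEG":
            if current is None:
                raise ValueError("ENDSEG without a matching SEGMENT")
            segments.append(current)
            current = None
        elif current is not None:
            if tag == "SSEQS":
                current["SSEQS"] += value  # sequence may span several lines
            else:
                current[tag] = value       # e.g. SRANGE, SLENGTH, SSEQH
    if current is not None:
        raise ValueError("last SEGMENT block is missing its ENDSEG")
    if declared is not None and declared != len(segments):
        raise ValueError(f"NSEGMENTS is {declared} but {len(segments)} blocks found")
    return segments
```

For instance, pairs for two segments named `1abcA01:1:2` and `1abcA01:2:2` (hypothetical values) would come back as a list of two dictionaries, each holding its own concatenated `SSEQS` string.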
| + | ===== Cath Domain and Segment Naming Conventions ===== | ||
| + | |||
| + | ==== CATH Domain Names ==== | ||
| + | |||
| + | The domain names have seven characters (e.g. 1oaiA00). | ||
| + | |||
| + | < | ||
| + | CHARACTERS 1-4: PDB Code | ||
| + | The first 4 characters determine the PDB code e.g. 1oai | ||
| + | |||
| + | CHARACTER 5: Chain Character | ||
| + | This determines which PDB chain is represented. | ||
| + | Chain characters of zero (' | ||
| + | |||
| + | CHARACTERS 6-7: Domain Number | ||
| + | The domain number is a 2-figure, zero-padded number (e.g. ' | ||
| + | </ | ||
| + | |||
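The seven-character layout described above can be unpacked with a regular expression. The sketch below is illustrative only: `DomainName` and `parse_domain_name` are hypothetical names, and treating the PDB code and chain character as ASCII letters or digits is an assumption.

```python
import re
from typing import NamedTuple


class DomainName(NamedTuple):
    pdb_code: str       # characters 1-4
    chain: str          # character 5
    domain_number: int  # characters 6-7, zero-padded in the name


# Assumption: PDB code and chain character are ASCII letters or digits.
_DOMAIN_RE = re.compile(r"^([0-9A-Za-z]{4})([0-9A-Za-z])([0-9]{2})$")


def parse_domain_name(name: str) -> DomainName:
    """Split a seven-character CATH domain name such as '1oaiA00'."""
    match = _DOMAIN_RE.match(name)
    if match is None:
        raise ValueError(f"not a seven-character domain name: {name!r}")
    pdb_code, chain, number = match.groups()
    return DomainName(pdb_code, chain, int(number))
```

With this sketch, `parse_domain_name("1oaiA00")` gives PDB code `1oai`, chain `A` and domain number `0`.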
| + | ==== CATH Segment Names ==== | ||
| + | |||
| + | CATH segments (continuous regions of sequence within a domain) are described by | ||
| + | adding colon-separated numbers to the end of the domain name. | ||
| + | |||
| + | The first number is the sequential number of the segment. | ||
| + | |||
| + | The second number is the total number of segments in this domain. | ||
| + | |||
| + | < | ||
| + | 1abcA01:1:2 | ||
| + | xxxxxxxoooo | ||
| + | |||
| + | x = standard CATH seven-character domain name | ||
| + | o = segment information : | ||
| + | </ | ||
| + | |||
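A segment name built this way can be taken apart again by splitting on the colons. The helper below is a minimal sketch: `parse_segment_name` is a hypothetical name, and the range check (segment number between 1 and the total) is an assumption drawn from the description above.

```python
from typing import Tuple


def parse_segment_name(name: str) -> Tuple[str, int, int]:
    """Split e.g. '1abcA01:1:2' into (domain name, segment number, total)."""
    fields = name.split(":")
    if len(fields) != 3:
        raise ValueError(f"expected domain:number:total, got {name!r}")
    domain, number, total = fields[0], int(fields[1]), int(fields[2])
    # Assumption: segments are numbered from 1 up to the total.
    if not 1 <= number <= total:
        raise ValueError(f"segment {number} is outside 1..{total}")
    return domain, number, total
```

Here `parse_segment_name("1abcA01:1:2")` yields the domain name `1abcA01`, segment number 1 and a total of 2 segments.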
| + | |||
| + | |||