BADASP
BADASP can produce several kinds of measures:
- bad: similar to Type II functional divergence. The threshold to choose depends on whether we want to be stringent (i.e. BAD > 4) or more relaxed (BAD > 2).
- badn = BADN variant of BAD: similar to Type I functional divergence, between two groups.
- badx = BADX variant of BAD: similar to Type II functional divergence, between many groups.
- ssc = Livingstone & Barton method (SSC) ⇒ doesn't use ancestral reconstruction. Was developed prior to BAD.
- pdad = Property Difference After Duplication (PDAD) method
- eta = Basic Evolutionary Trace Analysis (ETA) ⇒ Strictly conserved residues = 1, else = 0.
- etaq = Quantitative variant of ETA
All these methods are described in detail in the manual, chapter 3.1: Functional Specificity Prediction.
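As a rough illustration of the binary ETA score in the list above (strictly conserved residues = 1, else = 0), the following Python sketch scores each column of a small alignment. It is not BADASP's implementation: the function name `eta_scores`, the choice to treat a gapped column as not conserved, and the toy sequences are all assumptions made for illustration.

```python
def eta_scores(alignment):
    """Return 1 for each strictly conserved column, else 0.

    `alignment` is a list of equal-length aligned sequences.
    Assumption: a column containing a gap ('-') counts as not conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        scores.append(1 if conserved else 0)
    return scores


# Made-up toy alignment, not taken from badasp_eg.fas.
toy = ["MKLV", "MKIV", "MK-V"]
print(eta_scores(toy))  # [1, 1, 0, 1]
```

The real ETA and ETAQ scores should be taken from the BADASP output; this sketch only shows the "1 if strictly conserved, else 0" idea.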
Installation
Download the badasp archive and unzip it: http://www.southampton.ac.uk/~re1u06/software/badasp/index.html
wget http://www.southampton.ac.uk/~re1u06/software/downloads/badasp.zip
unzip badasp.zip
Analysis of the V-type proton ATPase 116 kDa subunit a gene family
We want to identify the residues that differ between isoform 1 and isoform 4 of the V-type proton ATPase 116 kDa subunit a.
cd ./badasp # Folder of installation
Execute badasp by importing the multiple alignment in FASTA format (“badasp_eg.fas”) and activating the interactive mode (i=1):
python badasp.py seqin=badasp_eg.fas i=1
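Since BADASP expects a multiple alignment in FASTA format, a quick sanity check of the input can be useful before starting the interactive run. The following is a minimal sketch, assuming plain FASTA with `>` header lines; the helper names `read_fasta` and `is_aligned` are hypothetical and not part of BADASP.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {name: sequence}."""
    sequences = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # first word of the header
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line  # sequences may span several lines
    return sequences


def is_aligned(sequences):
    """True if there is at least one sequence and all have equal length."""
    lengths = {len(seq) for seq in sequences.values()}
    return len(lengths) == 1


# Made-up example records; for the tutorial data one would instead pass
# open("badasp_eg.fas") to read_fasta().
example = [">seqA desc", "MK-LV", ">seqB", "MKI", "LV"]
seqs = read_fasta(example)
print(seqs, is_aligned(seqs))
```

If `is_aligned` returns False, the file is probably a set of unaligned sequences rather than an alignment.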
Badasp will ask for the associated tree, in newick format (“badasp_eg.nsf”):
Looking for treefile badasp_eg.nsf.
Tree: ['seqin=badasp_eg.fas', 'i=1', 'nsfin=badasp_eg.nsf']
<ENTER> to continue => nsfin=badasp_eg.nsf => Press enter
Display Tree, with two groups of sequences:
V-type proton ATPase 116 kDa subunit a
- VPP1 = VPP Isoform 1 (8 genes)
- NVL = VPP Isoform 4 (3 genes)
Rooted Tree (1000 bootstraps). Branch Lengths given. 21 nodes.
<ENTER> to continue. => Press enter
Tree is rooted at node 21 => perfect => Press 0, then enter.
*** Tree Menu ***
Sequence Data are already imported => we quit the menu.
Choice [default=Q]: q
Quit Tree Menu? (y/n) [default=Y]: y
The tree is now loaded and we need to define the two groups to analyse:
#*# Grouping Summary #*#
Currently 0 groups. (11 Orphans) => Press enter
# We need to split the tree at node 21, so we define two groups from its child nodes 20 (= VPP1 subfamily) and 19 (= VPP4 subfamily). => Press M, then enter.
# Manual grouping (Tree displayed)
Choice? [default=Q]: c # We collapse node
Node [default=0]: 20 => Type VPP1, then press enter
Choice? [default=Q]: c # We collapse node
Node [default=0]: 19 => Type VPP4, then press enter
Choice? [default=Q]: Q, then enter # We quit the tree editor
Quit Tree Edit? (y/n) [default=Y]: y
#*# Grouping Summary #*#
<ENTER> to continue.
Choice for Grouping? [default=K]: K, then enter
Keep Groups? (y/n) [default=Y]: Y, then enter
Save groups? (y/n) [default=Y]: y
Name of Groupfile? [default=badasp_eg.grp]: enter
Write Group Names? (y/n) [default=N]: N
Use badasp_eg for output filenames? (y/n) [default=Y]: enter
Use these parameters? (y/n) [default=Y]: enter
Badasp will now run its computations. It first reconstructs the ancestral sequences at each node of the tree, using the GASP (Gapped Ancestral Sequence Prediction) method:
Making Ancestral Sequences - Variable PAM Weighting
Reading PAM1 matrix from jones.pam
# Start computing
Saving Ancestral Sequences in badasp_eg.anc.fas…
<ENTER> to continue.
Method BADX needs query but none given. Drop BADX from specificity methods? (y/n) [default=Y]: n
Method BADX needs query but none given. Use sequence 1 (vpp1_HUMAN/Q8N5G7)? (y/n) [default=N]: y
Calculating ['BAD', 'BADN', 'BADX', 'SSC', 'PDAD', 'ETA', 'ETAQ'] scores… (849 residues)
…win(0) <ENTER> to continue.
…Done! <ENTER> to continue.
…win(0) <ENTER> to continue. # (many times!)
Badasp will now ask which outputs you want. Let's say yes to everything.
Output additional, filtered results? (y/n) [default=N]: y
Name for partial results file? [default=badasp_eg.partial.badasp]: enter
Output subfam 1 (VPP4) details (pos,aa & win)? (y/n) [default=Y]: y
Output subfam 2 (VPP1) details (pos,aa & win)? (y/n) [default=Y]: y
Output BAD results? (y/n) [default=Y]:
Output BADN results? (y/n) [default=Y]: y
Output BADX results? (y/n) [default=Y]: y
Output SSC results? (y/n) [default=Y]: y
Output PDAD results? (y/n) [default=Y]: y
Output ETA results? (y/n) [default=Y]: y
Output ETAQ results? (y/n) [default=Y]: y
Output Info results? (y/n) [default=Y]: y
Output PCon_Abs results? (y/n) [default=Y]: y
Output PCon_Mean results? (y/n) [default=Y]: y
Output QPCon_Mean results? (y/n) [default=Y]: y
Output QPCon_Abs results? (y/n) [default=Y]: y
Filter Rows by Results VALUES? (y/n) [default=Y]: y
Min. value for BAD? [default=-6.708333]: => New value = "-6.708333"? (y/n) [default=Y]:
Min. value for BADN? [default=-6.708333]: => New value = "-6.708333"? (y/n) [default=Y]:
Min. value for BADX? [default=-3.500000]: => New value = "-3.500000"? (y/n) [default=Y]:
Min. value for SSC? [default=0.000000]: => New value = "0.000000"? (y/n) [default=Y]:
Min. value for PDAD? [default=-0.297619]: => New value = "-0.297619"? (y/n) [default=Y]:
Min. value for ETA? [default=0.000000]: => New value = "0.000000"? (y/n) [default=Y]:
Min. value for ETAQ? [default=0.000000]: => New value = "0.000000"? (y/n) [default=Y]:
Min. value for Info? [default=0.424111]: => New value = "0.424111"? (y/n) [default=Y]:
Min. value for PCon_Abs? [default=1.000000]: => New value = "1.000000"? (y/n) [default=Y]:
Min. value for PCon_Mean? [default=5.000000]: => New value = "5.000000"? (y/n) [default=Y]:
Min. value for QPCon_Mean? [default=9.375000]: => New value = "9.375000"? (y/n) [default=Y]:
Min. value for QPCon_Abs? [default=0.000000]: => New value = "0.000000"? (y/n) [default=Y]:
BADASP Partial Results Output (badasp_eg.partial.badasp) ... Done!
#LOG 00:23:06 BADASP V:1.3 End: Thu Sep 6 13:59:24 2012
Analysis
Open the results file in a spreadsheet (or copy and paste its contents into one).
The columns are separated by a tab.
Color the “BAD”, “BADN” and “BADX” columns with conditional formatting, for values > 3.
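The same filtering can also be done outside a spreadsheet. The following Python sketch keeps rows where at least one of the three scores exceeds a threshold. It assumes that the results file is tab-separated with a single header line and that the score columns are literally named `BAD`, `BADN` and `BADX`; the function name `rows_above`, the `Pos` column and the sample values are invented for illustration and should be checked against the actual file.

```python
import csv
import io


def rows_above(handle, columns=("BAD", "BADN", "BADX"), threshold=3.0):
    """Yield rows where at least one of `columns` exceeds `threshold`.

    Assumptions: the input is tab-separated, its first line is a header,
    and the score columns are named as in `columns`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for col in columns:
            try:
                value = float(row[col])
            except (KeyError, TypeError, ValueError):
                continue  # missing or non-numeric cell: skip this column
            if value > threshold:
                yield row
                break  # one qualifying column is enough for this row


# Made-up sample table; the column name "Pos" and all values are invented.
sample = "Pos\tBAD\tBADN\tBADX\n1\t0.5\t1.0\t0.2\n2\t4.2\t3.5\t3.9\n3\t2.9\t3.1\t1.0\n"
hits = list(rows_above(io.StringIO(sample)))
print([row["Pos"] for row in hits])  # ['2', '3']
```

For the tutorial output, one would pass `open("badasp_eg.partial.badasp", newline="")` instead of the sample string. Setting `threshold=4` or `threshold=2` with `columns=("BAD",)` corresponds to the stringent and relaxed BAD cut-offs.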
In Jalview:
Load multiple alignment: badasp_eg.fas
Load tree: badasp_eg.nsf
Put a vertical line at the root of the tree to split the tree in two.
Some sites are interesting, e.g.:
- Position 3 BAD
- Position 762 BAD
- Position 223 BADX
There are only three genes in the VPP4 group, which explains why the BADX scores are very close to the BAD scores.