tutorials:eccb_t2_badasp [2012/09/06 16:27] romainstuder created
tutorials:eccb_t2_badasp [2012/09/08 16:15] (current) romainstuder
==== BADASP ====
BADASP can compute several measures:
  * bad: similar to **Type II** functional divergence. The threshold depends on how stringent we want to be: strict (e.g. BAD > 4) or more relaxed (BAD > 2).
  * badn: BADN variant of BAD, similar to **Type I** functional divergence, between __two__ groups.
  * badx: BADX variant of BAD, similar to **Type II** functional divergence, between __many__ groups.
  * ssc: Livingstone & Barton method (SSC). It does not use ancestral reconstruction and was developed before BAD.
  * pdad: Property Difference After Duplication (PDAD) method.
  * eta: basic Evolutionary Trace Analysis (ETA). Strictly conserved residues score 1, all others score 0.
  * etaq: quantitative variant of ETA.

All these methods are described in detail in the manual, **chapter 3.1: Functional Specificity Prediction**.
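The binary ETA score can be made concrete with a minimal Python sketch. It is not BADASP's code. It assumes that "strictly conserved" means that each group contains a single residue at a given alignment column. Under that reading the groups may still differ from each other, as in Evolutionary Trace. The function name `eta_scores` and the toy sequences are hypothetical.

<code python>
from typing import Dict, List


def eta_scores(groups: Dict[str, List[str]]) -> List[int]:
    """Binary ETA-like score for each column of an alignment.

    Hypothetical sketch, not BADASP's implementation: a column scores 1
    when every group is strictly conserved there (one residue per group,
    gaps treated like residues); otherwise it scores 0.
    """
    sequences = [seq for seqs in groups.values() for seq in seqs]
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned (same length)")

    scores = []
    for col in range(length):
        # A group is conserved at `col` if all its members share one residue.
        conserved = all(len({seq[col] for seq in seqs}) == 1
                        for seqs in groups.values())
        scores.append(1 if conserved else 0)
    return scores
</code>

Consider two toy groups, `{"VPP1": ["MKLA", "MKLA", "MRLA"], "VPP4": ["MKIA", "MKIA"]}`. The function returns `[1, 0, 1, 1]`. Column 3 (L in VPP1, I in VPP4) scores 1 because each group is conserved there. Column 2 (K/R within VPP1) scores 0.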

=== Installation ===

Download the BADASP archive and unzip it:
[[http://www.southampton.ac.uk/~re1u06/software/badasp/index.html]]
<code>
wget http://www.southampton.ac.uk/~re1u06/software/downloads/badasp.zip
unzip badasp.zip
</code>

=== Analysis of the V-type proton ATPase 116 kDa subunit a gene family ===

We want to identify the residues that differ between **isoform 1** and **isoform 4** of the V-type proton ATPase 116 kDa subunit a.

First, briefly visualise the multiple alignment in Jalview (file "badasp_eg.fas" in the badasp folder).

Run **badasp**, loading the multiple alignment in FASTA format ("badasp_eg.fas") and activating the interactive mode (i=1):

<code>
cd ./badasp # Installation folder
python badasp.py seqin=badasp_eg.fas i=1
</code>

BADASP then asks for the associated tree in Newick format ("badasp_eg.nsf"):
<code>
Looking for treefile badasp_eg.nsf.
Tree: ['seqin=badasp_eg.fas', 'i=1', 'nsfin=badasp_eg.nsf'] <ENTER> to continue
[...]
=> Press enter
</code>
The tree is displayed, with two groups of sequences of the V-type proton ATPase 116 kDa subunit a:
  * VPP1 = VPP isoform 1 (8 genes)
  * NVL = VPP isoform 4 (3 genes)
<code>
Rooted Tree (1000 bootstraps). Branch Lengths given. 21 nodes. <ENTER> to continue.
=> Press enter
Tree is rooted at node 21 => perfect
[...]
Choice [default=Q]: q
Quit Tree Menu? (y/n) [default=Y]: y
</code>
The tree is now loaded, and we need to define the two groups to analyse:

<code>
#*# Grouping Summary #*#
[...]
=> Press enter
# We need to split the tree at node 21,
# so we define two groups from its child nodes 20 (= VPP1 subfamily) and 19 (= VPP4 subfamily).
=> Press M, then enter. # Manual grouping
(Tree displayed)
Choice? [default=Q]: c # Collapse the node
Node [default=0]: 20
=> Type VPP1, then Press enter
Choice? [default=Q]: c # Collapse the node
Node [default=0]: 19
=> Type VPP4, then Press enter
[...]
Use badasp_eg for output filenames? (y/n) [default=Y]: enter
Use these parameters? (y/n) [default=Y]: enter
</code>
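The manual grouping above cuts the rooted tree at its root and treats the leaves under each child as one group. The Python sketch below shows that idea under our own assumptions. It uses a tiny hand-written parser for plain Newick without quoted labels or comments, and the leaf names (V1..V8 for VPP1, N1..N3 for NVL) are hypothetical. The sketch does not reproduce BADASP's node numbering. It also checks the node count reported earlier. A rooted binary tree with n leaves has 2n - 1 nodes, so the 8 + 3 = 11 sequences are consistent with the 21 nodes BADASP reports.

<code python>
from typing import List, Tuple


class Node:
    """A tree node; leaves have no children."""

    def __init__(self) -> None:
        self.name = ""
        self.children: List["Node"] = []

    def leaves(self) -> List[str]:
        """Leaf names under this node, from left to right."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]

    def count(self) -> int:
        """Number of nodes (internal + leaves) in this subtree."""
        return 1 + sum(child.count() for child in self.children)


def parse_newick(text: str) -> Node:
    """Minimal Newick parser (no quoted labels or comments).

    Branch lengths (':0.05') are skipped; a label after ')' such as a
    bootstrap value is kept as the internal node's name.
    """
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        return text[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.name = read_label()
        if pos < len(text) and text[pos] == ":":
            pos += 1
            read_label()  # branch length, ignored here
        return node

    return parse_node()


def split_at_root(root: Node) -> Tuple[List[str], List[str]]:
    """Two groups = leaves under each child of the root."""
    if len(root.children) != 2:
        raise ValueError("expected a root with exactly two children")
    left, right = root.children
    return left.leaves(), right.leaves()
</code>

Consider a toy tree such as `((((V1,V2),(V3,V4)),((V5,V6),(V7,V8))),((N1,N2),N3));`. Here `split_at_root` returns the 8 VPP1-like and 3 NVL-like leaf names, and `count()` returns 21.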
BADASP now performs its computations. It reconstructs the ancestral sequences at each node of the tree using GASP (ref: http://dx.doi.org/10.1186/1471-2105-5-123).
<code>
Making Ancestral Sequences - Variable PAM Weighting
Reading PAM1 matrix from jones.pam
[...]
...Done! <ENTER> to continue.
...win(0) <ENTER> to continue. # (repeated many times!)
</code>
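To give an idea of what reconstructing ancestral sequences at each node means, here is a sketch of a much simpler technique: Fitch parsimony on a single alignment column. This is not GASP. It ignores gaps, branch lengths and the PAM-based weighting shown in the log above. It simply picks one most-parsimonious residue for every internal node. The `Node` class and the `fitch` function are hypothetical names.

<code python>
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    states: Set[str] = field(default_factory=set)  # Fitch candidate residues
    ancestral: Optional[str] = None                # residue finally assigned


def fitch(root: Node, column: Dict[str, str]) -> int:
    """Fitch parsimony for one alignment column on a rooted binary tree.

    `column` maps leaf names to residues. Fills `ancestral` on every node
    with one most-parsimonious assignment and returns the number of changes.
    """
    changes = 0

    def down(node: Node) -> None:
        # Post-order pass (leaves -> root): intersect child sets if possible,
        # otherwise take the union and count one substitution.
        nonlocal changes
        if not node.children:
            node.states = {column[node.name]}
            return
        left, right = node.children
        down(left)
        down(right)
        common = left.states & right.states
        if common:
            node.states = common
        else:
            node.states = left.states | right.states
            changes += 1

    def up(node: Node, parent: Optional[str]) -> None:
        # Pre-order pass (root -> leaves): keep the parent's residue when it
        # is a candidate here; otherwise pick one (alphabetical tie-break).
        node.ancestral = parent if parent in node.states else min(node.states)
        for child in node.children:
            up(child, node.ancestral)

    down(root)
    up(root, None)
    return changes
</code>

Take a toy tree ((A,B),(C,D)) where A and B carry L, and C and D carry I. Here `fitch` returns 1 change and assigns L to the (A,B) ancestor and I to the (C,D) ancestor. The root is ambiguous (I or L), and the alphabetical tie-break picks I.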

BADASP now asks which kinds of output you want.
Let's say yes to everything.

<code>
Output additional, filtered results? (y/n) [default=N]: y
Name for partial results file? [default=badasp_eg.partial.badasp]: enter
[...]
Output PDAD results? (y/n) [default=Y]: y
Output ETA results? (y/n) [default=Y]: y
Output ETAQ results? (y/n) [default=Y]: y
Output Info results? (y/n) [default=Y]: y
Output PCon_Abs results? (y/n) [default=Y]: y
Output PCon_Mean results? (y/n) [default=Y]: y
Output QPCon_Mean results? (y/n) [default=Y]: y
Output QPCon_Abs results? (y/n) [default=Y]: y
Filter Rows by Results VALUES? (y/n) [default=Y]: y
Min. value for BAD? [default=-6.708333]:
=> New value = "-6.708333"? (y/n) [default=Y]:
Min. value for BADN? [default=-6.708333]:
=> New value = "-6.708333"? (y/n) [default=Y]:
Min. value for BADX? [default=-3.500000]:
=> New value = "-3.500000"? (y/n) [default=Y]:
Min. value for SSC? [default=0.000000]:
=> New value = "0.000000"? (y/n) [default=Y]:
Min. value for PDAD? [default=-0.297619]:
=> New value = "-0.297619"? (y/n) [default=Y]:
Min. value for ETA? [default=0.000000]:
=> New value = "0.000000"? (y/n) [default=Y]:
Min. value for ETAQ? [default=0.000000]:
=> New value = "0.000000"? (y/n) [default=Y]:
Min. value for Info? [default=0.424111]:
=> New value = "0.424111"? (y/n) [default=Y]:
Min. value for PCon_Abs? [default=1.000000]:
=> New value = "1.000000"? (y/n) [default=Y]:
Min. value for PCon_Mean? [default=5.000000]:
=> New value = "5.000000"? (y/n) [default=Y]:
Min. value for QPCon_Mean? [default=9.375000]:
=> New value = "9.375000"? (y/n) [default=Y]:
Min. value for QPCon_Abs? [default=0.000000]:
=> New value = "0.000000"? (y/n) [default=Y]:
BADASP Partial Results Output (badasp_eg.partial.badasp) ... Done!
#LOG 00:23:06 BADASP V:1.3 End: Thu Sep 6 13:59:24 2012
</code>
=== Analysis ===
Open the results file in your spreadsheet program (or copy and paste its content into it).
[...]
Color the "BAD", "BADN" and "BADX" columns using conditional formatting for values > 3.
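The same highlighting can also be scripted in Python instead of using a spreadsheet. We assume, without having checked the BADASP manual, that the results file is delimited text with one header row naming the score columns "BAD", "BADN" and "BADX". The tab delimiter, the function name `flag_rows` and the `> 3` default are hypothetical choices that mirror the step above. They may need adjusting to the real file.

<code python>
import csv
from typing import Dict, Iterable, List, Optional, Sequence


def _as_float(cell: Optional[str]) -> float:
    """Parse a score; empty or non-numeric cells never pass the threshold."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return float("-inf")


def flag_rows(lines: Iterable[str],
              columns: Sequence[str] = ("BAD", "BADN", "BADX"),
              threshold: float = 3.0,
              delimiter: str = "\t") -> List[Dict[str, str]]:
    """Rows in which at least one of `columns` is greater than `threshold`.

    Assumes one header row naming the columns; `lines` can be an open file.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    return [row for row in reader
            if any(_as_float(row[c]) > threshold for c in columns)]
</code>

For example, open the file with `open(path, newline="")` and pass the handle to `flag_rows`. If the file is not tab-delimited, set `delimiter` to whatever it actually uses.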
== In Jalview: ==
[...]
Put a vertical line at the root of the tree to split it into two groups.
Some interesting sites:
  * Position 3 (BAD)
  * Position 762 (BAD)
  * Position 223 (BADX)

There are only three genes in the VPP4 group, which explains why the BADX scores are very close to the BAD scores.