
Differences

This shows you the differences between two versions of the page.


tutorials:eccb_t2_badasp [2012/09/06 20:31]
romainstuder
tutorials:eccb_t2_badasp [2012/09/08 16:15] (current)
romainstuder
Line 1: Line 1:
 +==== BADASP ====
  
-cd /home/bsm4/rstuder/Dropbox/ECCB2012/Tutorial/badasp  # Folder of installation+BADASP can produce different measures:
  
-# Run badasp +   * bad: similar to **Type II** functional divergence. The threshold to choose depends on whether we want to be stringent (i.e. BAD > 4) or more relaxed (BAD > 2). 
-<code>python badasp.py seqin=badasp_eg.fas i=1</code>+   * badn = BADN variant of BAD: similar to **Type I** functional divergence, between __two__ groups. 
 +   * badx = BADX variant of BAD: similar to **Type II** functional divergence, between __many__ groups. 
 +   * ssc = Livingstone & Barton method (SSC) => doesn't use ancestral reconstruction. Was developed prior to BAD. 
 +   * pdad = Property Difference After Duplication (PDAD) method 
 +   * eta = Basic Evolutionary Trace Analysis (ETA) => Strictly conserved residues = 1, else = 0. 
 +   * etaq = Quantitative variant of ETA 
 + 
 +All these methods are described in detail in the manual, **chapter 3.1: Functional Specificity Prediction**. 
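
The "eta" entry in the list above states a simple rule: strictly conserved residues score 1, everything else scores 0. The following is a minimal Python sketch of that rule applied column by column to a toy alignment. It is not BADASP's own code: the function name ''eta_scores'', the toy sequences, and the decision to treat gap-containing columns as not conserved are all assumptions made for illustration, and the manual remains the reference for how ETA is actually computed on the tree.

```python
def eta_scores(alignment):
    """Score each alignment column: 1 if strictly conserved, else 0.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = set(column)
        # Assumption (not from the BADASP manual): a column that contains
        # a gap character is not counted as strictly conserved.
        conserved = len(residues) == 1 and "-" not in residues
        scores.append(1 if conserved else 0)
    return scores


# Toy alignment, invented for illustration only.
toy_alignment = ["MKVLA", "MKILA", "MRVLA"]
print(eta_scores(toy_alignment))
```

In this toy case only the first, fourth and fifth columns are identical in all three sequences, so only those positions receive a score of 1.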
 + 
 +=== Installation === 
 + 
 +Download the badasp archive and unzip it: 
 +[[http://www.southampton.ac.uk/~re1u06/software/badasp/index.html]] 
 +<code> 
 +wget http://www.southampton.ac.uk/~re1u06/software/downloads/badasp.zip 
 +unzip badasp.zip 
 +</code> 
 + 
 +=== Analysis of the V-type proton ATPase 116 kDa subunit a gene family === 
 + 
 +We want to identify the residues that distinguish **isoform 1** from **isoform 4** of the V-type proton ATPase 116 kDa subunit a. 
 + 
 +First, briefly inspect the multiple alignment in Jalview (file "badasp_eg.fas" in the badasp folder). 
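
Alongside the visual check in Jalview, a quick scripted sanity check can confirm that the FASTA file really is an alignment, i.e. that every sequence has the same length. The sketch below is a hedged illustration: the helper names ''read_fasta'' and ''alignment_lengths'' are hypothetical, the parser only handles plain FASTA, and the records shown are invented rather than taken from "badasp_eg.fas".

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_lengths(records):
    """Return the set of distinct sequence lengths (one value if aligned)."""
    return {len(seq) for seq in records.values()}


# Invented example records; a real run would pass the opened file instead,
# e.g. read_fasta(open("badasp_eg.fas")).
toy_lines = [">seqA toy record", "MKV-", "LA", ">seqB", "MKIL", "LA"]
toy_records = read_fasta(toy_lines)
print(toy_records)
print(alignment_lengths(toy_records))
```

If the returned set contains more than one length, the file is probably not a finished alignment and is worth inspecting before running badasp.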
 + 
 + 
 +Execute **badasp** by importing the multiple alignment in FASTA format ("badasp_eg.fas") and activating the interactive mode (i=1): 
 + 
 +<code> 
 +cd ./badasp  # Folder of installation 
 +python badasp.py seqin=badasp_eg.fas i=1</code> 
 + 
 +Badasp will ask for the associated tree, in Newick format ("badasp_eg.nsf"):
 <code> <code>
-# Ask for a tree 
 Looking for treefile badasp_eg.nsf. Looking for treefile badasp_eg.nsf.
 Tree: ['seqin=badasp_eg.fas', 'i=1', 'nsfin=badasp_eg.nsf']  <ENTER> to continue Tree: ['seqin=badasp_eg.fas', 'i=1', 'nsfin=badasp_eg.nsf']  <ENTER> to continue
Line 12: Line 43:
  
 => Press enter => Press enter
 +</code>
 Display Tree, with two groups of sequences: Display Tree, with two groups of sequences:
 V-type proton ATPase 116 kDa subunit a V-type proton ATPase 116 kDa subunit a
-VPP1 = VPP Isoform 1 (8 genes) +   * VPP1 = VPP Isoform 1 (8 genes) 
-NVL = VPP Isoform 4 (3 genes) +   * NVL = VPP Isoform 4 (3 genes) 
 +<code>
 Rooted Tree (1000 bootstraps). Branch Lengths given. 21 nodes.  <ENTER> to continue. Rooted Tree (1000 bootstraps). Branch Lengths given. 21 nodes.  <ENTER> to continue.
 => Press enter => Press enter
- 
  
 Tree is rooted at node 21 => perfect Tree is rooted at node 21 => perfect
Line 31: Line 61:
 Choice [default=Q]:  q  Choice [default=Q]:  q 
 Quit Tree Menu? (y/n) [default=Y]:  y Quit Tree Menu? (y/n) [default=Y]:  y
 +</code>
  
 +The tree is now loaded and we need to define the two groups to analyse:
 +
 +<code>
 #*# Grouping Summary #*# #*# Grouping Summary #*#
  
Line 37: Line 71:
 => Press enter => Press enter
  
-We need to split the tree on the node 21, so we need to define two groups from the children nodes 20 and 19.+We need to split the tree on the node 21, 
 +so we need to define two groups from the child nodes 20 (= VPP1 subfamily) and 19 (= VPP4 subfamily).
 => Press M, then enter.  # Manual grouping => Press M, then enter.  # Manual grouping
 (Tree displayed) (Tree displayed)
-Choice? [default=Q]:  c  # We collapse node+Choice? [default=Q]:  c  # We collapse nodes
 Node [default=0]: 20 Node [default=0]: 20
 => Type VPP1, then Press enter => Type VPP1, then Press enter
  
-Choice? [default=Q]:  c  # We collapse node+Choice? [default=Q]:  c  # We collapse nodes
 Node [default=0]:  19 Node [default=0]:  19
 => Type VPP4, then Press enter => Type VPP4, then Press enter
Line 60: Line 95:
 Use badasp_eg for output filenames? (y/n) [default=Y]:  enter Use badasp_eg for output filenames? (y/n) [default=Y]:  enter
 Use these parameters? (y/n) [default=Y]:  enter Use these parameters? (y/n) [default=Y]:  enter
 +</code>
  
 +Badasp will now perform some computations. It will reconstruct the ancestral sequences at each node of the tree, using GASP (ref: http://dx.doi.org/10.1186/1471-2105-5-123).
  
 +<code>
 Making Ancestral Sequences - Variable PAM Weighting Making Ancestral Sequences - Variable PAM Weighting
 Reading PAM1 matrix from jones.pam Reading PAM1 matrix from jones.pam
Line 73: Line 111:
 ...Done!  <ENTER> to continue. ...Done!  <ENTER> to continue.
 ...win(0)  <ENTER> to continue. # (many times !) ...win(0)  <ENTER> to continue. # (many times !)
 +</code>
 +
 +Now, Badasp will ask which kinds of output you want.
 +Let's say yes to everything.
 +
 +<code>
 Output additional, filtered results? (y/n) [default=N]:  y Output additional, filtered results? (y/n) [default=N]:  y
 Name for partial results file? [default=badasp_eg.partial.badasp]: enter  Name for partial results file? [default=badasp_eg.partial.badasp]: enter 
Line 121: Line 165:
 </code> </code>
  
-<code? + 
-### Analysis+=== Analysis ===
  
 Open the file in your spreadsheet (or cut & paste). Open the file in your spreadsheet (or cut & paste).
Line 130: Line 174:
 Color the "BAD", "BADN" and "BAD" columns with conditional formatting, with value > 3. Color the "BAD", "BADN" and "BAD" columns with conditional formatting, with value > 3.
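
The same highlighting can be approximated outside a spreadsheet. The Python sketch below flags every row in which a score column exceeds the threshold of 3 used above. Several things here are assumptions and not taken from BADASP's documentation: that the results file is tab-delimited, that it has columns named "Pos", "BAD", "BADN" and "BADX" (the third highlighted column is assumed to be BADX), and the function name ''sites_above''. The table values are invented purely to make the example runnable.

```python
import csv
import io


def sites_above(table_text, columns=("BAD", "BADN", "BADX"),
                threshold=3.0, delimiter="\t"):
    """Return (row, flagged) pairs for rows with any score above threshold.

    `flagged` maps each column name to its value when value > threshold.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    hits = []
    for row in reader:
        flagged = {}
        for name in columns:
            try:
                value = float(row[name])
            except (KeyError, TypeError, ValueError):
                # Missing column or non-numeric cell: skip it.
                continue
            if value > threshold:
                flagged[name] = value
        if flagged:
            hits.append((row, flagged))
    return hits


# Invented toy table (hypothetical column names and values).
toy_table = (
    "Pos\tBAD\tBADN\tBADX\n"
    "1\t0.5\t1.0\t0.2\n"
    "2\t4.2\t1.1\t3.5\n"
    "3\tn/a\t3.0\t2.9\n"
)
for row, flagged in sites_above(toy_table):
    print(row["Pos"], flagged)
```

Passing ''threshold=4.0'' or ''threshold=2.0'' would mirror the stringent and relaxed BAD cut-offs mentioned at the top of the page; the comparison is strict, so a value exactly equal to the threshold is not flagged.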
  
- 
-</code> 
  
 == In Jalview: == == In Jalview: ==
Line 141: Line 183:
 Put a vertical line at the root of the tree to split the tree in two. Put a vertical line at the root of the tree to split the tree in two.
  
 +Some sites are interesting, e.g.:
 +   * Position 3 BAD
 +   * Position 762 BAD
 +   * Position 223 BADX
  
-Positon 3 BAD +There are only three genes in the VPP4 group, which explains why the BADX scores are very close to the BAD scores.
-Position 762 BAD +
-Position 223 BADX+
  