The testing plugin is enabled and should be disabled.

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

tutorials:eccb_t2_codeml [2012/09/07 13:44]
romainstuder
tutorials:eccb_t2_codeml [2012/09/09 09:47] (current)
romainstuder
Line 1: Line 1:
 ==== CodeML/PAML ==== ==== CodeML/PAML ====
 +
 +
 +=== Install of PAML ===
 +
 +Download it from here: http://abacus.gene.ucl.ac.uk/software/paml.html#download
 +
 +Windows (The bin/ folder already contains windows executables):
 +<code>
 +cd paml4.6/
 +</code>
 +
 +
 +Linux:
 +<code>
 +cd paml4.5/
 +rm bin/*.exe
 +cd src
 +make -f Makefile
 +ls -l
 +rm *.o
 +mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin
 +chmod +x ../bin/*
 +</code>
 +
 +MacOSX:
 +<code>
 +cd paml4.6/
 +rm bin/*.exe
 +cd src
 +cc -O2 -o baseml baseml.c tools.c -lm
 +cc -O2 -o basemlg basemlg.c tools.c -lm
 +cc -O2 -o codeml codeml.c tools.c -lm
 +cc -O2 -o pamp pamp.c tools.c -lm
 +cc -O2 -o mcmctree mcmctree.c tools.c -lm
 +cc -O2 -o evolver evolver.c tools.c -lm
 +cc -O2 -o yn00 yn00.c tools.c -lm
 +cc -O2 -o chi2 chi2.c -lm
 +ls -l
 +rm *.o
 +mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin
 +chmod +x ../bin/*
 +</code>
 +
  
 === Theoretical principles of the Branch-site model === === Theoretical principles of the Branch-site model ===
Line 59: Line 102:
 And if yes, which residues were under positive selection. And if yes, which residues were under positive selection.
  
-We need four files to run CodeML:+We need four files to run CodeML (unzip them all):
  
-  - The multiple nucleotide (CDS) alignment, in PHYLIP format. CodeML will strictly remove any position that contains at least one gap or an unknown "N" nucleotide: [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.phy|TF105351.Eut.3.phy]] +  - The multiple nucleotide (CDS) alignment, in PHYLIP format. CodeML will strictly remove any position that contains at least one gap or an unknown "N" nucleotide: {{:tutorials:tf105351.eut.3.phy.zip|}} 
-  - The phylogenetic tree in newick format, with the branch of interest specified by "#1"(You can view it with NJplot or FigTree): [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.53876.tree|TF105351.Eut.3.53876.tree]] +  - The phylogenetic tree in newick format, with the branch of interest specified by "#1"(You can view it with NJplot or FigTree): {{:tutorials:tf105351.eut.3.53876.tree.zip|}} 
-  - A command file where all parameters to run CodeML under the alternative model are specified: [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.53876.ctl|TF105351.Eut.3.53876.ctl]] +  - A command file where all parameters to run CodeML under the alternative model are specified: {{:tutorials:tf105351.eut.3.53876.ctl.zip|}} 
-  - A command file where all parameters to run CodeML under the null model are specified: [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.53876.fixed.ctl|TF105351.Eut.3.53876.fixed.ctl]]+  - A command file where all parameters to run CodeML under the null model are specified: {{:tutorials:tf105351.eut.3.53876.fixed.ctl.zip|}} 
 + 
 +The tree looks like: 
 +{{:tutorials:tree.png|}}
  
 == Execute CodeML == == Execute CodeML ==
Line 70: Line 116:
 Run command file (alternative model): Run command file (alternative model):
  
-We estimate the Ts/Tv ratio (fix_kappa = 0) and the dN/dS (fix_omega = 0). The branch-site model is specified by setting the model parameter to 2 (different dN/dS for branches) and the NSsites value to 2 (which allows 3 categories for sites: purifying, neutral and positive selection).+We estimate the Ts/Tv ratio (fix_kappa = 0) and the dN/dS (fix_omega = 0). The branch-site model is specified by setting these two parameters: 
 +    * model 2 (different dN/dS for branches)  
 +    * NSsites value to 2 (which allows 3 categories for sites: purifying, neutral and positive selection).
  
 <code> <code>
Line 148: Line 196:
 codeml ./TF105351.Eut.3.53876.ctl codeml ./TF105351.Eut.3.53876.ctl
 codeml ./TF105351.Eut.3.53876.fixed.ctl codeml ./TF105351.Eut.3.53876.fixed.ctl
-<./code>+</code>
  
 In Windows: In Windows:
Line 156: Line 204:
 </code> </code>
  
 +Two mlc output files are produced (as it can take time, you can download them directly in the next step).:
 +  * {{:tutorials:tf105351.eut.3.53876.mlc.zip|TF105351.Eut.3.53876.mlc (alternative model)}}
 +  * {{:tutorials:tf105351.eut.3.53876.fixed.mlc.zip|TF105351.Eut.3.53876.fixed.mlc (null model)}}
  
  
Line 162: Line 213:
 == Step 1) Assign significance of the detection of positive selection on the selected branch: == == Step 1) Assign significance of the detection of positive selection on the selected branch: ==
  
-Two output files are produced: 
-  * TF105351.Eut.3.53876.mlc (alternative model) 
-  * TF105351.Eut.3.53876.fixed.mlc (null model). 
  
 We retieve the likelihood values lnL1 and lnL0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. We retieve the likelihood values lnL1 and lnL0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively.
 +
 We retieve the number of parameters np1 and np0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. We retieve the number of parameters np1 and np0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively.
  
Line 172: Line 221:
    * lnL(ntime: 41  np: 45):  **-4710.222252**      +0.000000  (lnL0)    * lnL(ntime: 41  np: 45):  **-4710.222252**      +0.000000  (lnL0)
  
-We can construct the LRT (you can use your favourise spreadsheet for that. Or even betterR):+We can construct the LRT (you can use your favourite spreadsheet for that. Or even better with R):
  
 ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.210163 - (-4710.222252)) = 6.024178 ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.210163 - (-4710.222252)) = 6.024178
Line 178: Line 227:
 The degree of freedom is 1 (np1 - np0 = 46 - 45). The degree of freedom is 1 (np1 - np0 = 46 - 45).
  
-p-value = 0.014104 (under χ²) => significant.+(With OpenOffice: 1-CHISQDIST(6.024178;1)) 
 + 
 +p-value = 0.014104 (under χ²) => __significant!__
  
 A significant result with the branch-site codon model means that positive selection affected a subset of sites during a specific evolutionary time (also called episodic model of protein evolution). A significant result with the branch-site codon model means that positive selection affected a subset of sites during a specific evolutionary time (also called episodic model of protein evolution).
Line 195: Line 246:
    * Position 159 has a very high probability (99.3%) of being under positive selection.    * Position 159 has a very high probability (99.3%) of being under positive selection.
  
 +You can visualise the multiple in Jalview.
 +
 +- Open Jalview
 +
 +- Load TF105351.Eut.3.phy
 +
 +- Then: Calculate->translate cDNA. (tips: by moving the pointer on the amino acids alignment, you can see the corresponding codon in the nucleotide alignment.
 +
 +{{:tutorials:tf105351.eut.3.aln.png|}}
 +
 +
 +
 +
 +
 +=== Using other models ===
 +
 +Other models can be tested by changing these parameters model and NSsites.
 +
 +== Example 1: ==
 +
 +Site model M1 (neutral):
 +    * model = 0 (dN/dS doesn't vary on branches) 
 +    * NSsites = 1 (which allows 2 categories for sites: purifying and neutral).
 +
 +Site model M2a (positive selection):
 +    * model = 0 (dN/dS doesn't vary on branches) 
 +    * NSsites = 2 (which allows 3 categories for sites: purifying, neutral and positive selection).
 +
 +Then we can compare M1 and M2a by the likelihood ratio test.
 +
 +
 +== Example 2: ==
 +
 +
 +Branch model M0:
 +    * model = 0 (dN/dS doesn't vary on branches) 
 +    * NSsites = 0 (dN/dS doesn't vary on sites).
 +
 +Branch model M2 with different dN/dS (positive selection on selected branches):
 +    * model = 2 (different dN/dS for branches) 
 +    * NSsites = 2 (dN/dS doesn't vary on sites).
  
 +Then we can compare M0 and M2 by the likelihood ratio test.
Print/export