The testing plugin is enabled and should be disabled.
Differences
This shows you the differences between two versions of the page.
tutorials:eccb_t2_codeml [2012/09/07 13:44] romainstuder |
tutorials:eccb_t2_codeml [2012/09/09 09:47] (current) romainstuder |
||
---|---|---|---|
Line 1: | Line 1: | ||
==== CodeML/PAML ==== | ==== CodeML/PAML ==== | ||
+ | |||
+ | |||
+ | === Install of PAML === | ||
+ | |||
+ | Download it from here: http://abacus.gene.ucl.ac.uk/software/paml.html#download | ||
+ | |||
+ | Windows (The bin/ folder already contains windows executables): | ||
+ | <code> | ||
+ | cd paml4.6/ | ||
+ | </code> | ||
+ | |||
+ | |||
+ | Linux: | ||
+ | <code> | ||
+ | cd paml4.5/ | ||
+ | rm bin/*.exe | ||
+ | cd src | ||
+ | make -f Makefile | ||
+ | ls -l | ||
+ | rm *.o | ||
+ | mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin | ||
+ | chmod +x ../bin/* | ||
+ | </code> | ||
+ | |||
+ | MacOSX: | ||
+ | <code> | ||
+ | cd paml4.6/ | ||
+ | rm bin/*.exe | ||
+ | cd src | ||
+ | cc -O2 -o baseml baseml.c tools.c -lm | ||
+ | cc -O2 -o basemlg basemlg.c tools.c -lm | ||
+ | cc -O2 -o codeml codeml.c tools.c -lm | ||
+ | cc -O2 -o pamp pamp.c tools.c -lm | ||
+ | cc -O2 -o mcmctree mcmctree.c tools.c -lm | ||
+ | cc -O2 -o evolver evolver.c tools.c -lm | ||
+ | cc -O2 -o yn00 yn00.c tools.c -lm | ||
+ | cc -O2 -o chi2 chi2.c -lm | ||
+ | ls -l | ||
+ | rm *.o | ||
+ | mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin | ||
+ | chmod +x ../bin/* | ||
+ | </code> | ||
+ | |||
=== Theoretical principles of the Branch-site model === | === Theoretical principles of the Branch-site model === | ||
Line 59: | Line 102: | ||
And if yes, which residues were under positive selection. | And if yes, which residues were under positive selection. | ||
- | We need four files to run CodeML: | + | We need four files to run CodeML (unzip them all): |
- | - The multiple nucleotide (CDS) alignment, in PHYLIP format. CodeML will strictly remove any position that contains at least one gap or an unknown "N" nucleotide: [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.phy|TF105351.Eut.3.phy]] | + | - The multiple nucleotide (CDS) alignment, in PHYLIP format. CodeML will strictly remove any position that contains at least one gap or an unknown "N" nucleotide: {{:tutorials:tf105351.eut.3.phy.zip|}} |
- | - The phylogenetic tree in newick format, with the branch of interest specified by "#1"(You can view it with NJplot or FigTree): [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.53876.tree|TF105351.Eut.3.53876.tree]] | + | - The phylogenetic tree in newick format, with the branch of interest specified by "#1"(You can view it with NJplot or FigTree): {{:tutorials:tf105351.eut.3.53876.tree.zip|}} |
- | - A command file where all parameters to run CodeML under the alternative model are specified: [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.53876.ctl|TF105351.Eut.3.53876.ctl]] | + | - A command file where all parameters to run CodeML under the alternative model are specified: {{:tutorials:tf105351.eut.3.53876.ctl.zip|}} |
- | - A command file where all parameters to run CodeML under the null model are specified: [[https://sites.google.com/site/evosite3d/home/files/TF105351.Eut.3.53876.fixed.ctl|TF105351.Eut.3.53876.fixed.ctl]] | + | - A command file where all parameters to run CodeML under the null model are specified: {{:tutorials:tf105351.eut.3.53876.fixed.ctl.zip|}} |
+ | |||
+ | The tree looks like: | ||
+ | {{:tutorials:tree.png|}} | ||
== Execute CodeML == | == Execute CodeML == | ||
Line 70: | Line 116: | ||
Run command file (alternative model): | Run command file (alternative model): | ||
- | We estimate the Ts/Tv ratio (fix_kappa = 0) and the dN/dS (fix_omega = 0). The branch-site model is specified by setting the model parameter to 2 (different dN/dS for branches) and the NSsites value to 2 (which allows 3 categories for sites: purifying, neutral and positive selection). | + | We estimate the Ts/Tv ratio (fix_kappa = 0) and the dN/dS (fix_omega = 0). The branch-site model is specified by setting these two parameters: |
+ | * model = 2 (different dN/dS for branches) | ||
+ | * NSsites value to 2 (which allows 3 categories for sites: purifying, neutral and positive selection). | ||
<code> | <code> | ||
Line 148: | Line 196: | ||
codeml ./TF105351.Eut.3.53876.ctl | codeml ./TF105351.Eut.3.53876.ctl | ||
codeml ./TF105351.Eut.3.53876.fixed.ctl | codeml ./TF105351.Eut.3.53876.fixed.ctl | ||
- | <./code> | + | </code> |
In Windows: | In Windows: | ||
Line 156: | Line 204: | ||
</code> | </code> | ||
+ | Two mlc output files are produced (as it can take time, you can download them directly in the next step).: | ||
+ | * {{:tutorials:tf105351.eut.3.53876.mlc.zip|TF105351.Eut.3.53876.mlc (alternative model)}} | ||
+ | * {{:tutorials:tf105351.eut.3.53876.fixed.mlc.zip|TF105351.Eut.3.53876.fixed.mlc (null model)}} | ||
Line 162: | Line 213: | ||
== Step 1) Assign significance of the detection of positive selection on the selected branch: == | == Step 1) Assign significance of the detection of positive selection on the selected branch: == | ||
- | Two output files are produced: | ||
- | * TF105351.Eut.3.53876.mlc (alternative model) | ||
- | * TF105351.Eut.3.53876.fixed.mlc (null model). | ||
We retieve the likelihood values lnL1 and lnL0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. | We retieve the likelihood values lnL1 and lnL0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. | ||
+ | |||
We retieve the number of parameters np1 and np0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. | We retieve the number of parameters np1 and np0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. | ||
Line 172: | Line 221: | ||
* lnL(ntime: 41 np: 45): **-4710.222252** +0.000000 (lnL0) | * lnL(ntime: 41 np: 45): **-4710.222252** +0.000000 (lnL0) | ||
- | We can construct the LRT (you can use your favourise spreadsheet for that. Or even better: R): | + | We can construct the LRT (you can use your favourite spreadsheet for that. Or even better with R): |
ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.210163 - (-4710.222252)) = 6.024178 | ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.210163 - (-4710.222252)) = 6.024178 | ||
Line 178: | Line 227: | ||
The degree of freedom is 1 (np1 - np0 = 46 - 45). | The degree of freedom is 1 (np1 - np0 = 46 - 45). | ||
- | p-value = 0.014104 (under χ²) => significant. | + | (With OpenOffice: 1-CHISQDIST(6.024178;1)) |
+ | |||
+ | p-value = 0.014104 (under χ²) => __significant!__ | ||
A significant result with the branch-site codon model means that positive selection affected a subset of sites during a specific evolutionary time (also called episodic model of protein evolution). | A significant result with the branch-site codon model means that positive selection affected a subset of sites during a specific evolutionary time (also called episodic model of protein evolution). | ||
Line 195: | Line 246: | ||
* Position 159 has a very high probability (99.3%) of being under positive selection. | * Position 159 has a very high probability (99.3%) of being under positive selection. | ||
+ | You can visualise the multiple in Jalview. | ||
+ | |||
+ | - Open Jalview | ||
+ | |||
+ | - Load TF105351.Eut.3.phy | ||
+ | |||
+ | - Then: Calculate->translate cDNA. (tips: by moving the pointer on the amino acids alignment, you can see the corresponding codon in the nucleotide alignment. | ||
+ | |||
+ | {{:tutorials:tf105351.eut.3.aln.png|}} | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | === Using other models === | ||
+ | |||
+ | Other models can be tested by changing these parameters model and NSsites. | ||
+ | |||
+ | == Example 1: == | ||
+ | |||
+ | Site model M1 (neutral): | ||
+ | * model = 0 (dN/dS doesn't vary on branches) | ||
+ | * NSsites = 1 (which allows 2 categories for sites: purifying and neutral). | ||
+ | |||
+ | Site model M2a (positive selection): | ||
+ | * model = 0 (dN/dS doesn't vary on branches) | ||
+ | * NSsites = 2 (which allows 3 categories for sites: purifying, neutral and positive selection). | ||
+ | |||
+ | Then we can compare M1 and M2a by the likelihood ratio test. | ||
+ | |||
+ | |||
+ | == Example 2: == | ||
+ | |||
+ | |||
+ | Branch model M0: | ||
+ | * model = 0 (dN/dS doesn't vary on branches) | ||
+ | * NSsites = 0 (dN/dS doesn't vary on sites). | ||
+ | |||
+ | Branch model M2 with different dN/dS (positive selection on selected branches): | ||
+ | * model = 2 (different dN/dS for branches) | ||
+ | * NSsites = 2 (dN/dS doesn't vary on sites). | ||
+ | Then we can compare M0 and M2 by the likelihood ratio test. |