This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| tutorials:eccb_t2_codeml [2012/09/07 12:59] – romainstuder | tutorials:eccb_t2_codeml [2012/09/09 09:47] (current) – romainstuder | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ==== CodeML/PAML ==== | ==== CodeML/PAML ==== | ||
| - | Lysozyme | ||
| - | In this post, I will make a short tutorial on one of my favourite programs, CodeML, which is definitely not the easiest to use. | + | === Install |
| - | === Theoretical principles === | + | Download it from here: http:// |
| + | |||
| + | Windows (The bin/ folder already contains windows executables): | ||
| + | < | ||
| + | cd paml4.6/ | ||
| + | </ | ||
| + | |||
| + | |||
| + | Linux: | ||
| + | < | ||
| + | cd paml4.5/ | ||
| + | rm bin/*.exe | ||
| + | cd src | ||
| + | make -f Makefile | ||
| + | ls -l | ||
| + | rm *.o | ||
| + | mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin | ||
| + | chmod +x ../bin/* | ||
| + | </ | ||
| + | |||
| + | MacOSX: | ||
| + | < | ||
| + | cd paml4.6/ | ||
| + | rm bin/*.exe | ||
| + | cd src | ||
| + | cc -O2 -o baseml baseml.c tools.c -lm | ||
| + | cc -O2 -o basemlg basemlg.c tools.c -lm | ||
| + | cc -O2 -o codeml codeml.c tools.c -lm | ||
| + | cc -O2 -o pamp pamp.c tools.c -lm | ||
| + | cc -O2 -o mcmctree mcmctree.c tools.c -lm | ||
| + | cc -O2 -o evolver evolver.c tools.c -lm | ||
| + | cc -O2 -o yn00 yn00.c tools.c -lm | ||
| + | cc -O2 -o chi2 chi2.c -lm | ||
| + | ls -l | ||
| + | rm *.o | ||
| + | mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin | ||
| + | chmod +x ../bin/* | ||
| + | </ | ||
| + | |||
| + | |||
| + | === Theoretical principles | ||
| The selective pressure in protein coding genes can be detected within the framework of comparative genomics. The selective pressure is assumed to be defined by the ratio (ω) dN/dS. dS represents the synonymous rate (changing | The selective pressure in protein coding genes can be detected within the framework of comparative genomics. The selective pressure is assumed to be defined by the ratio (ω) dN/dS. dS represents the synonymous rate (changing | ||
| Line 15: | Line 54: | ||
| It estimates various parameters (Ts/Tv, dN/dS, branch length) on the codon (nucleotide) alignment, based on a predefined topology (phylogenetic tree). | It estimates various parameters (Ts/Tv, dN/dS, branch length) on the codon (nucleotide) alignment, based on a predefined topology (phylogenetic tree). | ||
| - | Different codon models exist in CodeML. The model 0 estimates a unique dN/dS ratio for the whole alignment. Not really interesting, | + | Different |
| + | * The model 0 estimates a unique dN/dS ratio for the whole alignment. Not really interesting, | ||
| + | * The site models estimate different dN/dS among sites (ie in the antigen-binding groove of the MHC). | ||
| + | * The branch-site models estimate different dN/dS among sites and among branches. It can detect episodic evolution in protein sequences. | ||
| First, we have to define the branch where we think that position could have occurred. We will call this branch the " | First, we have to define the branch where we think that position could have occurred. We will call this branch the " | ||
| Line 55: | Line 97: | ||
| Let's go in details. | Let's go in details. | ||
| - | File Preparation: | + | === Identification of positive seleciton in the serine/ |
| - | We need four files to run CodeML: | + | In the evolution vertebrates, |
| + | And if yes, which residues were under positive selection. | ||
| - | | + | We need four files to run CodeML (unzip them all): |
| - | - The phylogenetic tree in newick format, with the branch of interest specified by "# | + | |
| - | - A command file where all parameters to run CodeML under the alternative model are specified: | + | |
| - | - A command file where all parameters to run CodeML under the null model are specified: | + | - The phylogenetic tree in newick format, with the branch of interest specified by "# |
| + | - A command file where all parameters to run CodeML under the alternative model are specified: | ||
| + | - A command file where all parameters to run CodeML under the null model are specified: | ||
| + | |||
| + | The tree looks like: | ||
| + | {{: | ||
| == Execute CodeML == | == Execute CodeML == | ||
| Line 68: | Line 116: | ||
| Run command file (alternative model): | Run command file (alternative model): | ||
| - | We estimate the Ts/Tv ratio (fix_kappa = 0) and the dN/dS (fix_omega = 0). The branch-site model is specified by setting | + | We estimate the Ts/Tv ratio (fix_kappa = 0) and the dN/dS (fix_omega = 0). The branch-site model is specified by setting |
| + | * model = 2 (different dN/dS for branches) | ||
| + | * NSsites value to 2 (which allows 3 categories for sites: purifying, neutral and positive selection). | ||
| < | < | ||
| Line 146: | Line 196: | ||
| codeml ./ | codeml ./ | ||
| codeml ./ | codeml ./ | ||
| - | <./code> | + | </ |
| In Windows: | In Windows: | ||
| Line 154: | Line 204: | ||
| </ | </ | ||
| + | Two mlc output files are produced (as it can take time, you can download them directly in the next step).: | ||
| + | * {{: | ||
| + | * {{: | ||
| === Analyse results === | === Analyse results === | ||
| - | 1) Assign significance of the detection of positive selection on the selected branch: | + | == Step 1) Assign significance of the detection of positive selection on the selected branch: |
| - | Two output files are produced: | ||
| - | TF105351.Eut.3.53876.mlc | + | We retieve the likelihood values lnL1 and lnL0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc |
| - | We retieve the likelihood values lnL1 and lnL0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively: | + | We retieve the number of parameters np1 and np0 from TF105351.Eut.3.53876.mlc and TF105351.Eut.3.53876.fixed.mlc files, respectively. |
| - | * lnL(ntime: 41 np: 46): -4707.210163 | + | * lnL(ntime: 41 np: 46): |
| - | * lnL(ntime: 41 np: 45): -4710.222252 | + | * lnL(ntime: 41 np: 45): |
| - | We can construct the LRT (you can use your favourise | + | We can construct the LRT (you can use your favourite |
| ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.210163 - (-4710.222252)) = 6.024178 | ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.210163 - (-4710.222252)) = 6.024178 | ||
| Line 175: | Line 227: | ||
| The degree of freedom is 1 (np1 - np0 = 46 - 45). | The degree of freedom is 1 (np1 - np0 = 46 - 45). | ||
| - | p-value = 0.014104 (under χ²) => significant. | + | (With OpenOffice: 1-CHISQDIST(6.024178; |
| + | |||
| + | p-value = 0.014104 (under χ²) => __significant!__ | ||
| A significant result with the branch-site codon model means that positive selection affected a subset of sites during a specific evolutionary time (also called episodic model of protein evolution). | A significant result with the branch-site codon model means that positive selection affected a subset of sites during a specific evolutionary time (also called episodic model of protein evolution). | ||
| - | 2) If significant, | + | == Step 2) If significant, |
| In the TF105351.Eut.3.53876.mlc, | In the TF105351.Eut.3.53876.mlc, | ||
| Line 192: | Line 246: | ||
| * Position 159 has a very high probability (99.3%) of being under positive selection. | * Position 159 has a very high probability (99.3%) of being under positive selection. | ||
| + | You can visualise the multiple in Jalview. | ||
| + | |||
| + | - Open Jalview | ||
| + | |||
| + | - Load TF105351.Eut.3.phy | ||
| + | |||
| + | - Then: Calculate-> | ||
| + | |||
| + | {{: | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | === Using other models === | ||
| + | |||
| + | Other models can be tested by changing these parameters model and NSsites. | ||
| + | |||
| + | == Example 1: == | ||
| + | |||
| + | Site model M1 (neutral): | ||
| + | * model = 0 (dN/dS doesn' | ||
| + | * NSsites = 1 (which allows 2 categories for sites: purifying and neutral). | ||
| + | |||
| + | Site model M2a (positive selection): | ||
| + | * model = 0 (dN/dS doesn' | ||
| + | * NSsites = 2 (which allows 3 categories for sites: purifying, neutral and positive selection). | ||
| + | |||
| + | Then we can compare M1 and M2a by the likelihood ratio test. | ||
| + | |||
| + | |||
| + | == Example 2: == | ||
| + | |||
| + | |||
| + | Branch model M0: | ||
| + | * model = 0 (dN/dS doesn' | ||
| + | * NSsites = 0 (dN/dS doesn' | ||
| + | |||
| + | Branch model M2 with different dN/dS (positive selection on selected branches): | ||
| + | * model = 2 (different dN/dS for branches) | ||
| + | * NSsites = 2 (dN/dS doesn' | ||
| + | Then we can compare M0 and M2 by the likelihood ratio test. | ||