====== FuncNet ======

FuncNet is a distributed protein function comparison pipeline, funded by the European Union's [[http://www.embracegrid.info/|EMBRACE Network of Excellence]], and developed in partnership with the [[http://www.enfin.org/|ENFIN]] project.

This page is no longer maintained. See funcnet.eu instead.

===== Aims =====

The objective of FuncNet is to provide an open platform for the computational prediction and analysis of protein function. It is designed to answer questions like:

//Given one set of proteins which are known to share a particular biological function...//

//... which of these other proteins also share that function?//

A good example of this is the prediction of proteins involved in the formation and activity of the mitotic spindle. Since a set of known spindle proteins already exists ([[http://www.mcponline.org/cgi/content/full/4/1/35|Sauer et al. 2005]]), FuncNet can be used to predict whether uncharacterized or partially characterized proteins also belong in this set, by aggregating pairwise functional similarity predictions between query and reference proteins.

===== Implementation =====

FuncNet is an open architecture in which multiple prediction algorithms can be queried in parallel in order to provide higher-quality results. Each predictor is made available via a [[http://en.wikipedia.org/wiki/SOAP|SOAP]] web service using a standardized [[http://en.wikipedia.org/wiki/Web_Services_Description_Language|WSDL]] interface. This means that all FuncNet prediction services are functionally interchangeable -- they can all be invoked via the same message format (described below), and you only need to change the endpoint URL and the service and port names. The current template WSDL for FuncNet services is [[projects:funcnet:wsdl|here]].

However, a better way to submit queries is via a front-end service which is responsible for forwarding the request to each of the predictors in parallel.
On receipt of the results, it uses [[http://en.wikipedia.org/wiki/Fisher_method|Fisher's unweighted method]] to integrate the various predictors' responses into a single prediction for each query protein:

{{:projects:funcnet:funcnet_diagram.png?600|}}

Of course, users can submit queries directly to the individual predictors, although the strength of FuncNet comes from its ability to combine the predictions of multiple algorithms which use distinct methods and sources of evidence.

===== Predictors =====

There are currently five prediction algorithms online, using different sources of evidence:

  * CODA (hosted at UCL): evolutionary relatedness based on domains found together in other species
  * engineDB (hosted at CNR-ITB): detection of functionally analogous proteins via GO annotations
  * GECO (hosted at UCL): correlated patterns of gene expression from microarray experiments
  * hiPPI (hosted at UCL): homology-based inheritance of protein-protein interactions from public databases
  * JACOP (hosted at SIB): unsupervised clustering and classification based on detection of homologous sub-sequences

===== Usage =====

FuncNet queries can be submitted from any SOAP client which supports the 'document/literal wrapped' style. Almost all modern SOAP toolkits support this model. The interface is intentionally very simple so databinding shouldn't be a problem, regardless of what programming language you use, and of course you can 'roll your own' XML if you prefer.

To get you started, we've provided some example libraries and scripts (see Links below) and will be adding more samples in different languages over time. Feel free to send us your own!

If you want to try out FuncNet without doing any coding, download [[http://www.soapui.org/|soapUI]]. This is a very handy Java tool for testing web services. Choose 'New WSDL Project' from the File menu, paste in the URL of a FuncNet WSDL (see below), and it'll generate the appropriate request templates for you.
Then you can insert some [[http://www.uniprot.org/|UniProt]] primary accessions in the ''proteins1'' and ''proteins2'' fields (see below), submit the query by pressing the green 'play' button, and wait for some results.

We are working on integrating FuncNet into the [[http://www.enfin.org/dokuwiki/doku.php?id=wiki:wp1|EnCORE]] web services framework too -- check back here for further news.

**NB FuncNet only understands UniProt primary accessions, and is currently limited to human proteins.**

==== Predictor request format ====

The standard format for a request to a FuncNet prediction service (without the SOAP wrappings) looks like this:

  * ''proteins1'': Q8NFN7
  * ''proteins2'': Q8NF37

By convention, ''proteins1'' is the list of query proteins (unknown function) and ''proteins2'' is the list of reference proteins (known function).

**NB Requests to JACOP must provide at least three reference proteins.**

==== Predictor response format ====

^ ''p1'' ^ ''p2'' ^ ''rs'' ^ ''pv'' ^
| Q8NFN7 | Q8NF37 | 65.20529 | .140766 |

(The numbers in this example are made up and don't come from any real prediction service.)

  * ''p1'' = an accession from the ''proteins1'' list
  * ''p2'' = an accession from the ''proteins2'' list
  * ''rs'' = raw score from the prediction algorithm (not comparable between algorithms)
  * ''pv'' = p-value for the prediction

The p-value is formally defined as the probability that a random pair of proteins from the human genome would score equal to or higher than this pair using the same prediction algorithm. You can treat this as a test of significance at whatever cutoff you see fit (0.05 or below is usually a safe bet).

The maximum number of scores that can be returned by a predictor is |''proteins1''| * |''proteins2''| and the minimum is zero. A predictor can return fewer than the maximum because of data sparsity and other factors; some of the predictors don't know anything at all about the relationship between a given pair of proteins and therefore won't even give them a low score. For example, GECO uses correlated patterns of expression in microarray experiments, and some genes just aren't commonly used on arrays, meaning it can't draw any conclusions about their products.

**NB For performance purposes, many of the prediction services don't check that the accession codes supplied are genuine and from humans.** This is the responsibility of the user. Unknown accessions will just be quietly ignored.
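
If you query several predictors yourself, the ''pv'' values they return for the same pair can be combined with Fisher's unweighted method, the technique the front-end service is described as using. The following is a minimal Python sketch of that calculation, not the front-end's actual code. The function names, the tuple layout and the choice to group by (''p1'', ''p2'') pair are assumptions made for illustration, and the p-values are made up. It also assumes the predictors' p-values are independent, which Fisher's method requires.

```python
import math
from collections import defaultdict


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's unweighted method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For an even
    number of degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, so only the standard library is used.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        # A p-value of exactly zero has no logarithm; handling it is left
        # to the caller in this sketch.
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


def combine_by_pair(predictions):
    """Group (predictor, p1, p2, pv) tuples by pair and combine their p-values.

    Grouping by pair is an assumption of this sketch, not a documented
    behaviour of the FuncNet front-end.
    """
    grouped = defaultdict(list)
    for _predictor, p1, p2, pv in predictions:
        grouped[(p1, p2)].append(pv)
    return {pair: fisher_combine(pvs) for pair, pvs in grouped.items()}


# Made-up p-values, not output from any real FuncNet service.
example = [
    ("GECO", "Q8NFN7", "Q8NF37", 0.05),
    ("hiPPI", "Q8NFN7", "Q8NF37", 0.05),
]
combined = combine_by_pair(example)
print(combined)
```

With a single p-value the function returns that value unchanged, so pairs scored by only one predictor pass through as they are. The combined values can then be compared against whatever significance cutoff you have chosen.
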
===== Partners =====

The current release of FuncNet is a collaboration between the [[http://gene3d.biochem.ucl.ac.uk/Gene3D/|Gene3D-BioMiner]] and [[cathteam:index|CATH]] teams at University College London, [[http://hst.home.cern.ch/hst/|Heinz Stockinger]] and [[http://www.embracegrid.info/page.php?page=person&pid=21|Marco Pagni]] at the Swiss Institute of Bioinformatics, [[http://www.embracegrid.info/page.php?page=person&pid=34|Andreas Gisel]] at ITB-CNR in Bari, and Juan Ranea and Ian Morilla at the University of Malaga. It is co-ordinated by [[cathteam:clegg|Andrew Clegg]] under the supervision of [[http://www.smb.ucl.ac.uk/additional-staff-pages/professor-christine-orengo.html|Christine Orengo]].

There are two other EMBRACE groups involved in the project whose contributions are in progress: [[http://www.cnio.es/ing/grupos/plantillas/presentacion.asp?grupo=50004294|the Valencia group]] at CNIO in Madrid, and [[http://www.cbs.dtu.dk/index.shtml|the Brunak group]] at DTU-CBS in Lyngby. In addition, we have recently been joined by [[http://www.compbio.dundee.ac.uk/|the Barton group]] in Dundee, who are members of the ENFIN project.

Our EMBRACE liaison is [[http://www.anst.uu.se/erikbr/Welcome.html|Erik Bongcam-Rudloff]] at Uppsala University. We are grateful for the technical assistance of ENFIN's [[http://www.ebi.ac.uk/Information/Staff/person_maintx.php?s_person_id=777|Florian Reisinger]].

===== Current Status =====

The five prediction algorithms listed above are up and running individually (see Links). Feel free to submit queries to them.

The front-end service (including the statistical integration of results) is still under testing, and not yet available to the public.

**NB In the CODA, GECO and hiPPI services, the Raw Score (''rs'') field for each prediction currently returns zero for every result.** This is because we haven't yet imported these values into our database.
However, the p-value is actually a more informative measure, since raw scores are not comparable between predictors. The front-end service only considers p-values when integrating scores from multiple predictors.

===== Links =====

==== WSDL files for web services ====

  * [[http://cathdb.info:8080/BioMiner-war/services/CodaCathService?wsdl|CODA using CATH domains]] -- ''CodaCathService''
  * [[http://cathdb.info:8080/BioMiner-war/services/CodaPfamService?wsdl|CODA using Pfam domains]] -- ''CodaPfamService''
  * [[http://spank.ba.itb.cnr.it/docs/FuncNet.wsdl|engineDB]] -- ''engineDBService''
  * [[http://cathdb.info:8080/BioMiner-war/services/GecoService?wsdl|GECO]] -- ''GecoService''
  * [[http://cathdb.info:8080/BioMiner-war/services/HippiService?wsdl|hiPPI]] -- ''HippiService''
  * [[http://myhits.isb-sib.ch/doc/FuncNet.wsdl|JACOP]] -- ''JacopService''

Because of the way the [[http://cxf.apache.org/|CXF]] web service toolkit generates production WSDLs from source, the CODA, GECO and hiPPI WSDLs are actually the same (each one contains the details for all of them). The standard template from which they all derive is [[projects:funcnet:wsdl|here]].

The {{:projects:funcnet:frontendservice.wsdl|draft WSDL}} for the front-end service is also available, although the service doesn't work yet.

==== Client tools ====

  * [[http://search.cpan.org/dist/WebService-Cath-FuncNet/|Perl module]] on CPAN

To install this library, type

  perl -MCPAN -e 'install WebService::Cath::FuncNet'

If you need help with this process, particularly if you don't have root on your machine and some of the new dependencies default to needing root access, have a look at [[http://www.base64.co.uk/installing-perl-modules/|this tutorial]].

  * [[projects:funcnet:perlsample|Simple Perl example]]

This script shows very simply how to access one of the predictors. The full CPAN library is much more capable and well-documented, but this script shows how little effort it actually is.
You will need to install the ''XML::Compile'' module yourself if you download this (the CPAN library handles dependencies for you). Also you'll need to download the WSDL for the service you want to query (the CPAN library also does this for you).

==== Publications and documentation ====

  * [[http://bioinformatics.bmc.uu.se/WP4/content/view/124/42/|FuncNet: Integration of bioinformatics methods for high-throughput protein function prediction]] (initial proposal)
  * [[http://www.biomedcentral.com/1471-2105/6/216|JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture]] (Sperisen & Pagni, 2005)
  * [[http://www.biomedcentral.com/1471-2105/8/329/|Gene analogue finder: a GRID solution for finding functionally analogous gene products]] (Tulipano //et al.//, 2007)
  * [[http://bioinformatics.bmc.uu.se/WP4/images/stories/Workshop_july_2008_Uppsala/uppsala-orengo.pdf|Prediction of new mitotic spindle proteins]] (Orengo, 2008, PDF presentation)
  * {{:projects:funcnet:hippi_talk.ppt|Gene3D, Orthology and Homology-Based Inheritance of Protein-Protein Interactions}} (Yeats, 2008, PPT presentation)

==== Homepage ====

  * Quick link to this page: [[http://funcnet.eu]]

==== Sponsors and related sites ====

[[http://www.embracegrid.info/|{{:projects:embrace.jpg|}}]] [[http://www.enfin.org/|{{:projects:enfin_logo2.jpg|}}]] [[http://embraceregistry.net/|{{:projects:embrace_registry.jpg?300|}}]]

==== Contact ====

Email Andrew Clegg with any enquiries, feedback or problem reports: __//spamproof//@funcnet.eu__ but replace 'spamproof' with 'info'.