====== Dr. Andrew B. Clegg ======

{{ :cathteam:sausage.jpg|Andrew with a half-metre sausage, found (and eaten) on holiday in Germany recently}}

===== Senior Research Associate, CATH Development =====

A member of the [[cathteam:index|Orengo group]] since June 2008, I am the technical lead on the [[http://funcnet.eu/|FuncNet]] platform, which brings together an ensemble of protein function analysis tools from various groups around Europe. This work is supported by the EU-funded [[http://www.embracegrid.info/|EMBRACE]] and [[http://enfin.org.|ENFIN]] research networks.

I'm involved in various other initiatives to extend the capabilities of [[http://cathdb.info/|CATH]] and [[http://gene3d.biochem.ucl.ac.uk/Gene3D/|Gene3D]] and enable them to interoperate better with bioinformatics resources at other organizations. I also write and maintain [[http://biotext.org.uk|biotext.org.uk]].

Contact: {{:cathteam:ucl_email_3.png|}}

http://twitter.com/andrew_clegg

===== Academic Background =====

I am a data scientist and software developer with experience in the fields of molecular biology, clinical research, public health and the pharmaceutical industry.

My MSc and PhD projects at [[http://www.cryst.bbk.ac.uk/|Birkbeck]] were on text mining techniques for bioinformatics, and this is a field I am still involved in. My {{:cathteam:clegg_thesis.pdf|thesis}}, supervised by [[http://people.cryst.bbk.ac.uk/~ubcg60a/group/index.html|Dr. Adrian Shepherd]], was on parsing sentences into phrase structure trees and dependency graphs, and extracting facts about gene regulation from these syntactic structures.

As an undergrad I studied [[http://www.ucl.ac.uk/sts/|History and Philosophy of Science at UCL]] and I am still interested in the history and public perception of science, technology and medicine.

===== Current Research Interests =====

As of Summer 2010, I'm mostly working on:

  * Visualization of biological networks
  * Information retrieval and text mining
  * Integrative statistical methods for protein function prediction
  * Web service development (SOAP, REST, JSON-RPC)
  * Rich internet applications (AJAX, GWT)

Outside of the lab, I'm the sole developer of the [[http://graphspider.sf.net/|GraphSpider/MPL]] natural-language processing toolkit, and also involved with [[http://smeshup.com/|Smesh]], a platform for analysis of social media data.

===== Publications =====

Alison Cuff, Ian Sillitoe, Tony Lewis, Andrew Clegg, Robert Rentzsch, Nicholas Furnham, Marialuisa Pellegrini-Calace, David T. Jones, Janet Thornton and Christine A. Orengo, "[[http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq1001.full|Extending CATH: Increasing Coverage of the Protein Structure Universe and Linking Structure with Function]]", in //Nucleic Acids Research// Database Issue **39** (2010).

Juan A. G. Ranea, Ian Morilla, Jon G. Lees, Adam J. Reid, Corin Yeats, Andrew B. Clegg, Francisca Sánchez Jiménez and Christine Orengo, "[[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000945|Finding the 'Dark Matter' in Human and Yeast Protein Network Prediction and Modelling]]", in //PLoS Computational Biology// **6**:9 (2010).

Contributor, //[[http://www.nice.org.uk/CG102|The management of bacterial meningitis and meningococcal septicaemia in children and young people younger than 16 years in primary and secondary care]]// (National Institute for Health and Clinical Excellence, 2010).

Adam J. Reid, Juan A. G. Ranea, Andrew B. Clegg and Christine A. Orengo, "[[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010908|CODA: Accurate Detection of Functional Associations between Proteins in Eukaryotic Genomes Using Domain Fusion]]", in //PLoS ONE// **5**:6 (2010).

Steve Pettifer, Jon Ison, Matus Kalas, Dave Thorne, Philip McDermott, Inge Jonassen, Ali Liaquat, Jose M. Fernandez, Jose M. Rodriguez, INB-Partners, David G. Pisano, Christophe Blanchet, Mahmut Uludag, Peter Rice, Edita Bartaseviciute, Kristoffer Rapacki, Maarten Hekkelman, Olivier Sand, Heinz Stockinger, Andrew B. Clegg, Erik Bongcam-Rudloff, Jean Salzemann, Vincent Breton, Teresa K. Attwood, Graham Cameron and Gert Vriend, "[[http://nar.oxfordjournals.org/cgi/content/full/gkq297?ijkey=0zo5SYmiVxwfNnc&keytype=ref|The EMBRACE Web Service Collection]]", in //Nucleic Acids Research// Web Servers Issue (2010).

Corin Yeats, Jon Lees, Oliver Redfern, Andrew Clegg and Christine Orengo, "[[http://nar.oxfordjournals.org/cgi/content/full/gkp987|Gene3D: Merging Structure and Function For a Thousand Genomes]]", in //Nucleic Acids Research// Database Issue **38**:D296-D300 (2009).

Pascal Kahlem, Andrew Clegg, Florian Reisinger, Ioannis Xenarios, Henning Hermjakob, Christine Orengo and Ewan Birney, "[[http://dx.doi.org/10.1016/j.crvi.2009.09.003|ENFIN -- A European network for integrative systems biology]]", in //Comptes Rendus Biologies// **332**:11 (2009).

Contributor, //[[http://guidance.nice.org.uk/PH21|Reducing differences in the uptake of immunisations]]// (National Institute for Health and Clinical Excellence, 2009).

Jose M. G. Izarzugaza, Anja Baresic, Lisa E. M. McMillan, Corin Yeats, Andrew B. Clegg, Christine A. Orengo, Andrew C. R. Martin and Alfonso Valencia, "[[http://www.biomedcentral.com/1471-2105/10/S8/S5|An integrated approach to the interpretation of Single Amino Acid Polymorphisms within the framework of CATH and Gene3D]]", in //BMC Bioinformatics// **10** (Suppl 8):S5 (2009).

Renata Kabiljo, Andrew B. Clegg and Adrian J. Shepherd, "[[http://www.biomedcentral.com/1471-2105/10/233|A realistic assessment of methods for extracting gene/protein interactions from free text]]", in //BMC Bioinformatics// **10**:233 (2009).

Andrew B. Clegg and Adrian J. Shepherd, "Syntactic pattern matching with GraphSpider and MPL", in //[[http://mars.cs.utu.fi/smbm2008/?q=proceedings|Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM'08)]]// (Turku, Finland: 2008).

Andrew B. Clegg and Debbie Pledge, "Streamlining the clinical guideline production process with fuzzy citation matching", in //[[https://oa.doria.fi/handle/10024/41995|Proceedings of the First Conference on Text and Data Mining of Clinical Documents (Louhi'08)]]// (Turku, Finland: 2008).

Andrew B. Clegg and Adrian J. Shepherd, "[[http://www.ncbi.nlm.nih.gov/pubmed/18712320|Text mining]]", in Jon Keith (ed.), //[[http://www.springerlink.com/content/g864wv1j02744r17/|Bioinformatics Volume II: Structure, Function and Applications]]// (Humana Press, New Jersey: 2008).

Christian Guy, Emma Goddard, Emily Milner, Lisa Murch, and Andrew B. Clegg, "Looking into the core of the sun", in Hasok Chang and Catherine Jackson (eds.), //[[http://www.bshs.org.uk/bshs/publications/monograph_series/An_Element_of_Controversy|An Element of Controversy: The Life of Chlorine in Science, Medicine, Technology and War]]// (British Society for the History of Science: 2007).

Andrew B. Clegg and Adrian J. Shepherd, "[[http://www.biomedcentral.com/1471-2105/8/24/|Benchmarking natural-language parsers for biological applications using dependency graphs]]", in //BMC Bioinformatics// **8**:24 (2007).

Andrew B. Clegg and Adrian J. Shepherd, "[[http://www1.cs.columbia.edu/nlp/acl05soft/clegg.pdf|Evaluating and integrating treebank parsers on a biomedical corpus]]", in //Proceedings of the Association for Computational Linguistics Workshop on Software// (Ann Arbor, Michigan: 2005).

===== Other Interests =====

Along with [[http://www.cassj.co.uk/blog/|Cass Johnston]] and [[http://www3.imperial.ac.uk/theoreticalsystemsbiology/people/nathanharmston|Nathan Harmston]], I run an informal meet-up group for bioinformatics people and anyone else interested in the technical side of what we do: [[http://biogeeks.wordpress.com/|London BioGeeks]]. We have monthly-ish technical meetings (with talks) and social nights (with beer). Come along sometime.

I'm also involved with the London [[http://www.meetup.com/Londonjavacommunity/|Java Community]] and [[http://groups.google.com/group/london-clojurians/?pli=1|Clojure Dojo]].

I'm an occasional peer reviewer for [[http://bioinformatics.oupjournals.org/|Bioinformatics]] and the [[http://www.elsevier.com/locate/yjbin|Journal of Biomedical Informatics]], and I was on the programme committees for [[http://www.it.utu.fi/louhi/|Louhi 2008]], the [[http://compbio.uchsc.edu/BioNLP2009/|BioNLP 2009]] [[http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/|shared task]], and the [[http://www.acl2010.org/|ACL 2010]] track on "NLP for biology, medicine, law, etc."

On a slightly less geeky note (or perhaps not) I am interested in electronica and avant-garde music, linguistics, London and cycling. Not usually at the same time, though. I also DJ occasionally at events like [[http://dronesclub.org.uk/|the Drones Club]] and write music reviews for websites such as [[http://connexionbizarre.net/|Connexion Bizarre]].

===== Links =====

My main site is at [[http://biotext.org.uk]].

My [[http://chomsky-ext.cryst.bbk.ac.uk/andrew|Birkbeck homepage]] still exists, for historical value only.

===== Other CATH Team Members =====

~~DIR?cols=page;description&hdrs=Person; Description~~