Here you can find out about some of the projects I have been involved with, most of which are open source.


NLP Text Mining

Between 1999 and 2014 I was one of the core members of the GATE project, which covers all kinds of software used for language engineering tasks such as text mining, semantic annotation, and opinion mining. Over time GATE has grown to include developer tools (GATE Developer), a comprehensive software library (GATE Embedded), some server-side tools (Mímir and Teamware), and a cloud platform (


Text Mining Big Data

AnnoMarket is a research project, active between 2012 and 2014, that builds on the work in GATECloud (see below). It aims to create a Platform-as-a-Service offering for all types of text mining tasks, using cloud-based infrastructure (IaaS) to provide scalability and fault tolerance. In addition, we aim to accelerate growth by creating an open marketplace where a community of developers can make their work available on a pay-per-use basis, with no upfront costs.

Text Mining Big Data

GATECloud is a cloud-based platform for running GATE-based text processing tasks. It also provides SaaS deployments of specialised server software such as Teamware and Mímir.

The motivation for GATECloud was the need to simplify, and reduce the cost of, scaling up text processing jobs. Our work in other projects has made us aware of the difficulties encountered when working with big data. In GATECloud our aim was to generalise the particular solutions that we developed in these other projects, and make them available at the click of a mouse.

You can find out more about GATECloud in my paper.

GATE Mímir

Semantic Search Big Data

GATE Mímir is a hybrid indexing framework that supports search over a collection of semantically-annotated documents. The software distribution includes:

  • mimir-core: a Java library dealing with the creation of and access to indexes (represented as on-disk directories).
  • mimir-web: a Grails plugin that provides a user interface for managing and searching Mímir indexes. It also provides a REST-style endpoint for remote integration, and support for remote and federated indexes.
  • mimir-cloud: a Grails application that uses the mimir-web plugin and can be deployed as a stand-alone WAR in any application server.
  • a set of plugins providing different module implementations.

The Web UI is built with GWT and Grails. We use MG4J for the implementation of direct and inverted indexes, and any standard SPARQL endpoint for access to semantics (ontologies). The representation of annotations in indexes relies on a purpose-built library. This library provides a public API with pluggable implementations, and we include an implementation that uses the H2 in-process Java database engine. Ontotext provide an alternative implementation that uses the OWLIM RDF store to represent annotations.
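
To make the hybrid idea more concrete, here is a minimal sketch in Python. Mímir itself is a Java library, and nothing below is taken from its API: the names `AnnotationStore`, `InMemoryAnnotationStore` and `HybridIndex` are hypothetical and chosen only for illustration. The sketch pairs a toy inverted index over tokens with a pluggable annotation store, roughly mirroring the split between the text indexes and the annotation back-end described above. As a simplifying assumption, it matches at document level only and ignores positions within the text.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict


class AnnotationStore(ABC):
    """Hypothetical pluggable interface for storing and querying annotations."""

    @abstractmethod
    def add(self, doc_id: int, ann_type: str, features: dict[str, str]) -> None:
        """Record one annotation of the given type on a document."""

    @abstractmethod
    def find_docs(self, ann_type: str, constraints: dict[str, str]) -> set[int]:
        """Return ids of documents with a matching annotation."""


class InMemoryAnnotationStore(AnnotationStore):
    """Toy back-end; a real one might use a database or an RDF store."""

    def __init__(self) -> None:
        self._rows: list[tuple[int, str, dict[str, str]]] = []

    def add(self, doc_id, ann_type, features):
        self._rows.append((doc_id, ann_type, dict(features)))

    def find_docs(self, ann_type, constraints):
        return {
            doc_id
            for doc_id, row_type, feats in self._rows
            if row_type == ann_type
            and all(feats.get(k) == v for k, v in constraints.items())
        }


class HybridIndex:
    """Combines a token index with whichever annotation store is plugged in."""

    def __init__(self, store: AnnotationStore) -> None:
        self._store = store
        self._inverted: dict[str, set[int]] = defaultdict(set)  # token -> doc ids
        self._direct: dict[int, list[str]] = {}                 # doc id -> tokens

    def index_document(self, doc_id, text, annotations=()):
        tokens = text.lower().split()
        self._direct[doc_id] = tokens
        for token in tokens:
            self._inverted[token].add(doc_id)
        for ann_type, features in annotations:
            self._store.add(doc_id, ann_type, features)

    def search(self, token, ann_type, constraints):
        """Documents that contain the token AND carry a matching annotation."""
        text_hits = self._inverted.get(token.lower(), set())
        return sorted(text_hits & self._store.find_docs(ann_type, constraints))


index = HybridIndex(InMemoryAnnotationStore())
index.index_document(1, "Sheffield hosts the GATE team", [("Location", {"kind": "city"})])
index.index_document(2, "The gate was left open")
index.index_document(3, "GATE runs in London", [("Location", {"kind": "city"})])

print(index.search("gate", "Location", {"kind": "city"}))  # prints [1, 3]
```

Document 2 contains the token but has no `Location` annotation, so it is excluded. Because `HybridIndex` depends only on the abstract `AnnotationStore`, a different back-end can be swapped in without touching the search logic, which is the property the pluggable design is meant to provide.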