During the GATE training event of 2009, Andrew Borthwick from Spock Corp. asked about ways to improve performance when running large JAPE grammars. At the time, the only help I could offer was to suggest the usual things: don’t overuse Kleene operators, reduce the number of phases by grouping more rules in the same phase, and do some profiling to identify problem spots and then try to redesign them. I don’t imagine that was very useful to him, but it did motivate me to have another look at JAPE and see if I could improve on the execution algorithm.
When people use search tools, they are not really looking for words; they are looking for information, which happens to be encoded as words in documents. We in the GATE team are pretty good at letting computers get at [some of] this information. A few years ago we started thinking we could use our information extraction work to provide better search tools. The result of these thoughts (and many years of work) is Mímir, a multi-paradigm index that uses text content, annotations and semantics to let you find what you’re looking for.