If you’ve used Lucene or Solr before, you might be familiar with Luke, a Lucene tool for viewing and modifying indexes. Luke was originally written by Andrzej Bialecki as a side project, and is an indispensable debugging tool for digging into the guts of an index (for example, to see exactly which tokens were stored on a particular field).
Unfortunately, it hasn’t been particularly well maintained. The official project at Google Code (link above) hasn’t been updated since Lucene 4.0.0-ALPHA (ca. 2012). The good news is that there is some great community support for Luke at the moment:
- Ľuboš Koščo maintains an up-to-date fork at https://github.com/tarzanek/luke
- Dmitry Kan maintains an up-to-date, Mavenized fork at https://github.com/DmitryKey/luke/
Both should work out of the box with standard Lucene indexes, but you need to do a little extra work to read an ElasticSearch index.
If you try to open an index created by ElasticSearch with a stock copy of Luke, you’ll see the following error:
A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]
ElasticSearch uses a custom postings format (the postings format defines how the inverted index is represented in memory / on disk), and Luke doesn’t know about it. To tell Luke about the ES postings format, add the SPI class by following the steps below.
1. Clone Dmitry’s Mavenized repo:
$ git clone https://github.com/DmitryKey/luke/
2. Add a dependency on your required version of ElasticSearch to the Luke project’s pom file:
<!-- ElasticSearch -->
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>1.1.1</version>
</dependency>
3. Compile the Luke jar file (creates
$ mvn package
4. Unpack Luke’s list of known postings formats to a temporary file:
$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive: target/luke-with-deps.jar
inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
5. Add the ElasticSearch postings formats to the temp file:
$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
6. Repack the modified file back into the jar:
$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat
7. Run Luke
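One way to launch it is sketched below. This assumes the repacked jar is the target/luke-with-deps.jar used in the steps above and that its manifest declares a main class, so that java -jar can start it; neither assumption is confirmed here. The run_luke function and the LUKE_JAR variable are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: launch Luke from the repacked jar.
# LUKE_JAR and run_luke are hypothetical names, not part of Luke itself.
# Assumption: the jar is runnable with "java -jar" (manifest has a Main-Class).
LUKE_JAR="${LUKE_JAR:-target/luke-with-deps.jar}"

run_luke() {
    # Fail early with a readable message if the jar has not been built yet.
    if [ ! -f "$LUKE_JAR" ]; then
        echo "Luke jar not found at $LUKE_JAR; run 'mvn package' first" >&2
        return 1
    fi
    java -jar "$LUKE_JAR"
}
```

Calling run_luke from the root of the cloned repository should then start the Luke GUI, provided a suitable Java runtime is on the PATH.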
You can now open indexes created by ElasticSearch, search for, view, and edit documents, and all the other operations Luke allows.
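Step 5 appends with a plain echo, so re-running it adds the same lines again, and if the extracted file happens to lack a trailing newline the first new entry would be glued onto the last existing one. A minimal sketch of a more defensive append is shown below. The add_spi_entry helper is a hypothetical name for illustration, not something Luke or Lucene provides, and it relies only on standard grep and tail.

```shell
#!/bin/sh
# Sketch: append a class name to an SPI services file only if it is missing.
# add_spi_entry is a hypothetical helper, not part of Luke or Lucene.
add_spi_entry() {
    file=$1
    class=$2
    # Create the file if it is absent so grep and tail below do not fail.
    [ -f "$file" ] || : > "$file"
    # -x matches the whole line, -F treats the class name as a literal string.
    if grep -qxF "$class" "$file"; then
        return 0
    fi
    # If the last byte is not a newline, add one so the entry gets its own line.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    echo "$class" >> "$file"
}
```

It would be called once per class name from step 5, with ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat as the first argument, before repacking the file in step 6.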
A good Luke tutorial can be found on LingPipe’s blog: