
Using Luke with ElasticSearch

If you’ve used Lucene or Solr before, you might be familiar with Luke, a Lucene tool for viewing and modifying indexes.  Luke was originally written by Andrzej Bialecki as a side project, and is an indispensable debugging tool for digging into the guts of an index (for example, to see exactly which tokens were stored on a particular field).

[Figure: Luke’s Overview tab]

Unfortunately, it hasn’t been particularly well maintained. The official project at Google Code hasn’t been updated since Lucene 4.0.0-ALPHA (ca. 2012). The good news is that there is some great community support for Luke at the moment, including Dmitry’s Mavenized repo used in the steps below.

The community versions should work out of the box with standard Lucene indexes, but you need to do a little extra work to read an ElasticSearch index.

 

If you try to open an index created by ElasticSearch with a stock copy of Luke, you’ll see the following error:

A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]

ElasticSearch uses a custom postings format (the postings format defines how the inverted index is represented in memory / on disk), and Luke doesn’t know about it. To tell Luke about the ES postings format, add the SPI class by following the steps below.
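
Lucene discovers postings formats through Java’s service-provider (SPI) mechanism: a jar carries a plain-text file at META-INF/services/org.apache.lucene.codecs.PostingsFormat that lists one implementing class per line. As a rough sketch, the helper below (the name `missing_providers` is made up for illustration) prints whichever of the given class names such a file does not list. It assumes the file has already been extracted from the jar, as in step 4 below.

```shell
# Hypothetical helper: print each class name that is NOT listed in the
# given SPI services file (one fully qualified class name per line).
missing_providers() {
    services_file="$1"
    shift
    for class in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        grep -qxF "$class" "$services_file" 2>/dev/null || printf '%s\n' "$class"
    done
}
```

Called with the path to the extracted file followed by the three ElasticSearch class names from step 5, it should print nothing once every entry is present.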

 

1. Clone Dmitry’s Mavenized repo:

$ git clone https://github.com/DmitryKey/luke/

 

2. Add a dependency on your required version of ElasticSearch to the Luke project’s pom file:

<!-- ElasticSearch -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.1.1</version>
</dependency>

 

3. Compile the Luke jar file (creates target/luke-with-deps.jar):

$ mvn package

 

4. Extract Luke’s list of known postings formats into a temporary directory:

$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive:  target/luke-with-deps.jar
  inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

 

5. Add the ElasticSearch postings formats to the extracted file:

$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

 

6. Repack the modified file back into the jar:

$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat

 

7. Run Luke:

$ ./luke.sh

 

You can now open indexes created by ElasticSearch, search for, view, and edit documents, and perform all the other operations Luke allows.
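
If you rebuild the jar often, note that appending with echo (step 5) adds a duplicate line every time it is re-run. One way to make that step repeatable, sketched here with a made-up helper name `add_provider`, is to append a class only when the file does not already list it. This is an illustration under that assumption, not part of the original procedure.

```shell
# Hypothetical helper: append a class name to an SPI services file
# unless it is already listed, so the step can be re-run safely.
add_provider() {
    services_file="$1"
    class="$2"
    mkdir -p "$(dirname "$services_file")"
    touch "$services_file"
    # If the file's last byte is not a newline, add one first so the
    # new entry is not glued onto the end of the previous line.
    if [ -n "$(tail -c 1 "$services_file")" ]; then
        printf '\n' >> "$services_file"
    fi
    # Append only when no existing line matches the class name exactly.
    grep -qxF "$class" "$services_file" || printf '%s\n' "$class" >> "$services_file"
}
```

For example, `add_provider ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat` could stand in for the matching echo line in step 5.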

A good Luke tutorial can be found on LingPipe’s blog:
http://lingpipe-blog.com/2012/07/24/using-luke-the-lucene-index-browser-to-develop-search-queries/

Happy debugging!