Using Luke with ElasticSearch

If you’ve used Lucene or Solr before, you might be familiar with Luke, a Lucene tool for viewing and modifying indexes. Luke was originally written by Andrzej Bialecki as a side project, and is an indispensable debugging tool for digging into the guts of an index (for example, to see exactly which tokens were indexed for a particular field).

[Image: Luke’s Overview tab]

Unfortunately, it hasn’t been particularly well maintained. The official project on Google Code hasn’t been updated since Lucene 4.0.0-ALPHA (ca. 2012). The good news is that there is some great community support for Luke at the moment:

Both should work out of the box with standard Lucene indexes, but you need to do a little extra work to read an ElasticSearch index.

If you try to open an index created by ElasticSearch with a stock copy of Luke, you’ll see the following error:

A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]

ElasticSearch uses its own postings formats (a postings format defines how the inverted index is represented in memory and on disk), and Luke doesn’t know about them. To tell Luke about the ES postings formats, register their classes in Luke’s SPI configuration by following the steps below.
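
For context: Lucene finds postings formats through Java’s service-provider (SPI) mechanism, reading META-INF/services/org.apache.lucene.codecs.PostingsFormat files from the classpath. Each file uses the standard java.util.ServiceLoader format: one fully qualified class name per line, # starts a comment, and surrounding whitespace and blank lines are ignored. The error above means none of the registered classes reports the name es090. As a rough sketch (spi_has_class is a hypothetical helper, not part of Luke or Lucene), this shell function checks whether an unpacked services file lists a given class:

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if SERVICES_FILE lists CLASS_NAME, 1 otherwise.
# Follows the ServiceLoader file format: '#' starts a comment and
# whitespace around each class name is ignored.
spi_has_class() {
  services_file=$1
  class_name=$2
  # Drop comments and whitespace, then require an exact whole-line match.
  sed -e 's/#.*//' -e 's/[[:space:]]//g' "$services_file" \
    | grep -qxF "$class_name"
}
```

You could run it against the file unpacked in step 4, before and after step 5, to confirm that the ElasticSearch classes were added.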

1. Clone Dmitry’s Mavenized repo:

$ git clone https://github.com/DmitryKey/luke/

2. Add a dependency on your required version of ElasticSearch to the Luke project’s pom file:

<!-- ElasticSearch -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.1.1</version>
</dependency>

3. Compile the Luke jar file (creates target/luke-with-deps.jar):

$ mvn package
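
If the build stops with “Cannot find parent: org.sonatype.oss:oss-parent”, check your Maven version: Shane reports in the comments below that building with Maven 2 fails here and that Maven 3 works. A small sketch that reads the major version out of mvn -v, assuming its output contains a line of the usual form “Apache Maven X.Y.Z (...)” (maven_major is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print the Maven major version from `mvn -v` output
# read on stdin. Assumes a line of the form "Apache Maven 3.0.5 (...)".
maven_major() {
  sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Warn before building if Maven is older than 3 (only if mvn is installed).
# -B (batch mode) keeps the version output free of terminal colour codes.
if command -v mvn >/dev/null 2>&1; then
  major=$(mvn -B -v 2>/dev/null | maven_major)
  if [ "${major:-0}" -lt 3 ]; then
    echo "Maven 3 or later is recommended for this build (found: ${major:-unknown})" >&2
  fi
fi
```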

4. Unpack Luke’s list of known postings formats into a temporary directory:

$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive:  target/luke-with-deps.jar
  inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
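
J. David Beutel notes in the comments below that when the ElasticSearch dependency is listed first in the pom, this file can end up holding only the ElasticSearch classes, without Lucene’s own formats. A quick sanity check before editing the file is to count its org.apache.lucene entries; a minimal sketch, with check_lucene_entries as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report how many org.apache.lucene classes a services
# file lists; print a warning and return 1 if there are none.
check_lucene_entries() {
  # grep -c prints 0 (and exits 1) when nothing matches; '|| true' keeps going.
  count=$(grep -c '^[[:space:]]*org\.apache\.lucene\.' "$1" 2>/dev/null || true)
  if [ "${count:-0}" -eq 0 ]; then
    echo "warning: no Lucene classes listed in $1; check the dependency order in pom.xml" >&2
    return 1
  fi
  echo "$count Lucene class(es) listed in $1"
}

# Path unpacked in step 4.
services_file=./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
if [ -f "$services_file" ]; then
  check_lucene_entries "$services_file"
fi
```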

5. Add the ElasticSearch postings formats to the temp file:

$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
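
If you run step 5 more than once against the same file, the echo commands append the same class names again. A loop that appends only the missing names keeps the file free of duplicates and keeps the three class names in one place. A minimal sketch, with register_classes as a hypothetical helper name and the paths taken from the steps above:

```shell
#!/bin/sh
# Hypothetical helper: append each class name to a services file unless
# that exact name is already on a line of its own.
register_classes() {
  file=$1
  shift
  # If the file does not end with a newline, add one so the first
  # appended name is not glued onto the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  for cls in "$@"; do
    grep -qxF "$cls" "$file" || echo "$cls" >> "$file"
  done
}

# Services file unpacked in step 4.
services_file=./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
if [ -f "$services_file" ]; then
  register_classes "$services_file" \
    org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat \
    org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat \
    org.elasticsearch.search.suggest.completion.Completion090PostingsFormat
else
  echo "not found: $services_file (run step 4 first)" >&2
fi
```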

6. Repack the modified file back into the jar:

$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat

7. Run Luke:

$ ./luke.sh

You can now open indexes created by ElasticSearch, search for, view, and edit documents, and perform all the other operations Luke allows.

A good Luke tutorial can be found on LingPipe’s blog:
http://lingpipe-blog.com/2012/07/24/using-luke-the-lucene-index-browser-to-develop-search-queries/

Happy debugging!

9 thoughts on “Using Luke with ElasticSearch”

  1. Dmitry Kan

    Hello!

    Great post! If you feel like contributing this feature to luke, I would be glad to accept a pull request. I think the process of adding the services should be automatable via maven itself.

    Dmitry Kan
    luke maintainer

    1. Ross Simpson Post author

      Thanks Dmitry, and thanks for keeping Luke up to date!

      I’m happy to submit a PR for this work. I haven’t done it yet for 2 reasons:

      * I didn’t think everyone would want to use Luke with ES, so I didn’t want to add the dependency to the pom globally
      * My Maven’s not strong enough to know how to programmatically look through the unpacked dependencies for classes extending PostingsFormat.

      Any thoughts? I’ll look into it in a couple of weeks when I get back.

      Cheers!
      Ross

  2. J. David Beutel

    Thanks for these instructions! I’d like to make a clarification:

    I had to put the elasticsearch dependency at the end of the dependencies. When I put it at the beginning, it overrode the META-INF/services files org.apache.lucene.codecs.Codec to be empty and org.apache.lucene.codecs.PostingsFormat to have just the elasticsearch classes.

    Cheers,
    11011011

  3. Pingback: Flauschig Dev. Blog » Luke with Elasticsearch Index

  4. Markos Fragkakis

    Hi,

    In order to get it to work with an ElasticSearch 1.3.4 index (which uses Lucene 4.9.1), I also had to add the following codec to the org.apache.lucene.codecs.Codec file:

    org.apache.lucene.codecs.lucene49.Lucene49Codec

  5. Ashwin Jayaprakash

    Thanks for posting this! I think it would be good to also mention that the ElasticSearch JARs should be in the classpath for the modified JAR to work.

    java -XX:MaxPermSize=512m -cp "lukeall-4.10.1-mod.jar:es-install-dir/lib/*" org.getopt.luke.Luke

  6. Shane

    Great instructions, very helpful! One clarification in case anyone else runs into it: trying to build Luke with Maven 2 will produce an error.


    [ERROR] BUILD ERROR
    [INFO] ------------------------------------------------------------------------
    [INFO] Error building POM (may not be this project's POM).

    Project ID: com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.2

    Reason: Cannot find parent: org.sonatype.oss:oss-parent for project: com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.2 for project com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.2

    The solution is simple: upgrade to Maven 3.
