Tag Archives: debugging

Using Luke with ElasticSearch

If you’ve used Lucene or Solr before, you might be familiar with Luke, a Lucene tool for viewing and modifying indexes.  Luke was originally written by Andrzej Bialecki as a side project, and is an indispensable debugging tool for digging into the guts of an index (for example, to see exactly which tokens were stored on a particular field).

[Figure: Luke’s Overview tab]

Unfortunately, it hasn’t been particularly well maintained.  The official project at Google Code hasn’t been updated since Lucene 4.0.0-ALPHA (ca. 2012).  The good news is that there is some great community support for Luke at the moment, including Dmitry’s Mavenized repo on GitHub, which the steps below use.

The community versions should work out of the box with standard Lucene indexes, but you need to do a little extra work to read an ElasticSearch index.

 

If you try to open an index created by ElasticSearch with a stock copy of Luke, you’ll see the following error:

A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]

ElasticSearch uses a custom postings format (the postings format defines how the inverted index is represented in memory / on disk), and Luke doesn’t know about it. To tell Luke about the ES postings format, add the SPI class by following the steps below.

 

1. Clone Dmitry’s Mavenized repo:

$ git clone https://github.com/DmitryKey/luke/

 

2. Add a dependency on your required version of ElasticSearch to the Luke project’s pom file:

<!-- ElasticSearch -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.1.1</version>
</dependency>

 

3. Compile the Luke jar file (creates target/luke-with-deps.jar):

$ mvn package

 

4. Unpack Luke’s list of known postings formats to a temporary file:

$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive:  target/luke-with-deps.jar
  inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

 

5. Add the ElasticSearch postings formats to the temp file:

$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

 

6. Repack the modified file back into the jar:

$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat

 

7. Run Luke:

$ ./luke.sh

 

You can now open indexes created by ElasticSearch; search for, view, and edit documents; and perform all the other operations Luke allows.
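If Luke still reports the missing SPI class after these steps, it may help to confirm that the repacked jar really lists the ElasticSearch formats. The following is a minimal sketch, not part of the original procedure: the function name missing_formats is made up for illustration, and it assumes the three class names from step 5 are the ones your ElasticSearch version needs.

```shell
#!/usr/bin/env bash
# Reads an SPI services file on stdin and prints every ElasticSearch
# postings-format class from step 5 that is NOT listed on a line of its own.
# Prints nothing when all three are present.
# (missing_formats is a hypothetical helper name, not part of Luke.)
missing_formats() {
  local listed required
  listed=$(cat)
  for required in \
    org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat \
    org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat \
    org.elasticsearch.search.suggest.completion.Completion090PostingsFormat
  do
    # -F: fixed string, -x: must match the whole line, -q: no output
    printf '%s\n' "$listed" | grep -Fxq "$required" || echo "$required"
  done
}
```

With the function defined in your shell, pipe the file straight out of the jar:

$ unzip -p target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat | missing_formats

No output means all three entries are present. One thing the whole-line match can catch: if the unpacked file did not end with a newline, the first echo in step 5 would be glued onto the end of the previous entry, and both class names on that line would be unusable.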

A good Luke tutorial can be found on LingPipe’s blog:
http://lingpipe-blog.com/2012/07/24/using-luke-the-lucene-index-browser-to-develop-search-queries/

Happy debugging!

Essential tools for web developers

I’m a firm believer in the principle of using the right tool for the job.  It applies as well to building software as it does to building a house or fixing a car.  There are literally thousands of tools available to do the job, and it can be hard to find the right one.  This post focuses on web development tools, specifically Firefox extensions.

  1. Firefox: while it’s not an extension, the browser itself deserves mention.  Primary web development should happen in a standards-compliant environment, and Firefox is a great platform for that.  There are certainly other browsers that fit the bill (Safari and Opera are candidates), but neither has the wealth of extensions that Firefox offers, nor the developer community behind them.

  2. Firebug: the tagline at the Firebug site is “web development evolved”, and that is a true statement.  Firebug may well be the largest single innovation in web development, ever.  That’s a bold statement, but it provides so much functionality, you hardly need other tools.  Some of its best features include:
    • Visual DOM exploration: mouse over nodes in the DOM tree, see them highlighted in the browser window.  Conversely, inspect an element in the browser window and access it in the DOM tree
    • Realtime CSS editing: changes to styles take effect immediately
    • JavaScript console: execute arbitrary JavaScript in the context of the page
    • Network monitor: see full details on every request and response the browser makes, as they happen
    • JavaScript debugging: a full-featured debugger and profiler for JavaScript code
    • Realtime reporting of JavaScript and CSS errors: you’ll know when something’s not right

    If you don’t have Firebug yet, go get it.  Now.

  3. HTML Validator: invalid markup is a web developer’s nightmare: if your markup isn’t right, how can anything else be expected to function or look right?  Yet, unless you’re extremely well disciplined (and even then, sometimes), you’ll make mistakes.  This extension adds on-the-fly validation to Firefox, letting you know right away when there’s an error, indicating where it is, and even offering suggestions on how to fix it.  The validator has multiple validation engines; for best effect choose the serial algorithm (W3C’s SGML parser first, then HTML Tidy).  There’s no longer an excuse for invalid markup.

  4. Web Developer Toolbar: Chris Pederick’s toolbar was one of the first Firefox extensions targeted at web developers, and it still offers plenty of functionality.  Besides offering shortcuts to oft-used Firefox features, it works with cookies, forms, window resizing, HTML validation and more.  It also provides lots of information about the page and its elements.

  5. Tamper Data: when you need detailed information about HTTP requests and responses, the Net tab in Firebug is one place to look.  Another is Tamper Data, which provides an easily-filtered interface for inspecting HTTP traffic.  However, Tamper Data also allows the user to “tamper” with the request before it’s dispatched to the server, an incredibly useful trick when debugging misbehaving web applications.

Using the right tools will not only make your job easier, it’ll make you a better developer.  These tools can alert you to mistakes in your code, and you’ll learn to avoid repeating them in the future.

Everyone has their own favorite tools in their toolbox.  These are my favorites, and ones I believe no professional web developer should be without.

Introducing Omnibug

Omnibug is a tool for web developers.  I wrote it because debugging an Omniture implementation is… painful.

The idea is that web metrics (or web analytics, if you’re so inclined) systems generally make an HTTP request (usually for an image) in order to pass along tracking information.  The URLs contain lots of parameters, conveniently URL-encoded so you can’t read them easily.

Omnibug is an extension to Firebug (without a doubt the best Firefox add-on available, driving a revolution in web UI development).  It adds a new panel with the decoded output of each such HTTP request, making it a breeze to see exactly what values were sent.
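To give a feel for the kind of decoding involved, here is a rough sketch in bash. It is not Omnibug’s code (Omnibug is a Firebug extension); the function names, the host, and the parameter names are all invented for illustration, and it assumes bash, whose printf understands \xHH escapes.

```shell
#!/usr/bin/env bash
# Decode one URL-encoded string: '+' becomes a space, %XX becomes the byte XX.
urldecode() {
  local s
  s=${1//+/ }
  # Turn every % into \x so printf '%b' expands the hex escapes.
  printf '%b' "${s//%/\\x}"
}

# Print each query-string parameter of a URL as "name = value", decoded.
decode_query() {
  local query pair
  local -a pairs
  query=${1#*\?}                      # drop everything up to the first '?'
  IFS='&' read -ra pairs <<< "$query" # split the query string on '&'
  for pair in "${pairs[@]}"; do
    printf '%s = %s\n' "$(urldecode "${pair%%=*}")" "$(urldecode "${pair#*=}")"
  done
}

# Hypothetical tracking request; the host and parameter names are invented.
decode_query 'http://metrics.example.com/b/ss/demo?pageName=Home%20Page&c1=news%2Fsports&v0=spring+sale'
```

Run as written, this prints pageName = Home Page, c1 = news/sports and v0 = spring sale, one per line, which is much easier on the eyes than the raw URL.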

Though it was designed with Omniture in mind, it will work with other systems (also tested with Moniforce).  The patterns it looks for are fully configurable, so in theory it should work with any similar system.

An additional feature is the ability to log matching requests to the local filesystem.  While this feature was intended to support automated testing of metrics implementations, it may have other uses.

See the Omnibug page for downloads and full documentation.