There’s no denying that Google is King of the Hill in enterprise search, and has been for quite some time. Other search engines can handle the big data wave, but they remain a distant second, with Google enjoying a competition-less environment. That may all change soon, according to Paul Doscher, CEO of Lucid Imagination. A new contender may appear in the form of Lucene.
At Lucid Imagination’s recent Lucene Revolution Conference in Cambridge, Mass., Doscher revealed an application development stack that brings together the features of Mahout, R, Hadoop, and Lucene/Solr for machine learning, recommendation engines, analytics, and enterprise search. Dubbed “LucidWorks Big Data”, the stack is meant to make deploying enterprise-scale search faster and easier. LucidWorks Big Data is in beta as of this writing.
According to Doscher, LucidWorks Big Data is made available via APIs, which lets developers plug in their own UI and algorithms and become productive much faster. He contrasted this with most Hadoop deployments, which he described as neither scalable nor repeatable.
Lucid Imagination’s main product, Lucene, is already used by many e-commerce sites to handle customer searches. For example, the large online shoe store Zappos currently uses Lucene, which is interesting because retail giant Amazon, which acquired Zappos three years ago, is in the process of replacing Lucene with its own A9 CloudSearch service. Other longtime users of Lucene include EMC, which replaced Microsoft’s FAST search technology with Lucene in its document management system.
The need for the right search technology
Now that large volumes of both structured and unstructured data are fast becoming the norm, the need to search and index data efficiently will only become more urgent. This is why the market is full of competing technologies, with the core Lucene engine and the more developer-friendly Solr trying to take market share from the Google Search Appliance and HP’s Autonomy, which Doscher himself has grudgingly praised as the “800 pound gorilla”. There’s also Microsoft’s FAST search, and the newly minted A9 from Amazon.
Right now, Lucid is one of the fan favorites, although rival Elasticsearch is also gaining some ground, according to consulting firm Search Technologies. According to some attendees of the Lucene Revolution Conference, particularly one from the MD Anderson Cancer Research Center, there is no alternative to Lucene for their industry: unlike its competitors, it can handle all the pertinent data, including text, images, and structured and unstructured content, without choking. That can be a very big deal in an industry that saves lives.
Pedantic point but Lucene is surely an Apache product…? Lucid utilizes this open source project.
Lucene is a project managed by the non-profit Apache Software Foundation, as is Solr, a search server built on top of Lucene. Lucid produce their own packaged versions of Lucene/Solr and can provide support, training etc. (disclaimer: we’re a Lucid partner). Many others use Lucene/Solr directly as open source software. Google’s enterprise search product, the Google Search Appliance, certainly doesn’t enjoy a competition-less environment. It is popular as an install-and-forget solution, but it’s pricey for large document collections.
To add to what Charlie and Alan have noted, Lucene is hardly a “new” contender. It was originally written in 1999, and by the time Google launched its first appliance in 2002, Lucene was already an Apache project.