Lucene in Action, Second Edition: Covers Apache Lucene 3.0


Author(s): Michael McCandless, Erik Hatcher, and Otis Gospodnetić
ISBN: 1933988177
Published: July, 2010
Relevance: 5/5
Readability: 4/5
Overall: 5/5

More Manning Books Reviews …

These days, search performance is crucial when we work with the huge amounts of data found in many enterprises. This is hard work, and Lucene is here to help us. This is an interesting book, although you will sometimes need to read a passage several times to understand the more complex topics.

You have more than 450 pages available to learn Lucene.

Each chapter includes many, many sections! I am going to mention only those that are covered over many pages, not the shortest ones. What follows is my summary overview of this interesting work.

Because the book includes so many sections in each chapter, there are two side effects; I expand on my opinion about them at the bottom of this review. Many of these sections are built from images, source code and explanation. Some are long and others are concise, so you will find this pattern mentioned many times below.

Part I: Core Lucene

Chapter 01: Meet Lucene

A solid chapter. It opens with today's information explosion and then introduces Lucene, explaining what it is and what it can do, and even includes the history of its creation. A valuable image shows the many components involved in a search application, and a long, important explanation of those components follows.

A sample application is shown too, with its explanation, instructions and output. There is also an excellent, long explanation of the core indexing classes and the core searching classes.

Chapter 02: Building a search index

The chapter starts by explaining the indexing process with important material. Sample source code is included with its explanation, and the API methods for deleting and updating documents are introduced and explained.

Field options are well covered, with valuable information. The rest of the chapter is long and almost entirely theory, covering topics such as boosting documents and fields, indexing numbers, dates and times, and concurrency, thread safety and locking issues.
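To give an idea of what the indexing process produces, here is a minimal sketch of my own of an inverted index in plain Java. It is not taken from the book and does not use Lucene's API; the class name ToyInvertedIndex and its methods are hypothetical, and splitting on non-alphanumeric characters is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: NOT Lucene's API, only an illustration of the idea.
class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Adds a document and returns the id assigned to it. */
    int addDocument(String text) {
        int docId = nextDocId++;
        // Assumption: lowercase the text and split on anything that is not a letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Returns the ids of the documents containing the term (case-insensitive). */
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Lucene in Action");
        index.addDocument("Searching with Lucene");
        System.out.println(index.search("lucene")); // prints [0, 1]
        System.out.println(index.search("action")); // prints [0]
    }
}
```

A real Lucene index stores much more than this (fields, term positions, norms for scoring), so take the sketch only as a picture of the term-to-documents lookup.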

Chapter 03: Adding search to your application

After a concise introduction to the searching API, a short source code sample for a TermQuery is introduced and explained. QueryParser gets the same treatment, including an image that illustrates how it works.

IndexSearcher is covered too, including almost a page of source code on near-real-time search with its explanation, which I found interesting. Lucene scoring gets the same treatment.

Another section is Lucene’s diverse queries, where TermQuery, TermRangeQuery, NumericRangeQuery, PrefixQuery, BooleanQuery, PhraseQuery, WildcardQuery and FuzzyQuery are covered over a good number of pages, with important source code and explanation. You will see for yourself that Lucene offers good support for queries.

Chapter 04: Lucene’s analysis process

The chapter starts with an image and an explanation of the analysis process during indexing. It continues with what is inside an analyzer, where important terms such as token and token stream are explained with valuable theory and helpful images.

Among other sections, synonyms and aliases are covered, with important and valuable source code and explanation.

A crucial section is the one on language analysis issues, which is well covered.
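As a rough picture of what an analysis chain with synonyms does, here is a small sketch of my own in plain Java. It is not the book's code and does not use Lucene's Analyzer or TokenStream classes; ToyAnalyzer and its synonym table are hypothetical. The one idea it borrows is that an injected synonym shares the position of the original term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy analysis chain: NOT Lucene's Analyzer/TokenStream API, only the idea.
class ToyAnalyzer {
    // Hypothetical synonym table, chosen only for this example.
    private static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("quick", Arrays.asList("fast", "speedy"));
    }

    /** Returns tokens as "term@position"; synonyms reuse the position of the original term. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        int position = 0;
        // Assumption: lowercase the text and split on anything that is not a letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            tokens.add(term + "@" + position);
            for (String synonym : SYNONYMS.getOrDefault(term, Collections.<String>emptyList())) {
                tokens.add(synonym + "@" + position); // same position as the original term
            }
            position++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Quick fox")); // prints [quick@0, fast@0, speedy@0, fox@1]
    }
}
```

Because the synonyms sit at the same position as the original word, a phrase search for "fast fox" could match text that only says "quick fox".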

Chapter 05: Advanced search techniques

A long chapter. After a concise introduction to Lucene’s field cache, there is important coverage of sorting search results: many sorting options are shown through source code with explanation and output, over many pages.

Span queries and their variations get the same long treatment, and each variation includes an image for better understanding. The same approach is used again for filtering a search, which is well covered.

Chapter 06: Extending search

The chapter starts quickly with a geographic sorting scenario, covered in about three pages of source code with explanation. Custom Collectors get the same treatment, with two approaches shown.

There is deeper, longer coverage of QueryParser, based on many source code samples with explanation over about five pages, and it even includes a table of its extensibility points. The same approach is used again for filters and payloads.

Really an interesting chapter, with a lot of source code available.

Part II: Applied Lucene

Chapter 07: Extracting text with Tika

The chapter starts quickly with an introduction to Tika, including a table of about two pages listing the document formats it can parse, an explanation of its API and how to install it. How to extract text programmatically is then covered in two pages of source code with explanation. Not everything is perfect: Tika's limitations are covered too.

To complete the chapter, there is material on indexing custom XML with SAX and Apache Commons Digester, each with its own source code sample and explanation.

Chapter 08: Essential Lucene extensions

This chapter leans heavily on Luke: there are many images of its interface and explanations of its features. I mean images of the tab overviews, the Documents tab, searching with QueryParser, Files support, etc.

Something important and valuable is a two-page table of the API for Analyzers, Tokenizers and TokenFilters; a very interesting table.

An important section is Highlighting query terms. An image shows the process flow and the classes and interfaces involved, with an explanation of each component. Sample source code for applying highlighting is included, even working with CSS.

Spell checking is also covered through source code with explanation. Something valuable is about a page of ideas for improving spell checking.

To complete the chapter, several query extensions are introduced, such as MoreLikeThis, RegexQuery and more.

Chapter 09: Further Lucene extensions

The chapter starts quickly with chaining filters, covered in about three pages of source code with explanation. An interesting section, using the same approach, is about storing an index in Berkeley DB.

Another interesting section is about synonyms with WordNet: how to build an index and how to work with an analyzer are well covered through images, source code and explanation, and the images are a good complement.

A valuable section is about the XML QueryParser, with an interesting image of the three common options for building a Lucene query from a search UI. Valuable source code and a detailed explanation of the .xsd code are included too.

Spatial Lucene is included too, with important images of the globe, tiers and grid boxes. Of course, source code with explanation is well introduced, covering important topics such as searching and performance.

Nearing the end of the chapter, a well-covered section is Searching multiple indexes remotely, explained with an image and important, valuable source code with a concise explanation.

Chapter 10: Using Lucene from other programming languages

An interesting chapter for our consideration. It is based largely on source code samples showing how you can work with Lucene from other programming languages. The ports covered are:

  • CLucene (C++)
  • Lucene.Net (C#)
  • KinoSearch and Lucy (Perl)
  • Ferret (Ruby)
  • PyLucene (Python)

Chapter 11: Lucene administration and performance tuning

This chapter is neither very long nor very short, but it is concise. Valuable theory and explanation of source code are available, covering, among many other things, tuning, threads, managing disk and memory usage, and the index.

Some images of process flows and performance output are available too, complementing the long theory offered in many of the sections.

Part III: Case studies

Here we have three very interesting chapters. They share common features: considerable theory and explanation of each case, some code snippets to complement the ideas, valuable images of simple and complex processes, some views or output results, and finally some JMX configurations.

These three final chapters are:

  • Chapter 12: Case study 1: Krugle
  • Chapter 13: Case study 2: SIREn
  • Chapter 14: Case study 3: LinkedIn

What I liked:

  • Almost every chapter has a lot of sections, so you get broad coverage of Lucene
  • A lot of theory available
  • Many tables are available as a good complement
  • Valuable images help you understand complex processes and functions

What I disliked:

  • Almost every chapter has a lot of sections, but many of them are based only on theory, so you get the idea but not the action
  • Because of the previous point, you need to read a chapter many times; there are a lot of topics to learn
  • For much of the sample source code, I felt that a deeper explanation of the code was needed

I hope not to see strong API changes between Apache Lucene 3.3.0 and the version covered by the book, which is 3.0.

