Full Text Search with Sphinx

I’ve been working on a project that involves full text search within a collection of documents. Search is a pretty common problem, so I knew there had to be well-established solutions available. Sphinx is one such solution, and it was our choice for implementation. It’s open source, fast, and very easy to use.

Setting up a simple search instance is straightforward. There are really only four steps:

  • Install Sphinx
  • Put the text you wish to search in a database (we use MySQL; other options are available)
  • Set up a configuration file to point to your database. An example file is provided, and you can get up and running by modifying just a few fields
  • Run the indexer to create the search index

And that’s it. At this point you can start the Sphinx search daemon and query it using one of several methods. We are using the included PHP library to implement a web-based search.
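To give a feel for what "create the search index" and "query it" mean, here is a toy sketch in Python of an inverted index, the basic idea behind full text search. To be clear, this is not how Sphinx is implemented and it does not use any Sphinx API. The function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical sample rows, standing in for the text stored in the database.
documents = {
    1: "Sphinx is an open source full text search engine",
    2: "The indexer reads rows from the database and builds the index",
    3: "The search daemon answers full text queries against the index",
}
index = build_index(documents)
print(search(index, "full text"))
```

A real engine adds a lot on top of this (ranking, stemming, phrase matching, on-disk storage), which is exactly why it is nice to have a tool that handles it for you.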

Even though setup is simple, there are a huge number of options and features that can be used to customize search to suit your specific needs. One example that we use is the excerpt builder. It generates short excerpts of the text surrounding the search terms, like the snippets you see in Google search results.
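For anyone curious what an excerpt builder does, here is a minimal Python sketch of the idea: find the words that match the query, keep a few words of context on each side, and highlight the matches. This is an illustration only, not Sphinx’s implementation or its API. The function name `build_excerpt` and its parameters (`around`, `before_match`, `after_match`, `chunk_separator`) are assumptions I picked for this example.

```python
import re


def build_excerpt(text, query, around=3,
                  before_match="<b>", after_match="</b>",
                  chunk_separator=" ... "):
    """Return highlighted snippets of `text` around words matching `query`."""
    words = text.split()
    terms = {term.lower() for term in re.findall(r"\w+", query)}

    # Positions of words that match a query term, ignoring case and punctuation.
    hits = [i for i, word in enumerate(words)
            if re.sub(r"\W", "", word).lower() in terms]
    if not hits:
        return ""

    # Build a context window around each hit, merging windows that touch or overlap.
    windows = []
    for i in hits:
        start = max(0, i - around)
        end = min(len(words), i + around + 1)
        if windows and start <= windows[-1][1]:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])

    # Wrap each matching word (with any attached punctuation) in the highlight markers.
    hit_set = set(hits)
    chunks = []
    for start, end in windows:
        chunk = [before_match + words[i] + after_match if i in hit_set else words[i]
                 for i in range(start, end)]
        chunks.append(" ".join(chunk))
    return chunk_separator.join(chunks)


sample = "one two three four five six seven eight nine ten"
print(build_excerpt(sample, "two nine", around=1))
# prints: one <b>two</b> three ... eight <b>nine</b> ten
```

The nice part of using Sphinx is that we get this behavior out of the box instead of maintaining something like the above ourselves.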

There are a lot of alternatives out there, but Sphinx has provided everything we need, along with ease of use, so we haven’t even felt the need to look at those alternatives. Definitely recommended, and we will continue using it if opportunities present themselves in the future.

  • Erik Kispert

    How might faculty and staff (and possibly students) utilize this here at Valpo? I mean is this something we might be able to use in our day-to-day activities? I’m just struggling to think of applications for this technology because I don’t know your systems well enough.

    • Jim Crowley

      The reason we have been looking at it in the first place is to develop a custom search engine for the law school, so that will be an opportunity for faculty, staff, and students to use this technology once it’s deployed.

      In general, though, it’s more of a tool for us to utilize on the backend, and its presence will always be largely transparent to the end users. We don’t have any plans right now to use it in any specific upcoming projects, but you never know when another opportunity might present itself.