(This article is the second part in a series. Read Part 1)
Having used Sphinx for a while, we found there was still room for improvement. Our new search engine worked, but it wasn't as quick as we would have liked under load. Initially the index was built on a one-index-for-all approach, containing every field that we might conceivably want to search on. Its primary use was to return search results to the main application; however, utility applications such as our newsletter tool also needed search.
The change we made to improve performance was fairly simple. We analysed which attributes we were actually querying against and removed all the unused ones. This alone would probably have been enough; however, it still left quite a large index in terms of the number of columns in the query, which led us to the next step: splitting out indexes. We tried type-based indexes, with a separate index for each type of record rather than one index holding everything.
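The two steps (pruning unused attributes, then splitting by type) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling we actually used: the query log, the record types (`product`, `article`), the attribute names and the `*_index` naming scheme are all hypothetical. It groups logged queries by record type, reports attributes no query ever touched, and plans one index per type containing only the attributes that type is queried on.

```python
from collections import defaultdict

# Hypothetical sample of logged searches: (record type, attributes the query
# filtered or sorted on). Real data would come from application or Sphinx logs.
QUERY_LOG = [
    ("product", {"price", "category_id"}),
    ("product", {"category_id", "created_at"}),
    ("article", {"author_id"}),
    ("article", {"author_id", "created_at"}),
]

# Hypothetical attribute list of the original one-index-for-all index.
ALL_ATTRIBUTES = {
    "price", "category_id", "created_at", "author_id", "stock_level", "legacy_flag",
}


def used_attributes_by_type(log):
    """Collect, per record type, every attribute that was actually queried."""
    used = defaultdict(set)
    for record_type, attrs in log:
        used[record_type] |= attrs
    return dict(used)


def unused_attributes(all_attrs, used_by_type):
    """Attributes that no logged query touched: candidates for removal."""
    touched = set().union(*used_by_type.values())
    return all_attrs - touched


def plan_type_indexes(used_by_type):
    """Plan one index per record type, holding only that type's queried attributes."""
    return {
        f"{record_type}_index": sorted(attrs)
        for record_type, attrs in sorted(used_by_type.items())
    }


if __name__ == "__main__":
    used = used_attributes_by_type(QUERY_LOG)
    print("Drop:", sorted(unused_attributes(ALL_ATTRIBUTES, used)))
    for name, attrs in plan_type_indexes(used).items():
        print(name, attrs)
```

With the sample log above, `stock_level` and `legacy_flag` are flagged for removal, and the product and article indexes each carry only their own attributes. Each planned index would then map to its own source and index definition in the Sphinx configuration.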
This new approach had a number of significant benefits:
- The individual index size was reduced as unnecessary data was no longer stored
- The index rebuild time was reduced
- We gained the ability to take specific indexes offline for maintenance
- The end-to-end search time itself was reduced
Next time: using distributed indexes with Sphinx.