I have installed both Sphinx and Lucene on the staging sites. Each has its advantages and disadvantages, so I'd like to hear which one everyone wants to see go to production.
I know that Sphinx might be tempting right now, but I would definitely go with Lucene, because we can then rely on MediaWiki's development and maintenance instead of doing our own. I also think we should use the same style as the MediaWiki search page, because people are used to it.
Yes, I agree; the search integration bothers me too. If Sphinx had the same quality of plugin as MWSearch, I would be much more convinced that it is the better choice for us. As it is right now, I suppose it's something of a coin toss. One option is to rework the plugin to integrate with the default search functions instead of working around them. This shouldn't be as hard as it sounds. The vast majority of the changes would be in a single PHP file, and I don't think it would take more than a couple of hours. I could do it myself if I weren't so swamped by other requests. The default search and both search extensions are hardly ever updated, so there wouldn't be much maintenance work once this is done. Anyway, it's just a thought. One of my main concerns with Lucene is that the indexing process is very heavy, so I wouldn't want to run it more than once or maybe twice a day, compared to every few minutes for Sphinx. There is a way to do incremental updates, but it involves yet another extension and a schema change, and there have been numerous problems reported with it. The other big concern is that I cannot get suggestions to work at all under Lucene. The spelling index gets created, but I cannot get the search daemon to use it. I'm going to try compiling the Java source under the system's JDK and Ant to see if that gets me anywhere. If I can get the suggestion functionality to work, then I will agree that Lucene is absolutely the superior choice (its suggestion functionality beats aspell/pspell hands down).