2011-12-23 #opensuse-project
[01:06] I lost you when you were saying that openfate sucks, right?
[01:07] what happened, you all left?
[01:07] warlordfff: I don't think openfate sucks..I just think that it's surplus to requirements - why have bugzilla + openfate especially when we don't actually rely on either to *define* our future features..just help guide things along
[01:07] I think openFate is a great idea
[01:07] but it might need work
[01:08] but sadly I don't have any ideas about it
[01:08] so I stand in my corner on that
[01:09] searchability sucks monumentally, unfortunately
[01:09] try searching for "board 2012 elections platform pages" on the wiki
[01:09] zilch
[01:09] it's a good idea but end of the day from a contributor's perspective, when someone thinks "I want to do something new for openSUSE"..where do they look? bugzilla? openfate? it would be much nicer to have a single spot where a contributor can look ..oh that needs doing, great
[01:09] you can't even find any page about the 2012 elections with that using google
[01:10] Ilmehtar: right
[01:10] openfate is nice in its own right, but it's pointless to duplicate, the same can be achieved with bugzilla
[01:10] also from a user's perspective..they want a new feature..where do they file it? bugzilla as an enhancement? openfate as a new feature?
[01:10] does bugzilla have RSS or something?
[01:11] some
[01:11] * Ilmehtar grumbles about that gnome3 ML troll
[01:11] Ilmehtar: don't feed :)
[01:11] yaloki: not any more..I tried to be nice, now I'm going to STFU and hope he does too
[01:12] Ilmehtar: openFATE for features; although it is almost as complicated as bugzilla
[01:12] bugzilla also lets you follow others, so like the gnome team all follow the default assignee for gnome bugs so we all get emails about the changes to gnome bugs that aren't assigned to anyone specifically
[01:12] oh you talk about the guy who said that Gnome3 sucks on openSUSE?
[01:12] simon321: where do features end, and enhancements begin? :)
[01:12] Ilmehtar: it is the same backet
[01:12] basket
[01:14] warlordfff: and yes, I was, just saw his latest and decided to give up trying to use logic to have a meaningful discussion with the fella
[01:14] in general there is only one bad thing about the openSUSE web infrastructure - it is not connected
[01:14] simon321: and search
[01:14] Ilmehtar: "the full name of "LINUX" is "Linux is not Unix" neither "Windows" or "iOS"" =EPIC
[01:15] wiki search is the most ridiculous I've ever seen
[01:15] there is lots of content you can't even find by searching
[01:15] yaloki: if you have no idea how to connect things then nothing works as expected, including search
[01:15] warlordfff: lol, yes, I have to admit his antics have given me quite a few laughs, but still, would be nice to actually have people constructively critique g3 rather than just bitch about it
[01:15] simon321: search is broken because of a bad decision
[01:15] simon321: it is restricted intentionally
[01:16] yaloki: why and how?
[01:16] simon321: having a proper search engine across all parts of the infrastructure would be awesome, yes, but also pretty tough to implement
[01:16] why don't we take baby steps and try to make Search work properly and then move to other stuff?
[01:16] Ilmehtar: only the portal pages are searched by default
[01:16] Ilmehtar: henne thought, still thinks, and insists that that is the right way
[01:16] yaloki: I did not know that..glad I've started tuning the gnome portal page..
[01:17] Ilmehtar: he had the illusion that people would then be forced to put everything in portal pages
[01:17] yaloki: apropos wiki search, we can let everyone see everything - if henne lets that happen :) , but we still need presentation space separated from support
[01:17] yaloki: it would have a hope of working, if people knew about it..
[01:17] simon321: please define "presentation space" and "support"
[01:17] simon321: for search, you don't separate: show everything you can, with good relevance ranking :)
[01:18] Ilmehtar: even then it's stupid imho: if content is there, let people find it and use it, never mind how it's structured
[01:18] yaloki: how to do that - where is relevance written in the wiki
[01:18] guys, at oSC11 Henne admitted that he needs hands to organize the wiki
[01:18] Ilmehtar: it's not like there are 10 people working full time on maintaining content and structure there
[01:18] simon321: in the search engine implementation
[01:19] simon321: mehle moved from the default search+indexing engine to lucene
[01:19] simon321: I don't know whether the relevance scoring is good or not, that depends on how you tune lucene
[01:19] you can influence which fields get higher scores for matches, etc...
[01:20] * yaloki has done quite a lot with Solr, which is a layer on top of Lucene
[01:20] including on a mediawiki
[01:20] yaloki: on the other side, what is relevant for one is not for the other see http://en.opensuse.org/Portal:Wiki#Structure
[01:20] the perfect solution[tm] would probably be to add some "bonus points" to search results in the main and portal namespace so that they always appear as top search result, and the other (now usually hidden) results could follow below
[01:21] simon321: partly true, but it can usually be done pretty well, when you know how the implementation works, and when you spend some time tuning it
[01:21] cboltz: yes, that can be done easily
[01:21] lucene (and solr) has "boosting"
[01:21] and a big solr instance could be deployed to index all the parts of the infrastructure too
[01:22] yaloki: I can't know who the visitor is, it is up to him/her to select what part he wants to see
[01:22] lists, forums, wiki, openfate, bugzilla, ?
[01:22] but it would require implementation efforts for each part
[01:22] simon321: no, why?
[01:22] simon321: if you search for "nvidia"
[01:23] simon321: just give all the pages that mention nvidia, with higher scoring for pages that have nvidia in their title, and pages that mention it more frequently, etc...
[01:23] (that's what such proper search engines do, unlike just using mysql for search)
[01:23] yaloki: 1) is nvidia supported, 2) do I have a problem with nvidia, which one is the visitor looking for?
[01:24] simon321: no, that's not how search works
[01:24] simon321: it's search, not support on irc :)
[01:24] simon321: search relevance obviously cannot be subjective
[01:24] simon321: it's search relevance through scoring and boosting in the search engine implementation
[01:25] yaloki: to start with small steps - how exactly can we implement boosting in the wiki search?
[01:26] cboltz: through the configuration
[01:26] (I don't say your "big" solution is wrong, however it will need time. And until then, small steps are better than nothing ;-)
[01:26] cboltz: no it doesn't need time
[01:26] yaloki: I have no idea what guys want to see, and pretending that one word reveals that, is being a bit too confident - some words will actually tell, as "nvidia", other as may be too ambitious
[01:26] cboltz: can be done in an hour
[01:26] simon321: sure, but there are techniques for that too
[01:26] simon321: we're not the first ones to implement search :)
[01:26] "configuration" - yes of course ;-)
[01:27] stopwords, etc..
[01:27] any pointers about details? ;-)
[01:27] cboltz: I don't know how the lucene integration in mediawiki works
[01:27] cboltz: I've implemented it from scratch with Solr in half a day
[01:27] cboltz: and there you can configure the boosting in the Solr configuration file
[01:27] yaloki: I know only one that works well, disambiguation pages (but they are not listed as an official navigation tool)
[01:27] cboltz: and, of course, stop only giving results from portal pages
[01:28] when creating the search index or when the search query runs?
[01:28] simon321: no, really, it works well
[01:28] cboltz: you can do both
[01:28] cboltz: and then it also depends on what is actually being indexed
[01:28] cboltz: the implementation I did also indexes the wiki categories the page is in
[01:28] cboltz: and boosts higher on category word matches
[01:29] type KDE, it is a bit funny since it does not give you the pages found but gets you straight to the KDE page
[01:29] you can also tune on nearest matches
[01:29] the index probably contains all wiki pages - at least searching with "all: $searchword" works
[01:29] warlordfff: yeah, that's wrong too imho
[01:29] yeap
[01:29] also
[01:30] actually
[01:30] if you are searching e.g. from Greece it should get you to the Greek page if available
[01:30] when you think of how to implement search
[01:30] it's *very* simple
[01:30] it must be like google
[01:30] period
[01:30] that's what people expect
[01:30] it should be possible to change that behaviour for "KDE" - probably an easy fix in the search form
[01:30] give a list of results, with relevance
[01:30] yaloki: going straight to the page is how wikipedia works - but for terms like KDE they offer listing of topics - not presentation
[01:31] simon321: yes, but that's not what people expect
[01:31] as said, it must behave like google search
[01:31] that's what 99% of people use 99% of the time ^^
[01:31] well, I do, just because it works fine on wikipedia :)
[01:31] simon321: it works fine on wikipedia because they have a very stringent design of the page names and disambiguation
[01:32] simon321: which we don't, and never will have, simply because it would require a lot more people to do some caretaking of the wiki
[01:33] the caretaking is a problem
[01:33] Henne told us in that BoF that there are only 5-6 people doing that
[01:34] so he is the last I would blame for that
[01:34] cboltz: I actually proposed to matthew that I give him my code for solr search and explain to him how it works etc..
[01:34] cboltz: but then there was no followup and suddenly he did something, no one knows what, with lucene
[01:34] you can't expect people to contribute when things are done like that
[01:35] indeed, that's understandable :-/
[01:36] cboltz: so I can't tell you what needs to be configured where, never used the mediawiki lucene plugin (as I guess there is such a thing and that's what has been used)
[01:36] cboltz: that being said, lucene also has boosting and such, as Solr builds upon that
[01:36] well, last time there was a discussion, he had lucene working, and not that much time to spend on alternatives
[01:36] cboltz: and well it just takes a bit of tuning, trial and error
[01:36] cboltz: think of a few searches and what you think would make sense as results
[01:36] cboltz: then try, analyze and tune until you get that result
[01:37] (solr has a good analyzer too)
[01:37] yes, there is a MW extension (and I'm also using it in a wiki I maintain)
[01:37] simon321: I offered him my help
[01:37] but IIRC I didn't see anything about boosting in its documentation
[01:37] simon321: I know exactly how to do it, I've done that for a mediawiki already, including the implementation of live indexing, and tuning the search results, etc...
[01:37] simon321: :\
[01:38] cboltz: it's probably hard-coded in the extension then
[01:38] cboltz: and probably with pretty sane defaults
[01:38] cboltz: but we might want to tune it too
[01:38] yaloki: you know that he wasn't there for some time, and I don't think that was his decision
[01:38] cboltz: e.g.
boost results with portal or SDB higher than other ones
[01:39] simon321: I have no idea, and that's what I'm criticizing, it was totally opaque :\
[01:39] yaloki: to me it appeared like someone else's decision
[01:39] simon321: no idea, no one knows
[01:40] simon321: but if people there feel like they should just take the decision and do it on their own without discussing, then it's an issue
[01:40] simon321: it's even ridiculous because people with relevant skills are in our community and ready to help
[01:40] simon321: (not just talking about this case)
[01:41] simon321: don't get me wrong, I'm not hitting on matthew or anyone else
[01:41] yaloki: I know
[01:41] ok, but explaining would be nice, right?
[01:41] and we all make mistakes, obviously, and it's human to just go to the next office and discuss it there rather than going the full loop through mailing-lists
[01:41] but still
[01:42] if we want contributors, if we want a project and a community
[01:42] then it's not acceptable
[01:42] yaloki: in case you are interested, it's this extension: http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/MWSearch/
[01:42] yaloki: matthew's appearance was the best thing that happened to openSUSE web - just as his disappearance was bad
[01:43] when that guy is around things are going on, otherwise they stagnate
[01:44] yes, he really does a good job - even if he forgets to re-apply my patch for MultiBoilerplate on each update ;-)
[01:44] and taking decisions and pushing too strongly in one direction is not going to work; I have to agree with that
[01:45] but fortunately I can fix that myself now ;-) and just need to ask for deployment
[01:45] cboltz: oh wow they don't seem to do any boosting at all in the search..
couldn't see anything in the indexing stage either, at first
[01:45] taking decisions without any consultation (I meant)
[01:45] yaloki: I'm not sure how the indexing is done
[01:45] simon321: yes sure, it's a balance
[01:45] cboltz: well I'm a bit surprised because I don't see that they're using the mediawiki hooks to do live indexing ?
[01:46] yaloki: btw, the search indexer was broken for a few months
[01:46] there is a PHP file for it in the extension, but in my wiki I'm just running a daily indexing cronjob
[01:46] (yes, we already discussed that some weeks ago...)
[01:46] cboltz: I think it works offline in batch
[01:46] ouch
[01:47] ok, that's prolly needed for something of the size of wikipedia
[01:47] but on opensuse.org we could really do live indexing
[01:47] (as soon as a page is created, modified or updated, the search index is updated)
[01:47] yes, if they give enough cycles to the server :)
[01:48] that would be good, yes - but OTOH getting good search results (even if the result is slightly outdated) is the more important thing ;-)
[01:48] when we have that, we can start to think about live indexing
[01:48] cboltz: no, there is no boosting at all
[01:48] that's ridiculous :(
[01:49] simon321: lucene and solr are extremely fast
[01:49] cboltz: well, live indexing is actually a lot easier to implement
[01:49] cboltz: my extension that does that is pretty small, definitely a lot smaller than MWSearch
[01:50] I mean, you should at the very least boost the title field
[01:50] yaloki: do you have something that works on the current MW used on openSUSE wikis?
[01:51] obviously ;-) - but not only in the openSUSE wiki, it would be good for all wikis using MWSearch (in other words: upstream)
[01:51] cboltz: https://wiki.apache.org/solr/SolrRelevancyCookbook
[01:51] simon321: it would need specific tuning but yes
[01:51] simon321: that's what I told matthew ages ago
[01:52] so you or cboltz can upload that to git and have it ready for matt to deploy?
[01:52] no
[01:52] it needs more work than that
[01:52] it needs a Solr instance, to start with
[01:52] and that won't work because the admins won't install it if there is no RPM of it ¬¬
[01:53] MW has no rpm
[01:53] just like we don't have our own etherpad instance for the same reasons
[01:54] simon321: you are wrong - AFAIK there is a MW rpm in openSUSE ;-)
[01:54] (but without extensions etc.)
[01:54] cboltz: and what is deployed is what?
[01:54] simon321: but it prolly wouldn't work for other reasons, like me needing access to some stuff on the mediawiki server (or a staging instance)
[01:55] simon321: not an RPM - everything is "collected" in a git repo
[01:55] and based on tarballs and svn checkouts
[01:55] cboltz: hmmm
[01:55] cboltz: on github?
[01:55] cboltz: well, that is what I meant - rpm :)
[01:55] - is a minus
[01:56] yaloki: https://github.com/openSUSE/wiki
[01:56] cboltz: ok thanks
[01:57] simon321: an RPM won't really work - you still have to maintain extensions (well, could be another RPM for each extension), the config file etc.
[01:57] and to make things worse, we need a small modification in a MW core file...
[01:58] cboltz: you know that rpm for single installation is just the way around without purpose :)
[01:58] well, it's not needed
[01:58] you already have git for versioning etc..
[01:58] oh it uses geshi
[01:59] * yaloki also wrote a plugin + a php module to use a shlib as syntax highlighter
[01:59] faster :)
[01:59] it's upstream btw
[02:01] anywayz
[02:01] time for me to collect a few bits of sleep
[02:01] n8 folks
[02:01] Going to sleep, goodnight guys
[02:01] BB
[02:01] let's revive that discussion later
[02:02] maybe in a project meeting
[02:02] you people are still yapping???
[02:02] :)
[02:02] that makes two good ideas (going to bed and continuing the discussion) ;-)
[02:02] oh suseROCKs is here, we're late :D
[02:02] somehow it is more reassuring when a guy says he's late than when a gal says she's late...
[02:03] goodnight people, although from some point on it was impossible for me to follow, I learned a few things :D
[02:03] warlordfff, did you learn how to chew gum and walk at the same time?
[02:03] Niarfff
[02:04] Goodnight
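
Editor's note: the scoring idea discussed in this log (return every page that mentions the term, weight title and category matches above body matches, give main/portal/SDB pages some "bonus points", and update the index as soon as a page is saved) can be sketched in a few lines. This is only an illustration under stated assumptions. It is not MWSearch, Lucene, or yaloki's Solr code, and the class name, the save-hook method, and all weights below are hypothetical, since the log gives no concrete numbers.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical weights; the log names no concrete values.
FIELD_BOOSTS = {"title": 5.0, "categories": 3.0, "body": 1.0}
NAMESPACE_BONUS = {"Main": 2.0, "Portal": 2.0, "SDB": 1.5}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Page:
    name: str
    namespace: str
    title: str
    body: str
    categories: list = field(default_factory=list)


class WikiIndex:
    """Toy in-memory index with per-field boosts (assumption, not a real API)."""

    def __init__(self):
        self._docs = {}  # page name -> (namespace, {field: term counts})

    def on_page_saved(self, page):
        """Live indexing: meant to be called from a page-save hook."""
        self._docs[page.name] = (page.namespace, {
            "title": Counter(tokenize(page.title)),
            "categories": Counter(tokenize(" ".join(page.categories))),
            "body": Counter(tokenize(page.body)),
        })

    def search(self, query):
        """Return (page name, score) for every matching page, best first."""
        terms = tokenize(query)
        results = []
        for name, (namespace, fields) in self._docs.items():
            # Term frequency per field, weighted by that field's boost.
            score = sum(FIELD_BOOSTS[f] * counts[t]
                        for f, counts in fields.items() for t in terms)
            if score > 0:
                # Bonus for preferred namespaces; nothing is hidden.
                score *= NAMESPACE_BONUS.get(namespace, 1.0)
                results.append((name, score))
        return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    index = WikiIndex()
    index.on_page_saved(Page("SDB:NVIDIA drivers", "SDB", "NVIDIA drivers",
                             "How to install the nvidia driver.", ["Hardware"]))
    index.on_page_saved(Page("User:Foo/notes", "User", "My notes",
                             "nvidia nvidia nvidia gave me trouble"))
    for name, score in index.search("nvidia"):
        print(name, score)
```

In this sketch the user page still shows up for "nvidia", only lower than the SDB page, which is the "bonus points" behaviour cboltz describes rather than the portal-only restriction. A real deployment would set such boosts in the search engine's own configuration instead of hand-rolled code.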