Hello,

I'm starting a new thread to make sure everybody who is interested notices it ;-)

You all know that we get quite some complaints (and/or complain ourselves) about our wiki search because "it doesn't find the stuff".

This is mostly caused by the fact that by default the search only lists pages in the main and Portal namespaces.

I propose to change the wiki search and the config of Lucene (the search engine used in the openSUSE wikis) to
- search all (relevant) namespaces
- use [Namespace-Boost] in lsearch-global.conf to boost the namespaces that we currently search (main and portal)

The result will be that pages from the main and portal namespaces will still be the top-rated search results, with the (currently hidden) results from other namespaces below.

Here's a list of the namespaces we use, together with the boost value I propose (1 means top rating, 0.0001 means it will be the last search result, - means not to include this namespace in the default search [1])

Boost    Namespace number/name
1          0  Main
0.0005     1  Talk
0.005      2  User
0.001      3  User talk
0.6        4  Project ("openSUSE")
0.0005     5  Project talk
0.02       6  File
0.0005     7  File talk
0.005      8  MediaWiki
-          9  MediaWiki talk
0.0005    10  Template
-         11  Template talk
0.01      12  Help
0.0005    13  Help talk
0.02      14  Category
0.0005    15  Category talk
0.6      100  SDB
0.0005   101  SDB talk
1        102  Portal
0.0005   103  Portal talk
0.2      104  Archive
0.0005   105  Archive talk
0.2      106  HCL
0.0005   107  HCL talk
-        110  Book
-        111  Book talk
-        122  Property
-        123  Property talk
-        126  Form
-        127  Form talk
-        128  Concept
-        129  Concept talk
-        420  Layer
-        421  Layer talk

Just in case you wonder about the numbers - they are based on what Wikipedia uses AFAIK:
    (0, 1) (1, 0.0005) (2, 0.005) (3, 0.001) (4, 0.01), (6, 0.02), (8, 0.005), (10, 0.0005), (12, 0.01), (14, 0.02)
and adjusted to our needs (see for example the difference in the project namespace)

I'd like to know
- if you like this proposal in general,
- if you agree with the list of namespaces to search by default,
- if we want to include the talk pages in the default search and
- if you agree with the boost value for each namespace

In the meantime, I'll fetch some cable ties to tie up Henne - I'm sure he won't like the idea of including all those namespaces in the default search ;-))

Needless to say, we might need to fine-tune the boost values of some namespaces later (this needs some testing with the real wiki content), but in general I think the above values will work much better than what we have now ;-)

Regards,

Christian Boltz

[1] Nevertheless, we should set those namespaces to a boost value of 0.0001 in case someone includes them in the search
--
Status? NEW [Ihno Krumreich and Stephan Kulow on https://bugzilla.novell.com/show_bug.cgi?id=159223]
-- To unsubscribe, e-mail: opensuse-web+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-web+owner@opensuse.org
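As a rough illustration of the proposal in the first message, the table can be held as a simple mapping and rendered in the "(namespace, boost)" pair notation quoted there. This is only a sketch: the names PROPOSED_BOOST, FALLBACK_BOOST, default_search_namespaces and as_pairs are made up for this example, and the exact syntax expected inside the [Namespace-Boost] section of lsearch-global.conf is not spelled out in this thread, so the output should not be pasted into the config unchecked.

```python
# Proposed boost per namespace number, copied from the table in the first
# message. None stands for "-" (not part of the default search).
PROPOSED_BOOST = {
    0: 1, 1: 0.0005, 2: 0.005, 3: 0.001, 4: 0.6, 5: 0.0005,
    6: 0.02, 7: 0.0005, 8: 0.005, 9: None, 10: 0.0005, 11: None,
    12: 0.01, 13: 0.0005, 14: 0.02, 15: 0.0005,
    100: 0.6, 101: 0.0005, 102: 1, 103: 0.0005,
    104: 0.2, 105: 0.0005, 106: 0.2, 107: 0.0005,
    110: None, 111: None, 122: None, 123: None, 126: None,
    127: None, 128: None, 129: None, 420: None, 421: None,
}

# Footnote [1]: namespaces outside the default search still get a minimal
# boost in case someone includes them in a search explicitly.
FALLBACK_BOOST = 0.0001


def default_search_namespaces(boosts):
    """Return the namespace numbers that would be searched by default."""
    return sorted(ns for ns, boost in boosts.items() if boost is not None)


def as_pairs(boosts, fallback=FALLBACK_BOOST):
    """Render the mapping as '(ns, boost)' pairs (hypothetical helper)."""
    parts = []
    for ns in sorted(boosts):
        boost = boosts[ns] if boosts[ns] is not None else fallback
        parts.append(f"({ns}, {boost:g})")
    return " ".join(parts)


print(default_search_namespaces(PROPOSED_BOOST))
print(as_pairs(PROPOSED_BOOST))
```

Keeping the excluded namespaces in the same mapping (with None) makes it easy to see at a glance which ones only receive the 0.0001 fallback.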
Hello Christian,

I work on the dewiki. I need the search function every day, and I have often wondered about the results. Now I see the reason, and I would agree with some changes:

On Monday, 2 December 2013, 21:52:51, Christian Boltz wrote:
I'd like to know - if you like this proposal in general,
yes
- if you agree with the list of namespaces to search by default,
yes
- if we want to include the talk pages in the default search and
+/- I think this is not so important.
- if you agree with the boost value for each namespace
The number system you mentioned above is a little too cryptic for me. Please "say" what comes first, second and so on in the result list.

Best Regards
Wolfgang
openSUSE Member
DE-Wiki-Team
Hello,

On Tuesday, 3 December 2013, Wolfgang Hahnl wrote:
On Monday, 2 December 2013, 21:52:51, Christian Boltz wrote:
- if you agree with the boost value for each namespace
The number system you mentioned above is a little too cryptic for me. Please "say" what comes first, second and so on in the result list.
Well, it depends on how well the page matches the search term ;-)

Think of the boost value as a "weight" (not as a "first", "second", ...) that is part of the ranking calculation. The other part of the calculation is how well the page content matches the search term.

Let me give you an example:

Search for "download openSUSE distribution" [1]

Let's assume there are two pages with a 100% text match:
- /Download
- /Archive:Download_12.2

On text match alone, both search results are equal. Now the boost value comes into play, and you'll get the following search result order:
- /Download: 100% match * 1 = 100 "points"
- /Release_Notes: 70% match * 1 = 70 "points"
- openSUSE:Download_statistics: 80% match * 0.6 = 48 "points"
- /Archive:Download_12.2: 100% match * 0.2 = 20 "points"

So in the end /Download is the first search result (100 "points"), followed by some other pages (which contain only two of the three words you searched for) and, much further down, /Archive:Download_12.2

The result can also be the other way round - a 100% match in the SDB namespace (= 60 "points") can be the top result if there's only a 50% match in the main namespace (= 50 "points").

Nevertheless, it will be quite rare that a page in the Talk namespace ends up as the first search result, because it can get a maximum of 100% match * 0.0005 = 0.05 "points".

Does this explain the boost values?

Regards,

Christian Boltz

[1] I just invented this search and its results, which means the "real" search will most probably give you different results and/or different page content.

--
If there is anyone who can still stop Facebook at the moment, it is probably Google. Please insert your "out of the frying pan into the fire" and "plague or cholera" comparison here.
[http://praegnanz.de/weblog/ein-tag-vier-interessante-news]
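The "points" arithmetic in the example above can be sketched in a few lines. This follows only the simplified model from the mail (text match in percent multiplied by the namespace boost); the real Lucene scoring is more involved, and the function name rank, the BOOST subset and the result tuples are assumptions made up for this illustration.

```python
# Boost values for the namespaces used in the example (subset of the
# proposed table; "openSUSE" is the Project namespace).
BOOST = {"Main": 1, "openSUSE": 0.6, "SDB": 0.6, "Archive": 0.2}


def rank(results, boost):
    """Order (title, namespace, match_percent) tuples by match * boost.

    Simplified model from the mail, not the actual Lucene formula.
    """
    scored = [(title, match * boost[ns]) for title, ns, match in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# The invented search results from the example, in arbitrary order.
results = [
    ("Archive:Download_12.2", "Archive", 100),
    ("openSUSE:Download_statistics", "openSUSE", 80),
    ("Release_Notes", "Main", 70),
    ("Download", "Main", 100),
]

for title, points in rank(results, BOOST):
    print(f"{title}: {points:g} points")
```

The same function also reproduces the "other way round" case: a 100% match in SDB (60 points) ranks above a 50% match in the main namespace (50 points).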
Hello Christian,

On Wednesday, 4 December 2013, 02:07:45, Christian Boltz wrote:
Does this explain the boost values?
Thank you for this explanation.

Best Regards
Wolfgang
openSUSE Member
DE-Wiki-Team
On Mon, 02 Dec 2013 21:52:51 +0100 Christian Boltz <opensuse@cboltz.de> wrote:
Just in case you wonder about the numbers - they are based on what Wikipedia uses AFAIK: (0, 1) (1, 0.0005) (2, 0.005) (3, 0.001) (4, 0.01), (6, 0.02), (8, 0.005), (10, 0.0005), (12, 0.01), (14, 0.02) and adjusted to our needs (see for example the difference in the project namespace)
Sorted by importance

1          0  Main
1        102  Portal
0.6        4  Project ("openSUSE")
0.6      100  SDB
0.2      104  Archive            - archive seems to be too high
0.2      106  HCL
0.02       6  File
0.02      14  Category           - category should be .3
0.01      12  Help
0.005      2  User
0.005      8  MediaWiki          - nothing for casual user
0.001      3  User talk
0.0005     1  Talk
0.0005     5  Project talk
0.0005     7  File talk
0.0005    10  Template
0.0005    13  Help talk
0.0005    15  Category talk
0.0005   101  SDB talk
0.0005   103  Portal talk
0.0005   105  Archive talk
0.0005   107  HCL talk
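To see what the one concrete change suggested in this list (Category at .3) would do to the order, the adjustment can be sketched as an override on top of the proposed values. Only the top part of the list is used here, and the names proposed, overrides and by_importance are hypothetical; no new value is assumed for Archive or MediaWiki because none is given.

```python
# Top part of the proposed list, in the order shown above.
proposed = {
    "Main": 1, "Portal": 1, "Project": 0.6, "SDB": 0.6,
    "Archive": 0.2, "HCL": 0.2, "File": 0.02, "Category": 0.02,
    "Help": 0.01,
}

# The only suggestion that comes with a number: "category should be .3".
overrides = {"Category": 0.3}


def by_importance(boosts):
    """Sort namespace names by boost, highest first.

    sorted() is stable, so namespaces with equal boost keep their
    original relative order.
    """
    return [name for name, _ in
            sorted(boosts.items(), key=lambda kv: kv[1], reverse=True)]


adjusted = {**proposed, **overrides}
print(by_importance(adjusted))
```

With this override, Category would move up to just below Project and SDB and above Archive and HCL.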
I'd like to know - if you like this proposal in general,
Yes.
- if you agree with the list of namespaces to search by default,
Yes.
- if we want to include the talk pages in the default search and
No. There is nothing there for the casual user (reader, visitor).
- if you agree with the boost value for each namespace
Yes; see comments.
In the meantime, I'll fetch some cable ties to tie up Henne - I'm sure he won't like the idea of including all those namespaces in the default search ;-))
The presentation of the current distro release should go to http://opensuse.org/distro/ and the wiki should become what it is - a wiki.
Needless to say that we might need to finetune the boost values of some namespaces later (needs some testing with the real wiki content), but in general I think the above values will work much better than what we have now ;-)
I would just apply those to github/wiki and check how that works.
--
Regards,
Rajko.
Hey,

On 02.12.2013 21:52, Christian Boltz wrote:
In the meantime, I'll fetch some cable ties to tie up Henne - I'm sure he won't like the idea of including all those namespaces in the default search ;-))
If we weight them, I have no problem whatsoever with it :)

Henne

--
Henne Vogelsang
http://www.opensuse.org
Everybody has a plan, until they get hit.
        - Mike Tyson
Hello,

On Wednesday, 4 December 2013, Henne Vogelsang wrote:
On 02.12.2013 21:52, Christian Boltz wrote:
In the meantime, I'll fetch some cable ties to tie up Henne - I'm sure he won't like the idea of including all those namespaces in the default search ;-))
If we weight them, I have no problem whatsoever with it :)
OK, then I'll tie you up with exactly 42 g of cable ties ;-)

*SCNR*

Regards,

Christian Boltz

--
# wait 60 seconds
sleep 180
[excerpt from a script by Martin Hofius in opensuse-de]
participants (4)
- Christian Boltz
- Henne Vogelsang
- Rajko
- Wolfgang Hahnl