[opensuse-gnome] [Fwd: Re: beagle bits ...]
So, Morten said I could forward this.

I think it's a highly useful data-point for hacker adoption.

I imagine indexing .c files is a waste of time with beagle, it's for documents - and worse huge, deep source code directory hierarchies - just loading all those dentries & inodes blows your cache and makes ~everything else apparently 'slow'. Until we get a better underlying fs this will always be a problem I think [ btrfs ].

eg. my $ time ls -Rl ~/kde/sources - took:

    real 2m11.770s
    user 0m0.928s
    sys  0m3.824s

I guess we need a feature to prune all sub-directories of things that look like top-level source-code / project directories. Should prolly get that into FATE.

HTH,

Michael.

--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
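The pruning idea above could be sketched roughly as follows. This is not Beagle code, only a minimal Python illustration under stated assumptions: the marker names in PROJECT_MARKERS and the function name walk_pruned are hypothetical, and a real heuristic for "looks like a project top level" would need more thought. Files directly in the project top level (README, HACKING and so on) are still visited; only the sub-directories are skipped.

```python
import os

# Hypothetical heuristic: finding any of these names in a directory is
# taken as a hint that the directory is the top level of a source tree.
PROJECT_MARKERS = {".git", ".svn", "configure.ac", "configure.in",
                   "CMakeLists.txt", "Makefile.am"}


def walk_pruned(root):
    """Yield file paths under root without descending below project top levels."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        entries = set(dirnames) | set(filenames)
        if entries & PROJECT_MARKERS:
            # Clearing dirnames in place tells os.walk (top-down mode)
            # not to descend into any sub-directory of this directory.
            dirnames[:] = []
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Because the sub-directories are never opened, their dentries and inodes are never loaded, which is the cost the ls -Rl timing above illustrates.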
On Fri, 2009-01-09 at 14:08 +0000, Michael Meeks wrote:
So, Morten said I could forward this.
I think it's a highly useful data-point for hacker adoption.
I imagine indexing .c files is a waste of time with beagle, it's for documents - and worse huge, deep source code directory hierarchies - just loading all those dentries & inodes blows your cache and makes ~everything else apparently 'slow'. Until we get a better underlying fs this will always be a problem I think [ btrfs ].
eg. my $ time ls -Rl ~/kde/sources - took:
real 2m11.770s user 0m0.928s sys 0m3.824s
I guess we need a feature to prune all sub-directories of things that look like top-level source-code / project directories. Should prolly get that into FATE.
It would be nice to index HACKING, README, possibly documentation still. Just in case I'm looking for a project maintainer or instructions on how to build foo. Then again, maybe I'm looking for someone's e-mail address and the only place it can be found is in the copyright notice of some .c file :)

While I don't think it's a bad idea to short-circuit source code toplevels, I'm worried that a) it won't be enough because there may be other big collections of files that shouldn't be indexed out there and we'll end up with lots of fairly arbitrary rules for what not to index and b) I'm not convinced that people don't search source code for mentions of <foo>. I don't, but then, I don't use Beagle at all.

Another, more sophisticated strategy might be to build a list of things to index and prioritize them, so that source directories come last and are indexed at a fairly leisurely pace. It's definitely more work, though.

--
Hans Petter
--
To unsubscribe, e-mail: opensuse-gnome+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-gnome+help@opensuse.org
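The prioritized strategy described above might look something like this. Again a sketch in Python, not anything Beagle does: the extension list, the function names and the source_delay parameter are all assumptions made for illustration. Documents are queued ahead of source files, and an optional pause before each source file stands in for the "leisurely pace".

```python
import heapq
import os
import time

# Hypothetical list of extensions treated as source code.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".cxx", ".hxx", ".java", ".py"}


def priority(path):
    """Lower number means indexed sooner: documents 0, source code 1."""
    ext = os.path.splitext(path)[1].lower()
    return 1 if ext in SOURCE_EXTENSIONS else 0


def build_queue(paths):
    """Build a heap of (priority, arrival order, path) tuples."""
    heap = [(priority(p), i, p) for i, p in enumerate(paths)]
    heapq.heapify(heap)
    return heap


def drain(heap, source_delay=0.0):
    """Yield paths in priority order, pausing before each source file."""
    while heap:
        prio, _, path = heapq.heappop(heap)
        if prio > 0 and source_delay > 0:
            time.sleep(source_delay)  # throttle low-priority work
        yield path
```

The arrival-order counter keeps files of equal priority in the order they were found, so the queue stays predictable.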
Hi there,

On Mon, 2009-01-12 at 21:42 +0100, Hans Petter Jansson wrote:
It would be nice to index HACKING, README, possibly documentation still.
I guess; makes it more complicated but ... ;-) I'm fairly convinced beagle is not a tool for hackers to grok their source trees with. People use 'git grep' for this, or they use a sane semantically informed search - ctags or whatever. The sheer cost of diving down all those directories is staggering.
While I don't think it's a bad idea to short-circuit source code toplevels, I'm worried that a) it won't be enough
No doubt - but it will at least help a lot for this use case :-) and hopefully stop hackers griping about it.
b) I'm not convinced that people don't search source code for mentions of <foo>. I don't, but then, I don't use Beagle at all.
Oh - sure they do, but the vast cost of indexing (eg.) all of the OO.o source code, on the off-chance that someone wants to search for something is too staggeringly large IMHO - there is a far higher risk that people just turn beagle off - and thus get nothing indexed.
Another, more sophisticated strategy might be to build a list of things to index and prioritize them, so that source directories come last and are indexed at a fairly leisurely pace. It's definitely more work, though.
Yep - that might work too - though; really - I'm fairly certain that source code directories are highly dynamic, take a lot of maintenance, contain -tons- of stuff which (no doubt) balloons the index size, and are not interesting in search results :-)

HTH,

Michael.

--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
On Tue, 2009-01-13 at 10:29 +0000, Michael Meeks wrote:
On Mon, 2009-01-12 at 21:42 +0100, Hans Petter Jansson wrote:
It would be nice to index HACKING, README, possibly documentation still.
I guess; makes it more complicated but ... ;-) I'm fairly convinced beagle is not a tool for hackers to grok their source trees with. People use 'git grep' for this, or they use a sane semantically informed search - ctags or whatever. The sheer cost of diving down all those directories is staggering.
Agreed. But sometimes I drop e.g. relevant PDFs into a source-controlled directory (say, under doc/ or reference/) just so I won't lose them, and I suspect other people have their own odd habits that make this a dangerous road to travel down :)

So: Cutting out source dirs entirely in the short term is a good idea - one you've convinced me we should pursue - but we should also keep an eye towards replacing it with a more sophisticated solution long-term.

Another reason I think people are dropping Beagle is that it just isn't useful enough (cost may be high, but benefit is also low?). The last time I tested it, which is admittedly a long time ago, I felt like it just didn't rank my documents very well; ranking is much harder than finding...
Another, more sophisticated strategy might be to build a list of things to index and prioritize them, so that source directories come last and are indexed at a fairly leisurely pace. It's definitely more work, though.
Yep - that might work too - though; really - I'm fairly certain that source code directories are highly dynamic, take a lot of maintenance, contain -tons- of stuff which (no doubt) balloons the index size, and are not interesting in search results :-)
We could index comments only (which would give us license, authors, the odd expletive, maybe even some insights), or exclude source code altogether - but still crawl the directories for interesting stuff (like PNGs and JPGs in source control, the PS and PDF files I mentioned above, uninstalled documentation, etc).

--
Hans Petter
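As a rough illustration of the "comments only" option, the following Python sketch pulls the comment text out of a C source string. It is a deliberately naive regex approach, not a parser and not anything taken from Beagle: it will misfire on comment markers inside string literals (a URL containing "//", for example), and the names C_COMMENT and extract_comments are made up here.

```python
import re

# Matches /* ... */ block comments and // line comments.
# Naive: does not know about string literals or preprocessor tricks.
C_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def extract_comments(source):
    """Return the text of each comment in source, one string per comment."""
    comments = []
    for match in C_COMMENT.finditer(source):
        text = match.group(0)
        # Drop the comment delimiters.
        body = text[2:-2] if text.startswith("/*") else text[2:]
        # Drop leading '*' decoration and collapse to a single line.
        lines = (line.strip().lstrip("*").strip() for line in body.splitlines())
        comments.append(" ".join(line for line in lines if line))
    return comments
```

Feeding only these strings to the indexer would keep licenses, author names and e-mail addresses searchable while leaving the code itself out of the index.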
participants (2): Hans Petter Jansson, Michael Meeks