[Bug 225618] New: UTF-8 performance issue
https://bugzilla.novell.com/show_bug.cgi?id=225618

           Summary: UTF-8 performance issue
           Product: openSUSE 10.2
           Version: RC 1
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Basesystem
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: koenig@linux.de
         QAContact: qa@suse.de

While digging into my updatedb/cron problem (see bug #225614), I ran "updatedb" on the command line and noticed that sort took 5+ minutes of CPU time to sort ~43 MB of output (800k lines), which IMHO is quite a lot.

I took the first 200k file names of the find output in updatedb and ran them through sort, or just through "wc", and got a horrible picture of the performance with LC_CTYPE=de_DE.UTF-8 (user CPU seconds):

                   sort      wc
    de_DE.UTF-8   77.86    2.42
    de_DE          1.88    0.29

I understand that updatedb from cron will run with LC_ALL=POSIX, but users will use tools like sort or wc, and their performance sucks for the German locale!

I have no idea which other applications and operations might be affected by this, but since it looks like a generic libc/locale issue, it might be a BIG problem...

BTW, this reminds me a bit of the still-open emacs performance issue in bug #182294.

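To put the figures reported above in proportion, the slowdown factor of de_DE.UTF-8 relative to de_DE can be worked out directly from the user CPU times. This is only a small arithmetic sketch; the function name `slowdown` is made up for illustration and is not part of any tool mentioned in the report.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): ratio of two timings.
#   $1 = user CPU seconds under de_DE.UTF-8
#   $2 = user CPU seconds under de_DE
slowdown() {
    # LC_ALL=C keeps awk's number formatting locale-independent
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Figures copied from the report (user CPU seconds)
echo "sort: $(slowdown 77.86 1.88)x"   # prints "sort: 41.4x"
echo "wc:   $(slowdown 2.42 0.29)x"    # prints "wc:   8.3x"
```

So on this input sort is roughly 41 times slower and wc roughly 8 times slower in the UTF-8 locale.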
Here are the full sort/wc commands with output for reference:

t3:~ # ( LC_CTYPE=de_DE.UTF-8 ; head -200000 /tmp/sort-in | time sort -f > /dev/null )
77.86user 0.47system 1:19.22elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+3897minor)pagefaults 0swaps

t3:~ # ( LC_CTYPE=de_DE ; head -200000 /tmp/sort-in | time sort -f > /dev/null )
1.88user 0.05system 0:02.04elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3877minor)pagefaults 0swaps

t3:~ # ( LC_CTYPE=de_DE.UTF-8 ; head -200000 /tmp/sort-in | time wc )
 200000  200014 10063202
2.42user 0.04system 0:02.58elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+223minor)pagefaults 0swaps

t3:~ # ( LC_CTYPE=de_DE ; head -200000 /tmp/sort-in | time wc )
 200000  200014 10063202
0.29user 0.01system 0:00.38elapsed 79%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+204minor)pagefaults 0swaps

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

https://bugzilla.novell.com/show_bug.cgi?id=225618

------- Comment #1 from koenig@linux.de  2006-12-02 12:06 MST -------
Here are the real timing figures for updatedb with UTF-8 and POSIX LC_CTYPE:

updatedb:
    POSIX          15.38user 40.87system 10:06.20elapsed  9%CPU
    de_DE.UTF-8   400.69user 43.36system 15:53.12elapsed 46%CPU

sort in updatedb:
    POSIX         real 10m5.769s    user 0m8.205s    sys 0m2.088s
    de_DE.UTF-8   real 15m52.292s   user 6m32.965s   sys 0m4.492s

And now guess what users think if their apps behave like this...
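
For anyone wanting to repeat the sort/wc comparison on another machine, the following is a minimal sketch of a timing helper. It is not taken from the report: the function name `time_with_ctype` is hypothetical, it measures wall-clock time rather than the user CPU time quoted in this bug, and it assumes GNU date (for `%N`). It also assumes the locale being tested is actually installed (check with `locale -a`); a missing locale typically falls back to the C locale without any error.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a given LC_CTYPE and print
# the elapsed wall-clock seconds.
# Usage: time_with_ctype LOCALE INPUT_FILE COMMAND [ARGS...]
time_with_ctype() {
    loc=$1
    input=$2
    shift 2
    start=$(date +%s.%N)    # GNU date extension; %N is not POSIX
    (
        # LC_ALL would override LC_CTYPE, so drop it inside the subshell only
        unset LC_ALL
        LC_CTYPE=$loc
        export LC_CTYPE
        "$@" < "$input" > /dev/null
    )
    end=$(date +%s.%N)
    LC_ALL=C awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}
```

With an input file such as the /tmp/sort-in used in the report, a call like `time_with_ctype de_DE.UTF-8 /tmp/sort-in sort -f` followed by the same call with `de_DE` would give the two numbers to compare.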