Hello,

On Fri, 03 Apr 2015, Linda Walsh wrote:
David Haller wrote:
====
#!/bin/bash
HFILE="$1"
TMPF="$(mktemp "${HFILE}.$$.XXXXXXXXXX")"
if time tac "$HFILE" | awk '!x[$0]++ { print; }' > "${TMPF}"; then
    tac "${TMPF}" > "${HFILE}" && rm "${TMPF}"
fi
====

I'm not 100% sure, but the above looks like it might not filter out your dups as it doesn't sort them.
It reads the file from the tail and dedups by using a hash: it keeps the first occurrence it sees (i.e. the latest!) and ignores identical lines that come later (i.e. earlier in the history). That '!x[$0]++' could be written as:

    current_line = $0;
    if( ! seen_this_line[current_line] ) {
        seen_this_line[current_line] = 1;
        print current_line;
    }

No need to sort anything ;) Think of this history:

    cd foo
    ls
    cd
    ls

tac reverses that, so awk reads: ls, cd, ls, cd foo

First line read: x["ls"] is not there, i.e. false, so ! x["ls"] is true and the line is printed; the ++ then sets x["ls"] to 1, a true value. Next, ! x["cd"] is again true as x["cd"] is non-existent, i.e. false, and it gets set to 1 too. Next up is "ls" again. Now x["ls"] is 1, i.e. true, so ! x["ls"] is false: nothing is printed, x["ls"] gets incremented to 2 (you could use that for some stats at the end ;) and the next line is read. Next up: ! x["cd foo"] is true, as we've not seen that before.

At the end, the whole shebang is reversed again, and "tada", we get:

    cd foo
    cd
    ls
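The four-line example above can be checked with a small sketch. The function name 'dedup_keep_last' is made up for illustration, and it assumes GNU 'tac' is available:

```shell
#!/bin/bash
# Hypothetical helper: keep only the last occurrence of each input line.
# tac reverses the input, awk prints a line only the first time it is seen
# (which is its last occurrence in the original), and the second tac
# restores the original direction.
dedup_keep_last() {
    tac | awk '!x[$0]++' | tac
}

# The sample history from the walk-through above.
result="$(printf '%s\n' 'cd foo' 'ls' 'cd' 'ls' | dedup_keep_last)"
printf '%s\n' "$result"
# prints:
# cd foo
# cd
# ls
```

The earlier "ls" is dropped and the later one is kept, so the most recently used commands stay at the end of the list.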
I.e. if you run unsorted input into 'uniq', you won't get 'uniq' output. Also not sure why the time command is there? Are you trying to time the tac command?
No, the whole thing, just out of curiosity. On my ~225k lines it took about 4.5s.
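In bash, 'time' is a reserved word and covers the whole pipeline, not only the first command. A minimal sketch to see that, with 'true' and 'sleep 1' as stand-ins for 'tac' and 'awk' (TIMEFORMAT is set only to make the output easy to read):

```shell
#!/bin/bash
# Print only the elapsed wall-clock seconds.
TIMEFORMAT='%R'

# 'true' finishes at once; 'sleep 1' takes about a second.
# If 'time' only covered the first command, the result would be ~0.
# The braces with 2>&1 capture the report that bash writes to stderr.
elapsed="$( { time true | sleep 1; } 2>&1 )"

echo "pipeline took ${elapsed}s"
```

The reported time is at least one second, i.e. it includes the last command of the pipeline.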
I ran your script on a concatenation of my 'tty-hist files' and the first few lines of output from your script were:

[..]
#1328153686
ls repo-oss
#1328153702
ls repo-oss/suse/x86_64/
#1328153720
rpm -qpi repo-oss/suse/x86_64/xfce4-panel-plugin-xkb-0.5.3.3-7.1.x86_64.rpm
Oh, you use timestamps. It won't work then. If you grab a test file and filter out the timestamps, e.g.:

    grep -v '^#' historyfile > tmp.hist

and run my script on that tmp.hist, you'll see it works ;)
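To illustrate why the timestamps get in the way, here is a small sketch on made-up sample data (the epoch values are invented, and timestamp lines are assumed to look like '#' followed by digits):

```shell
#!/bin/bash
# Made-up history with bash-style "#<epoch>" lines before each command.
sample="$(printf '%s\n' '#1000000001' 'ls' '#1000000002' 'cd' '#1000000003' 'ls')"

# Plain dedup: every timestamp is unique, so all of them survive, and the
# timestamp of the dropped (older) "ls" is left behind without a command.
with_ts="$(printf '%s\n' "$sample" | tac | awk '!x[$0]++' | tac)"
printf '%s\n' "$with_ts"
# prints:
# #1000000001
# #1000000002
# cd
# #1000000003
# ls

# Strip the timestamp lines first, then dedup the commands alone.
without_ts="$(printf '%s\n' "$sample" | grep -v '^#[0-9]' | tac | awk '!x[$0]++' | tac)"
printf '%s\n' "$without_ts"
# prints:
# cd
# ls
```

The first result shows two consecutive timestamps with no command in between, which is what the plain script produces on a timestamped history file.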
1) why would your script keep 'uniq' blank lines? I.e. the consecutive timestamps w/no command & 2, the output from your script shows its 1st command starting
They're not blank.
Besides blank lines, I try to filter out bogus or trivial commands, and unlike bash -- if my script finds a dup -- it updates the 'seconds' field for that command so the time field shows the last time I used that command. That way my more frequently used commands end up near the end of the list...
Your script is *way* shorter than mine though: (not including library calls! *sigh*)...
wc bin/hist_dedup
 137  487 3707 bin/hist_dedup
Have a try with this:

====
#!/bin/bash
HFILE="$1"
TMPF="$(mktemp "${HFILE}.$$.XXXXXXXX")"

mkuniq() {
    gawk '
        /^#[0-9]+/ { next; }    # skip timestamps
        /^(ls|cd)$/ {
            # add more stuff to "filter" inside the (), note that
            # these are anchored. Add more after the $, e.g.:
            # cd)$|^foo bar. Or make a whole block like this.
            next;
        }
        ! cmds[$0]++ {
            print;      # print current cmd in $0
            # get timestamp (after cmd, file is reversed) and print it
            # (after cmd, will be reversed); skip it at end of input
            if ((getline) > 0) print;
        }'
}

if tac "${HFILE}" | mkuniq > "$TMPF"; then
    tac "${TMPF}" > "${HFILE}" && rm "${TMPF}"
fi
====

You should crosscheck results with your script etc. Any questions?

HTH,
-dnh

--
Just go ahead and write your own multitasking multiuser os!
Worked for me all the times. -- Linus Torvalds
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org