Re: [opensuse] Re: 13.1 Bash HISTCONTROL=erasedups -- not working? confirm?

5 Apr 2015

      Hello,

On Fri, 03 Apr 2015, Linda Walsh wrote:
...
David Haller wrote:
...
====
#!/bin/bash
HFILE="$1"
TMPF="$(mktemp "${HFILE}.$$.XXXXXXXXXX")"
if time tac "$HFILE" | awk '!x[$0]++ { print; }' > "${TMPF}"; then
   tac "${TMPF}" > "${HFILE}" && rm "${TMPF}"
fi
====
> I'm not 100% sure, but the above looks like it might not filter
> out your dups as it doesn't sort them.

It reads the file from the tail and dedups using a hash, keeping
the first occurrence it sees (i.e. the latest!) and ignoring later
duplicates of the same line (i.e. the earlier ones).

That '!x[$0]++' could be written as:

    current_line = $0;
    if( ! seen_this_line[current_line] ) {
        seen_this_line[current_line] = 1;
        print current_line;
    }

No need to sort anything ;)
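
If you want to convince yourself that both spellings do the same, here is
a small sketch. The function names and the sample input are made up for
this mail, and I'm assuming any POSIX awk will do:

```shell
#!/bin/bash
# Short form: the pattern is true only the first time a line is seen.
dedup_short() {
    awk '!x[$0]++'
}

# Long form of the same test, spelled out step by step.
dedup_long() {
    awk '{
        current_line = $0
        if (!seen_this_line[current_line]) {
            seen_this_line[current_line] = 1
            print current_line
        }
    }'
}

# Sample input (made up): repeated lines, deliberately unsorted.
sample() {
    printf '%s\n' ls cd ls 'cd foo'
}

sample | dedup_short
sample | dedup_long
```

Both calls should print the same three lines, in input order.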

Think of this history:

cd foo
ls
cd
ls

tac reverses that, so awk reads:

ls, cd, ls, cd foo

First line read: x["ls"] does not exist yet, so it is false and
! x["ls"] is true: the line is printed, and the ++ sets x["ls"] to 1,
a true value. Next, ! x["cd"] is again true, as x["cd"] is
non-existent, i.e. false; "cd" is printed and x["cd"] becomes 1 too.
Next up "ls" again. Now x["ls"] is 1, i.e. true, so ! x["ls"] is
false: nothing is printed, x["ls"] gets incremented to 2 (you could
use that for some stats at the end ;) and the next line is read.
Next up: ! x["cd foo"] is true, as we've not seen that before. At the
end, the whole shebang is reversed again, and "tada", we get

cd foo
cd
ls
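
The walk-through above as something you can paste into a shell. This is
only a sketch: the function names are made up, it assumes GNU tac, and
the counting part is just one way to get the "stats" hinted at above:

```shell
#!/bin/bash
# Keep only the last occurrence of each line, preserving order.
dedup_keep_last() {
    tac | awk '!x[$0]++' | tac
}

# Count how often each line occurs, most frequent first.
count_cmds() {
    awk '{ x[$0]++ } END { for (cmd in x) printf "%d %s\n", x[cmd], cmd }' |
        LC_ALL=C sort -k1,1nr -k2
}

# The four-line example history from above.
example_history() {
    printf '%s\n' 'cd foo' ls cd ls
}

example_history | dedup_keep_last
example_history | count_cmds
```

The sort is there because awk's "for (cmd in x)" visits the keys in no
particular order.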
...
> I.e. if you run unsorted input into 'uniq', you won't get 'uniq' output.
> Also not sure why the time command is there?  Are you trying
> to time the tac command?

No, the whole thing, just out of curiosity. On my ~225k lines it took
about 4.5s.
...
> I ran your script on a concatenation of my 'tty-hist files' and first
> few lines of output from your script were:
[..]
> #1328153686
> ls repo-oss
> #1328153702
> ls repo-oss/suse/x86_64/
> #1328153720
> rpm -qpi repo-oss/suse/x86_64/xfce4-panel-plugin-xkb-0.5.3.3-7.1.x86_64.rpm

Oh, you use timestamps. It won't work as-is then: the timestamp lines
are (nearly) all unique, so they all survive the dedup. If you grab a
test file and filter out the timestamps, e.g.:

    grep -v '^#' historyfile > tmp.hist

and run my script on that tmp.hist, you'll see it works ;)
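
To see the difference on a tiny made-up history (the layout is what bash
writes when HISTTIMEFORMAT is set, the function names are just for
illustration), something like this should do:

```shell
#!/bin/bash
# Keep only the last occurrence of each line, preserving order.
dedup_keep_last() { tac | awk '!x[$0]++' | tac; }

# Made-up history: a "#<epoch>" line, then the command it belongs to.
timestamped_history() {
    printf '%s\n' '#100' ls '#200' cd '#300' ls
}

# Timestamps left in: the older "ls" goes, its "#100" line stays behind.
timestamped_history | dedup_keep_last

# Timestamps stripped first: a clean, deduplicated command list.
timestamped_history | grep -v '^#' | dedup_keep_last
```

The first call leaves "#100" directly followed by "#200", i.e. a
timestamp with no command after it; the second gives just "cd" and "ls".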
...
> 1) why would your script keep 'uniq' blank lines?  I.e. the consecutive
> timestamps w/no command & 2, the output from your script shows its 1st
> command starting

They're not blank.
...
> Besides blank lines, I try to filter out bogus or trivial commands,
> and unlike bash -- if my script finds a dup -- it updates
> the 'seconds' field for that command so the time field
> shows the last time I used that command.  That way my more frequently
> used commands end up near the end of the list...
> Your script is *way* shorter than mine though:
> (not including library calls! *sigh*)...
...
> wc bin/hist_dedup
> 137  487 3707 bin/hist_dedup

Have a try with this:

====
#!/bin/bash
HFILE="$1"
TMPF="$(mktemp "${HFILE}.$$.XXXXXXXX")"

mkuniq() {
    gawk '
    /^#[0-9]+/ { next; } # skip timestamps
    /^(ls|cd)$/ {
        # add more stuff to "filter" inside the (), note that I've
        # anchored these. Add more after the $, e.g.:
        # cd)$|^foo bar. Or make a whole block like this.
        next;
    }
    ! cmds[$0]++ {
        print;    # print current cmd in $0
        # get timestamp (after cmd, file is reversed); at EOF getline
        # returns 0 and leaves $0 alone, so do not print the cmd twice
        if ((getline) > 0)
            print;    # print timestamp (after cmd, will be reversed)
    }'
}

if tac "${HFILE}" | mkuniq > "$TMPF"; then
    tac "${TMPF}" > "${HFILE}" && rm "${TMPF}"
fi
====
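
For a quick dry run without touching a real history file, here is a
stdin-to-stdout sketch of the same filter. The function names and the
sample data are made up, I've used plain awk instead of gawk (nothing
gawk-specific is needed as far as I can see), and the test on getline's
return value is an extra precaution for input that ends on a command
with no timestamp:

```shell
#!/bin/bash
# Hypothetical stdin-to-stdout variant of the timestamp-aware filter.
dedup_timestamped() {
    tac | awk '
    /^#[0-9]+/  { next }          # timestamps of dropped commands
    /^(ls|cd)$/ { next }          # trivial commands to discard
    !cmds[$0]++ {
        print                      # the command itself
        if ((getline) > 0) print   # its timestamp (next line, input is reversed)
    }' | tac
}

# Made-up history: "cd foo" occurs twice, "ls" is on the discard list.
sample_history() {
    printf '%s\n' '#100' 'cd foo' '#200' ls '#300' make '#400' 'cd foo'
}

sample_history | dedup_timestamped
```

On that sample, "make" and the later "cd foo" should remain, each still
preceded by its own timestamp.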

You should crosscheck results with your script etc.
Any questions?

HTH,
-dnh

-- 
Just go ahead and write your own multitasking multiuser os!
Worked for me all the times.	-- Linus Torvalds