[opensuse] AWK -- Strange chars used as separators in a CSV like file
All,

I have a CSV-like file that uses the single-byte character octal 024 (Ctrl-T) as the comma and octal 0376 as the quote character.

I need to strip off the last column and discard it, so I thought I could use awk to do it, something like:

export FS='\024'
awk '{print $1,$2,$3}' my_file > my_output

It still seems to be using a space (' ') as the field separator. Any idea what I'm doing wrong?

I've also tried FS='\\024' and FS='\0376\024\0376'.

FYI: This may be a one-time need and the file is only about 500 lines, so doing it manually in vi is acceptable, but the text within the quotes can be very long, so it is hard to work on visually. The good news is that Ctrl-T (\024) should never appear within any of the actual fields.

Thanks
Greg

--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
Greg Freemyer pecked at the keyboard and wrote:
FYI: This may be a one time need and it is only about 500 lines, so using vi to manually do it is acceptable, but the text within the quotes can be very long, so it is hard to work on visually. The good news is that cntrl-T (\024) should never appear within any of the actual fields.
Thanks Greg
In vi, move to the column on the line where the deletion should start and type 999x; this removes 999 characters (or however many remain on the line) from the cursor rightward. Then move down to the next line, position the cursor, and press . (period) to repeat the last operation, which deletes up to 999 characters on that line as well.

--
Ken Schneider
SuSe since Version 5.2, June 1998
On Wed, May 28, 2008 at 7:09 PM, Ken Schneider <suse-list3@bout-tyme.net> wrote:
In vi move to the column in the line where you need to delete and type 999 then x and it will remove 999 characters (or what ever amount is remaining in the line) from the cursor going to the right. Move down to the next line position the cursor and hit the . (period character) to repeat the last op which should delete 999 characters in the line.
-- Ken Schneider

Ken,
I'm pretty good in vi, but editing this mess is beyond me. Some of the "lines" are hundreds of thousands of characters long. Anyway, I'm making progress.

Greg
On Wed, May 28, 2008 at 6:54 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
export FS='\024'
awk '{print $1,$2,$3}' my_file > my_output
It still seems to be using a space (' ') as the field separator. Any idea what I'm doing wrong?
Never mind. I thought FS was supposed to be set at the shell level. It had to be set inside the awk script. So:

awk 'BEGIN { FS="\024" } { print $1,$2,$3 }' my_file > my_output

seems to be working.

Greg
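The difference between the two attempts can be reproduced on a small made-up sample. The following is only a sketch: the file name sample.dat and its contents are invented for illustration, and it assumes a POSIX-style awk and printf.

```shell
# Build a two-line sample whose fields are separated by Ctrl-T (octal 024).
# The file name sample.dat and the data are hypothetical.
printf 'a\024b\024c\024junk\nd\024e\024f\024junk\n' > sample.dat

# An exported shell variable named FS is just an environment variable.
# awk does not take its field separator from it, so the default
# whitespace splitting applies and the whole line lands in $1.
export FS='\024'
env_nf=$(awk '{ print NF }' sample.dat | head -n 1)
unset FS

# Setting FS in a BEGIN block takes effect before the first record is read.
begin_out=$(awk 'BEGIN { FS = "\024" } { print $1, $2, $3 }' sample.dat)

echo "fields seen with exported FS: $env_nf"
echo "$begin_out"
rm -f sample.dat
```

With this sample the first echo reports 1 field, and the second prints "a b c" and "d e f". The fields come out separated by spaces because the commas in print insert the output separator OFS, which is a space unless it is set as well.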
On Wednesday 28 May 2008 16:15, Greg Freemyer wrote:
On Wed, May 28, 2008 at 6:54 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
All,
...
export FS='\024'
awk '{print $1,$2,$3}' my_file > my_output
It still seems to be using a space (' ') as the field separator. Any idea what I'm doing wrong?
...
Never mind.
I thought FS was supposed to be set at the shell level.
It had to be set inside the awk script.
Not necessarily. There's a command-line option to control the field separator:

-F fs
--field-separator fs
    Use fs for the input field separator (the value of the FS predefined variable).
...
Greg
Randall Schulz
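For reference, a minimal sketch of the -F route on invented data. It assumes a POSIX-style awk, and the variable names sep, f_out and v_out are made up for the example. Passing the literal separator byte avoids depending on how a particular awk build treats backslash escapes in the option's argument.

```shell
# Hold the literal Ctrl-T byte (octal 024) in a shell variable.
sep=$(printf '\024')

# -F receives the literal byte, so no escape processing is needed.
f_out=$(printf 'a\024b\024c\024junk\n' | awk -F "$sep" '{ print $1, $2, $3 }')

# The same assignment made with -v; here awk itself expands the \024 escape.
v_out=$(printf 'a\024b\024c\024junk\n' | awk -v 'FS=\024' '{ print $1, $2, $3 }')

echo "$f_out"
echo "$v_out"
```

Both commands print "a b c" for this sample line.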
On Wed, May 28, 2008 at 7:31 PM, Randall R Schulz <rschulz@sonic.net> wrote:
Not necessarily. There's a command-line option to control the field separator:
-F fs
--field-separator fs
    Use fs for the input field separator (the value of the FS predefined variable).
That was actually the first thing I tried, but I could not get it to work. Then I read online somewhere that it is only supported for legacy reasons and that the preferred usage is the FS variable.

I have it working now, and with gsub(/\376/,"\"") I should have those strange quotes gone as well. (I've got the disk connected back to a Windows box, so I can't test that right now.)

Greg
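The gsub idea can be tried on a made-up record without the real file. This is a sketch under assumptions: the sample data is invented, OFS="," is added here only to make the result look like ordinary CSV, and LC_ALL=C is set so that the byte 0376 is matched as a single byte rather than as part of a multibyte character.

```shell
# One invented record: fields wrapped in octal 0376, separated by octal 024.
csv_out=$(printf '\376first\376\024\376second\376\024\376third\376\024\376junk\376\n' |
LC_ALL=C awk 'BEGIN { FS = "\024"; OFS = "," }
     {
         gsub(/\376/, "\"")    # replace every 0376 byte in the record with "
         print $1, $2, $3      # the fourth (last) column is dropped
     }')

echo "$csv_out"
```

For this record the output is "first","second","third". Because gsub changes $0, awk re-splits the record on FS before the fields are printed.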
On Wednesday 28 May 2008 16:48, Greg Freemyer wrote:
That was actually the first thing I tried. Could not get it to work. Then I read online somewhere that it was only a supported arg for legacy reasons and that the preferred usage was the FS variable.
Well, I don't know why the man page would be inaccurate (it doesn't refer one to the info pages for complete documentation), but I do know of a BASH-ism that I use frequently, the $'string' format:

-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:

    \a     alert (bell)
    \b     backspace
    \e     an escape character
    \f     form feed
    \n     new line
    \r     carriage return
    \t     horizontal tab
    \v     vertical tab
    \\     backslash
    \'     single quote
    \nnn   the eight-bit character whose value is the octal value nnn (one to three digits)
    \xHH   the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
    \cx    a control-x character
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
I have it working now, ...
Well, you can't argue with success...
Greg
Randall Schulz
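Combined with -F, the $'string' form might be used as in the sketch below. It assumes bash is installed; since $'...' is a bash feature, the command is fed to bash explicitly so the example does not depend on which shell runs it. The sample data and the variable name ansi_out are invented.

```shell
# bash expands $'\024' to the literal Ctrl-T byte before awk sees it.
ansi_out=$(bash <<'EOF'
printf 'a\024b\024c\024junk\n' | awk -F $'\024' '{ print $1, $2, $3 }'
EOF
)

echo "$ansi_out"
```

This prints "a b c" for the sample line. According to the escape list quoted above, $'\cT' should denote the same control character.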
On Wed, 28 May 2008 18:54:09 -0400, Greg Freemyer wrote:
export FS='\024'
awk '{print $1,$2,$3}' my_file > my_output
Try:

awk 'BEGIN {FS="\024"} {print $1,$2,$3}' my_file > my_output

Jim
participants (4)
- Greg Freemyer
- Jim Henderson
- Ken Schneider
- Randall R Schulz