Re: [opensuse] AWK -- Strange chars used as separators in a CSV like file
----- Original Message ----
From: Greg Freemyer <greg.freemyer@gmail.com>
To: opensuse <opensuse@opensuse.org>
Sent: Wednesday, May 28, 2008 4:54:09 PM
Subject: [opensuse] AWK -- Strange chars used as separators in a CSV like file

All,

I have a CSV-like file that uses the single-byte character octal 024 (Ctrl-T) as the comma and octal 0376 as the quote character.

I need to strip off the last column and discard it, so I thought I could use awk to do it, something like:

export FS='\024'
awk '{print $1,$2,$3}' my_file > my_output

It still seems to be using a space (' ') as the field separator. Any idea what I'm doing wrong?

I've also tried
FS='\\024'
FS='\0376\024\0376'

FYI: This may be a one-time need and it is only about 500 lines, so using vi to do it manually is acceptable, but the text within the quotes can be very long, so it is hard to work on visually. The good news is that Ctrl-T (\024) should never appear within any of the actual fields.

Thanks
Greg
-----------------------------------------------------------------------------------------------------------------

You can do this easily enough with vi (or vim, or whatever clone).

You will use the substitute command. I'm not exactly sure what you're trying to change, so I'll offer several suggestions:

start vi with:

$ vi myfile.dat

then inside vi type

:%s/^v^t.*$//
(colon, percent, lowercase-ess, forward-slash, control-v, control-t, period, asterisk, dollar, forward-slash, forward-slash)

will delete everything from the last control-t to the end of the line, on every line in the file.

If I didn't quite understand, please feel free to ask again, explaining what I misunderstood, and I'll see what I can do.

"You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions." - Naguib Mahfouz

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
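On the original awk question: awk does not pick up FS from the shell environment, so exporting it has no effect and awk keeps splitting on whitespace. A minimal sketch of setting the separator inside the awk program instead follows. The sample data and the names sample and awk_out are made up for illustration, and it assumes an awk that accepts octal escapes in string constants, as POSIX awk does.

```shell
# Build a small sample: four fields separated by Ctrl-T (octal 024).
# The contents are hypothetical and only stand in for the real file.
sample=$(mktemp)
printf 'one\024two words\024three\024junk to drop\n' > "$sample"

# Set FS (and OFS, so the output keeps the same separator) in a BEGIN block.
# LC_ALL=C makes awk treat the input as plain bytes.
awk_out=$(LC_ALL=C awk 'BEGIN { FS = "\024"; OFS = "\024" } { print $1, $2, $3 }' "$sample")

# Show the result with the separator made visible as "|".
printf '%s\n' "$awk_out" | tr '\024' '|'

rm -f "$sample"
```

Because the embedded space in "two words" survives, this also confirms that whitespace is no longer acting as the separator.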
On Wed, May 28, 2008 at 7:19 PM, Simon Roberts <thorpflyer@yahoo.com> wrote:
:%s/^v^t.*$// (colon, percent, lowercase-ess, forward-slash, control-v, control-t, period, asterisk, dollar, forward-slash, forward-slash)
will delete everything from the last control-t to the end of the line, on every line in the file
Thanks, I had actually tried something similar, but I left out the period. I forgot it was a real regular expression, so I just put in a *. Thanks for the reminder about that.

Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
On Wed, May 28, 2008 at 7:19 PM, Simon Roberts <thorpflyer@yahoo.com> wrote:
:%s/^v^t.*$// (colon, percent, lowercase-ess, forward-slash, control-v, control-t, period, asterisk, dollar, forward-slash, forward-slash)
will delete everything from the last control-t to the end of the line, on every line in the file
If anyone cares:

Out of curiosity, I tried the above. It does not work because it matches from the first occurrence of ^T on a line, not the last as I needed.

I may need to do this more in the future, so a vi command to do this would be useful.

Thanks
Greg
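One way to anchor on the last ^T rather than the first is to match a ^T followed only by non-^T characters up to the end of the line. In vi that would presumably be :%s/^V^T[^^V^T]*$// (where ^V^T means typing control-v then control-t, so the second pair sits inside the bracket after the caret). The same idea as a sed sketch, with a made-up sample line and the hypothetical names sep, line and trimmed:

```shell
# Literal Ctrl-T byte, produced portably with printf.
sep=$(printf '\024')

# Hypothetical one-line sample with three separators.
line=$(printf 'f1\024f2\024f3\024last column')

# Match a Ctrl-T followed only by non-Ctrl-T characters up to end of line,
# i.e. the last separator plus everything after it, and delete the match.
trimmed=$(printf '%s\n' "$line" | LC_ALL=C sed "s/${sep}[^${sep}]*\$//")

# Show the result with the separator made visible as "|".
printf '%s\n' "$trimmed" | tr '\024' '|'
```

This relies on the earlier statement that ^T never appears inside a field; if it could, the last column could not be found this way.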
Greg Freemyer wrote:
If anyone cares:
Out of curiosity, I tried the above. It does not work because it finds the first occurrence of ^T on a line, not the last like I needed.
I may need to do this more in the future, so a vi command to do this would be useful.
Thanks Greg
In all of this, and your effort at hand editing, I'm surprised you never used the tr command to do something like this:

tr '\t' ',' < bigdatabasefile

to create a comma-separated value file.
On Thu, May 29, 2008 at 11:40 PM, Evens Garde <evans.garde@gmail.com> wrote:
In all of this, and your effort at hand editing, I'm surprised you never used the tr command to do something like this:
tr '\t' ',' < bigdatabasefile
to create a comma-separated value file
My data set has lots of standard commas in it, so I eventually do just that, but it is the last step of:

  kill the last column, which can be huge and contain any printable char plus a couple of specific non-printable ones
  verify no quote chars in what's left
  convert \376 to quote
  convert \024 to comma

Greg
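The four steps listed above could be strung together roughly as follows. This is only a sketch under assumptions: the two sample records, the file names in.dat, trimmed.dat and out.csv, and the variable names are invented for illustration, and LC_ALL=C is set so that sed, grep and tr treat \376 as a plain byte rather than as invalid multibyte input.

```shell
LC_ALL=C; export LC_ALL          # work on raw bytes
sep=$(printf '\024')             # literal Ctrl-T
work=$(mktemp -d)

# Hypothetical sample: fields wrapped in \376 and separated by \024.
printf '\376id1\376\024\376Smith, John\376\024\376long body text\376\n'  > "$work/in.dat"
printf '\376id2\376\024\376Doe, Jane\376\024\376more, body "text"\376\n' >> "$work/in.dat"

# Step 1: drop the last column (last Ctrl-T to end of line).
sed "s/${sep}[^${sep}]*\$//" "$work/in.dat" > "$work/trimmed.dat"

# Step 2: stop if a real double quote survives in the remaining columns.
if grep -q '"' "$work/trimmed.dat"; then
    echo 'double quote found in remaining columns' >&2
    csv=''
else
    # Steps 3 and 4: \376 becomes a double quote, \024 becomes a comma.
    tr '\376\024' '",' < "$work/trimmed.dat" > "$work/out.csv"
    csv=$(cat "$work/out.csv")
    printf '%s\n' "$csv"
fi

rm -rf "$work"
```

Doing the quote check before the tr step matters because the embedded commas are only safe once every field is wrapped in real quote characters.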
participants (3)
- Evens Garde
- Greg Freemyer
- Simon Roberts