Hi all,

I'm trying to work with text files that have mixed hard hyphens (ASCII decimal 45) and soft hyphens (ASCII decimal 255), and I need something that will let me distinguish between the two. HP-UX has a utility called vis that does just what I need: type "vis <filename>" and it returns the text with everything above 7-bit ASCII encoded as \(decimal number). Is there an equivalent in SUSE? I've looked and can't find anything. Or any other suggestions?

Thanks,
Dave Driscoll
On Wed, Jun 25, 2003 at 10:35:42AM -0700, David Driscoll wrote:
> Hi all, I'm trying to work with text files that have mixed hard hyphens (ASCII decimal 45) and soft hyphens (ASCII decimal 255), and I need something that will let me distinguish between the two. HP-UX has a utility called vis that does just what I need: type "vis <filename>" and it returns the text with everything above 7-bit ASCII encoded as \(decimal number). Is there an equivalent in SUSE? I've looked and can't find anything. Or any other suggestions?
I'm not entirely sure of myself here... But if I've understood your question correctly, then this might be somewhere along the lines of what you're looking for.

(Disclaimer: "No, I don't pretend to know what I'm doing!" ;)

    #!/usr/bin/perl
    # Print a file with every byte above 127 shown as \<decimal>.
    open FILE, "<$ARGV[0]" or die "File does not exist\n";
    @FILE = (<FILE>);
    foreach $line (@FILE) {
        @uni = unpack("C*", $line);    # byte values of this line
        # defined() keeps a NUL byte (value 0) from ending the loop early
        while (defined($char = shift(@uni))) {
            if ($char <= 127) {
                push @back, (pack("C*", $char));
            } else {
                push @back, ("\\$char");
            }
        }
    }
    print @back;
    print "\nÆØÅ\n";

When I run it on itself, the last line of the script comes out as:

    print "\n\198\216\197\n";

Is this useful?

HTH
Jon Clausen
--
If we can't be free, at least we can be cheap!
On Wednesday 25 June 2003 03:32 pm, Jon Clausen wrote:
> #!/usr/bin/perl
> open FILE, "<$ARGV[0]" or die "File does not exist\n";
> @FILE = (<FILE>);
> foreach $line (@FILE) {
>     @uni = unpack("C*", $line);
>     while (defined($char = shift(@uni))) {
>         if ($char <= 127) {
>             push @back, (pack("C*", $char));
>         } else {
>             push @back, ("\\$char");
>         }
>     }
> }
> print @back;
> print "\nÆØÅ\n";

Or with a regex:

    #!/usr/bin/perl
    open FILE, "<$ARGV[0]" or die "File does not exist\n";
    @FILE = (<FILE>);
    foreach $line (@FILE) {
        $_ = $line;
        # Replace each byte from 0x7F (DEL) through 0xFF with \<decimal>
        s/([\x7F-\xFF])/'\\' . ord($1)/gse;
        print $_;
    }
    print @back;
    print "\nÆØÅ\n";

--
James Oakley
Engineering - SolutionInc Ltd.
joakley@solutioninc.com
http://www.solutioninc.com
On Wed, 25 Jun 2003 16:23:21 -0300, James Oakley wrote:
> Or with a regex:

And here is another handy file dumper which is kind-of cool. It prints each line of the file, followed by 2 lines with the hex value of the character, aligned vertically under each character. Once you get used to it, it's very easy to read.

    #!/usr/bin/perl -wnl012
    # Prints the contents of a file a line at a time
    # followed by the ASCII value of each character in vertical columns.
    # Useful for debugging.
    # If no filename is specified then input is read from the keyboard.

    print;                                                 # Print the line we've just read
    @hexvals = map {sprintf "%02X", ord $_} split //;      # Get hex value of each char
    for $a (0, 1) {print map {substr $_, $a, 1} @hexvals}  # Print the hex values.
    #print "\n";

--
use Perl; #powerful programmable prestidigitation
On Wed, Jun 25, 2003 at 03:49:57PM -0400, zentara wrote:
> And here is another handy file dumper which is kind-of cool. It prints each line of the file, followed by 2 lines with the hex value of the character, aligned vertically under each character.

This is one of the stranger things that I've seen ;)

> Once you get used to it, it's very easy to read.

I guess it does take some 'getting used to', at least before I could use it on larger files... :-P

Pretty neat though.

Tks,
/Jon
--
If we can't be free, at least we can be cheap!
On Wed, Jun 25, 2003 at 04:23:21PM -0300, James Oakley wrote:
<snip>

Kind of suspected (sort of hoped) someone else would respond with cleaner suggestions.

> Or with a regex:
>
> #!/usr/bin/perl
> open FILE, "<$ARGV[0]" or die "File does not exist\n";
> @FILE=(<FILE>);
> foreach $line (@FILE) {
>     $_ = $line;
>     s/([\x7F-\xFF])/'\\' . ord($1)/gse;

*Very* nice ;)

>     print $_;

I'd stuff it *into* @back with:

    push @back, $_;

(in case I wanted to do something other than print it to stdout)

> }
> print @back;

Otherwise the above seems kind of... well... unnecessary... ;)

/Jon
--
If we can't be free, at least we can be cheap!
Hi,

Have you tried:

    od -c <filename>

Laurence
** if you want to know what the program really does, look at the code **
laurence@orchards.org.uk
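A short sketch of what od shows for such a file. It assumes GNU od and uses byte 173, the ISO-8859-1 soft hyphen, as the sample high byte; that value is an assumption for the demo, not taken from the original post. One thing to watch: od -c prints non-printing bytes as three-digit octal, not decimal, so byte 173 appears as 255. The -tu1 format prints every byte as an unsigned decimal number instead, and -An drops the offset column.

```shell
# od -c: printable characters as themselves, other bytes in octal.
# LC_ALL=C keeps the high byte from being treated as printable.
printf 'a-b\255c\n' | LC_ALL=C od -An -c
# fields shown: a, -, b, 255 (octal for 173), c, \n

# od -tu1: every byte as an unsigned decimal number.
printf 'a-b\255c\n' | od -An -tu1
# fields shown: 97 45 98 173 99 10
```

In the decimal form the hard hyphen is 45 and the sample soft hyphen is 173, which is the same numbering the \(decimal number) output of vis is described as using.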
David Driscoll wrote:
> I'm trying to work with text files that have mixed hard hyphens (ascii decimal 45) and soft hyphens (ascii decimal 255) and I need something that will let me distinguish between the two.

Some text editors can do it. For instance,

    M-x load-library <Enter> iso-ascii <Enter>

does it in Emacs. The original text:

    hard-hyphen and softhyphen

then looks like:

    hard-hyphen and soft{-}hyphen

--
Alexandr.Malusek@imv.liu.se
participants (6)
- Alexandr Malusek
- David Driscoll
- James Oakley
- Jon Clausen
- Laurence Orchard
- zentara