Re: [opensuse] Unicode v ascii?

19 Feb 2008


      Greg Freemyer wrote:
...
All,
I have a huge text file (1.7 million lines) full of unicode and ascii
text (half and half).
note: disk space for copies is not a problem, if I need to manipulate this file
Also, I have a 30 line file full of ascii text.
I need to search the large file for any occurrences of the keywords in
the 30 line file.
Ignoring the unicode issue, I could use grep (or fgrep, egrep) with
appropriate args.
I have no idea how to handle the unicode issue.
ASCII is a subset of UTF-8, so if your Unicode encoding is UTF-8 there
should be no problem. Otherwise, I wonder how you managed to
create a file with interspersed single-byte and (fixed-size) multibyte
encodings... I guess you would have to split the file somehow.
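
If splitting turns out to be awkward, one possible alternative is to search
the raw bytes for each keyword in every encoding the file might contain.
This is only a rough sketch and not something tested on the real data: it
assumes the non-UTF-8 part is UTF-16 (little or big endian), which the
original post does not say, and the names ENCODINGS, build_patterns and
search are made up for illustration.

```python
# Assumed candidate encodings; the thread does not say which ones are in the file.
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be")


def build_patterns(keywords):
    """Map every encoded form of each keyword back to the keyword itself."""
    patterns = {}
    for word in keywords:
        for enc in ENCODINGS:
            patterns[word.encode(enc)] = word
    return patterns


def search(path, keywords):
    """Yield (line_number, keyword) for each raw line containing a keyword.

    Lines are split on the byte 0x0A, so line numbers are only approximate
    inside UTF-16 regions (a character whose encoding contains 0x0A also
    ends a "line"). ASCII keywords never contain 0x0A, so a match cannot
    be cut in half by such a split.
    """
    patterns = build_patterns(keywords)
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            found = {word for pat, word in patterns.items() if pat in raw}
            for word in sorted(found):
                yield number, word


# Example use (file names are placeholders):
#   with open("keywords.txt", encoding="ascii") as f:
#       words = [line.strip() for line in f if line.strip()]
#   for number, word in search("big.txt", words):
#       print(number, word)
```

For the pure UTF-8 case none of this is needed, since the ASCII keywords
appear in the file byte for byte. Matching on raw bytes can also produce
the occasional false positive in binary-looking regions, so treat the hits
as candidates to check.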

Best regards
	Petr

Petr Cerny