[opensuse] Is there a program that gives byte value frequencies.
I need a trivial program that will read in a file and tell me how frequently each of the 255 byte values shows up (whoops, I need to know about all 256 values).

I know I could do something in awk, etc., but I have 5 TB of data to process (only 15 or so files; one is over 2 TB, and none are small).

No programs that use 32-bit ints to hold the count.

If nothing exists, I guess I can write a C program relatively quickly. I'll give the brains here until morning (New York time) to suggest an efficient solution.

Thanks all,
Greg
--
Greg Freemyer
www.IntelligentAvatar.net
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
Greg Freemyer wrote:
> I need a trivial program that will read in a file and tell me how frequently each of the 255 byte values shows up (whoops, I need to know about all 256 values).
> I know I could do something in awk, etc., but I have 5 TB of data to process (only 15 or so files; one is over 2 TB, and none are small).
> No programs that use 32-bit ints to hold the count.
> If nothing exists, I guess I can write a C program relatively quickly. I'll give the brains here until morning (New York time) to suggest an efficient solution.
Greg, surely you could have written the code faster than asking the question?

    #include <stdio.h>

    unsigned long int count[256] = {0};
    char buffer[1048576];

    int main( int argc, char** argv )
    {
        int f, i, len;

        while( fgets(buffer,1024,stdin) )
        {
            f=open(buffer,"r");

            while( len=read(f,buffer,sizeof(buffer)) )
            {
                for ( i=0; i<len; i++ ) count[buffer[i]]++;
            }

            close(f);
        }
        for( i=0; i<256; i++ ) printf("%u = %lu\n", i, count[i]);

Feed your filenames to stdin.

--
Per Jessen, Zürich (6.8°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On Sun, Oct 18, 2015 at 4:20 AM, Per Jessen <per@computer.org> wrote:
> Greg Freemyer wrote:
>> I need a trivial program that will read in a file and tell me how frequently each of the 255 byte values shows up (whoops, I need to know about all 256 values).
>> I know I could do something in awk, etc., but I have 5 TB of data to process (only 15 or so files; one is over 2 TB, and none are small).
>> No programs that use 32-bit ints to hold the count.
>> If nothing exists, I guess I can write a C program relatively quickly. I'll give the brains here until morning (New York time) to suggest an efficient solution.
>
> Greg, surely you could have written the code faster than asking the question?
>
> #include <stdio.h>
> unsigned long int count[256] = {0}; char buffer[1048576];
> int main( int argc, char** argv ) { int f, i, len;
> while( fgets(buffer,1024,stdin) ) { f=open(buffer,"r");
> while( len=read(f,buffer,sizeof(buffer)) ) { for ( i=0; i<len; i++ ) count[buffer[i]]++; }
> close(f); } for( i=0; i<256; i++ ) printf("%u = %lu\n", i, count[i]);
>
> Feed your filenames to stdin.
Thanks Per. I just don't program that much anymore. I can read C code with no issue, but troubleshooting takes a more intimate familiarity with C than I have anymore.

btw: there were 2 or 3 bugs in what you wrote, and debugging that took me the better part of an hour. Just goes to show how rusty my skills are.

Greg
Greg Freemyer wrote:
> btw: there were 2 or 3 bugs in what you wrote, and debugging that took me the better part of an hour.
Yeah, it did go a little too fast.

--
Per Jessen, Zürich (6.8°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
participants (2)
- Greg Freemyer
- Per Jessen