[opensuse] C howto create set of pointers to lines of text in buffer?
Listmates,

Slightly OT, but if I read a text file into a single buffer, I can parse the buffer to find the number of lines by finding and counting the '\n' characters, but then how do I create a set of pointers (array, list, whatever) to the lines of text so I can access them?

For example, I take ~/.bashrc and then its file size fread to read the entire file into a buffer before I work on the text. I want a way to get a pointer to the start of each line so I can then manipulate the lines as if I had read them into individual strings. I can use fmemopen, use the address of the buffer and then get the address of each '\n' and then add 1 to the pointer to get the start of the next line of text, but that just seems whacky.

Anybody have any thoughts on how to make this work?

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
In <200908202214.46288.drankinatty@suddenlinkmail.com>, David C. Rankin wrote:
Slightly OT, but if I read a text file into a single buffer, I can parse the buffer to find the number of lines by finding and counting the '\n' characters, but then how do I create a set of pointers (array, list, whatever) to the lines of text so I can access them?
I can use fmemopen, use the address of the buffer and then get the address of each '\n' and then add 1 to the pointer to get the start of the next line of text, but that just seems whacky.
Why does that seem whacky? There's no special BOL or EOL marker (other than '\n') in the file, so the only way to identify the lines is to find the '\n's. You could use different tools, but strchr or similar seems sufficient. You might want to build your "array, list, whatever" while you are counting the lines though. -- Boyd Stephen Smith Jr. ,= ,-_-. =. bss@iguanasuicide.net ((_/)o o(\_)) ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-' http://iguanasuicide.net/ \_/
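A minimal sketch of the "find the '\n', add 1" idea, for illustration only. It uses memchr rather than strchr, on the assumption that a buffer filled by fread is not NUL-terminated; the helper names next_line_start and count_lines are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given a position p inside [p, end), return the start
 * of the following line, or NULL when there is no further line. */
static const char *next_line_start(const char *p, const char *end)
{
    const char *nl = memchr(p, '\n', (size_t)(end - p));
    if (nl == NULL || nl + 1 == end)
        return NULL;            /* no newline left, or it was the last byte */
    return nl + 1;              /* the byte after '\n' starts the next line */
}

/* Count lines by hopping from one line start to the next. */
static size_t count_lines(const char *buf, size_t len)
{
    size_t n = 0;
    const char *end = buf + len;
    for (const char *p = (len ? buf : NULL); p != NULL;
         p = next_line_start(p, end))
        ++n;
    return n;
}
```

The same loop that counts the lines visits every line start, which is why the pointers can be collected in that loop as well.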
On Friday 21 August 2009 12:07:49 am Boyd Stephen Smith Jr. wrote:
You might want to build your "array, list, whatever" while you are counting the lines though.
I thought about that, but I was going to dynamically allocate the pointer array so I thought I would need to scan the buffer first to know how many pointers to allocate? (I haven't got there yet, so maybe that is the whacky part of my thinking ;-)
-- David C. Rankin, J.D.,P.E.
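The count-first plan described here could look roughly like the following. This is a sketch under assumptions, not anyone's posted code: the function name index_lines_two_pass is hypothetical, and a final line without a trailing '\n' is counted as a line.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: two passes over buf[0..len). Returns a malloc'd array
 * of line starts (caller frees) and stores its length in *count. Returns
 * NULL with *count == 0 for an empty buffer or on allocation failure. */
static char **index_lines_two_pass(char *buf, size_t len, size_t *count)
{
    size_t n = 0;
    *count = 0;
    if (len == 0)
        return NULL;

    /* Pass 1: one line per '\n', plus one if the last line is unterminated. */
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == '\n')
            ++n;
    if (buf[len - 1] != '\n')
        ++n;

    if (n > SIZE_MAX / sizeof(char *))
        return NULL;                     /* size computation would overflow */
    char **lines = malloc(n * sizeof *lines);
    if (!lines)
        return NULL;

    /* Pass 2: the buffer start is a line start, and so is every byte that
     * follows a '\n' (except a '\n' that is the very last byte). */
    size_t k = 0;
    lines[k++] = buf;
    for (size_t i = 0; i + 1 < len; ++i)
        if (buf[i] == '\n')
            lines[k++] = buf + i + 1;

    *count = n;
    return lines;
}
```

The trade-off is two scans of the buffer in exchange for exactly one allocation of exactly the right size.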
In <200908210055.17041.drankinatty@suddenlinkmail.com>, David C. Rankin wrote:
On Friday 21 August 2009 12:07:49 am Boyd Stephen Smith Jr. wrote:
You might want to build your "array, list, whatever" while you are counting the lines though.
I thought about that, but I was going to dynamically allocate the pointer array so I thought I would need to scan the buffer first to know how many pointers to allocate? (I haven't got there yet, so maybe that is the whacky part of my thinking ;-)
realloc() is your friend if you need to grow an array. Example code:

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    int find_lines_array(
        /* IN  */ char *wholeFile,
        /* IN  */ size_t fileSize,
        /* OUT */ char ***lines,
        /* OUT */ size_t *lineCount
    ) {
        size_t l_count = 0;
        char **l_lines = NULL;
        int begOfLine = 1;

        for (size_t offset = 0; offset < fileSize; ++offset) {
            if (begOfLine) {
                /* Guard the multiplication in the realloc size below. */
                if (l_count >= SIZE_MAX / sizeof(l_lines[0])) {
                    free(l_lines);
                    return EOVERFLOW;
                }
                /* Keep the old pointer so it can be freed if realloc fails. */
                char **grown = realloc(l_lines,
                                       (l_count + 1) * sizeof(l_lines[0]));
                if (!grown) {
                    free(l_lines);
                    return ENOMEM;
                }
                l_lines = grown;
                l_lines[l_count++] = wholeFile + offset;
                begOfLine = 0;
            }
            if (wholeFile[offset] == '\n') {
                begOfLine = 1;
            }
        }
        *lineCount = l_count;
        *lines = l_lines;   /* caller frees */
        return 0;
    }

Using a linked-list (singly- or doubly-linked) won't even need a realloc. Example code:

    typedef struct node_t {
        struct node_t *next;
        char *line;
    } node;

    int find_lines_list(
        /* IN  */ char *wholeFile,
        /* IN  */ size_t fileSize,
        /* OUT */ node **linesHead,
        /* OUT */ size_t *linesCount
    ) {
        size_t l_count = 0;
        node *l_head = NULL;
        node *tail = NULL;
        int begOfLine = 1;

        for (size_t offset = 0; offset < fileSize; ++offset) {
            if (begOfLine) {
                node *n = malloc(sizeof(*n));
                if (!n) {
                    /* Free the nodes built so far before bailing out. */
                    while (l_head) {
                        node *next = l_head->next;
                        free(l_head);
                        l_head = next;
                    }
                    return ENOMEM;
                }
                n->next = NULL;
                n->line = wholeFile + offset;
                if (tail) {
                    tail->next = n;
                } else {
                    l_head = n;
                }
                tail = n;
                ++l_count;
                begOfLine = 0;
            }
            if (wholeFile[offset] == '\n') {
                begOfLine = 1;
            }
        }
        *linesCount = l_count;
        *linesHead = l_head;   /* caller frees each node */
        return 0;
    }

These are just examples; the line pointers still point into wholeFile and the lines are still terminated by '\n' rather than '\0', so there is more code to write, but either should get you started. Both scan the buffer once, so the scan itself is O(fileSize), which is the best I can imagine for arbitrary files. (Growing the array by one slot per realloc can add copying cost on top of that, depending on the allocator.)

-- Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_)) ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-' http://iguanasuicide.net/ \_/
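A common variation on the realloc approach, offered here only as a sketch, is to grow the array geometrically so that the number of realloc calls stays logarithmic in the number of lines. The struct line_vec type, the line_vec_push function, and the starting capacity of 16 are all assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of line pointers. */
struct line_vec {
    char **items;
    size_t len;   /* slots in use */
    size_t cap;   /* slots allocated */
};

/* Append p, doubling the capacity when the array is full. Returns 0 on
 * success and -1 on failure; on failure the vector is left unchanged. */
static int line_vec_push(struct line_vec *v, char *p)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        if (new_cap < v->cap || new_cap > SIZE_MAX / sizeof *v->items)
            return -1;                       /* capacity would overflow */
        char **grown = realloc(v->items, new_cap * sizeof *v->items);
        if (!grown)
            return -1;                       /* old block is still valid */
        v->items = grown;
        v->cap = new_cap;
    }
    v->items[v->len++] = p;
    return 0;
}
```

A caller would start from a zero-initialised struct line_vec, push one pointer per line start while scanning, and free(v.items) when done.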
David C. Rankin wrote:
Listmates,
Slightly OT,
Probably better suited for the opensuse-programming list.
but if I read a text file into a single buffer, I can parse the buffer to find the number of lines by finding and counting the '\n' characters, but then how do I create a set of pointers (array, list, whatever) to the lines of text so I can access them?
You could construct a doubly linked list of entries such as this:

    struct line {
        struct line *next;
        struct line *prev;
        char text[1024];
    };

You can use insque() and remque() to manipulate the list.
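As a rough illustration of that suggestion, the sketch below copies a line into a node and links it with insque(). The helper name line_append is hypothetical, lines longer than 1023 bytes are silently truncated, and it relies on the POSIX.1-2008 behaviour that insque() with a NULL predecessor starts a new linear list.

```c
#define _XOPEN_SOURCE 700   /* ask the C library to declare insque/remque */
#include <search.h>
#include <stdlib.h>
#include <string.h>

struct line {
    struct line *next;   /* insque() expects the two link pointers first */
    struct line *prev;
    char text[1024];
};

/* Hypothetical helper: copy up to 1023 bytes of src into a new node and link
 * it after `after` (pass NULL to start a new list). Returns the new node, or
 * NULL if allocation fails. */
static struct line *line_append(struct line *after, const char *src, size_t len)
{
    struct line *n = calloc(1, sizeof *n);
    if (!n)
        return NULL;
    if (len >= sizeof n->text)
        len = sizeof n->text - 1;        /* truncate overlong lines */
    memcpy(n->text, src, len);
    n->text[len] = '\0';
    insque(n, after);
    return n;
}
```

A node is unlinked again with remque(node) and then released with free(node). Note that this design copies every line, so it costs more memory than keeping pointers into the original buffer.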
For example, I take ~/.bashrc and then its file size fread to read the entire file into a buffer before I work on the text. I want a way to get a pointer to the start of each line so I can then manipulate the lines as if I had read them into individual strings. I can use fmemopen, use the address of the buffer and then get the address of each '\n' and then add 1 to the pointer to get the start of the next line of text, but that just seems whacky.
You could use stat to determine the filesize, then allocate that amount of memory, read in the file, and use strtok() to parse it line by line. You could keep the address of each line in an array char* line[numlines];
/Per
-- Per Jessen, Zürich (22.9°C)
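The stat-then-strtok recipe might be sketched as below. The function names slurp and split_lines are invented for this example, the file is assumed to be a regular file, and one extra byte is allocated so the buffer can be NUL-terminated for strtok().

```c
#define _POSIX_C_SOURCE 200809L   /* for stat() under strict compiler modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: read a whole file into a NUL-terminated malloc'd
 * buffer sized with stat(). Caller frees. Returns NULL on any error. */
static char *slurp(const char *path, size_t *size)
{
    struct stat st;
    if (stat(path, &st) != 0 || st.st_size < 0)
        return NULL;
    size_t want = (size_t)st.st_size;
    char *buf = malloc(want + 1);
    FILE *fp = fopen(path, "rb");
    if (!buf || !fp) {
        free(buf);
        if (fp)
            fclose(fp);
        return NULL;
    }
    size_t got = fread(buf, 1, want, fp);
    fclose(fp);
    buf[got] = '\0';
    *size = got;
    return buf;
}

/* Split buf in place with strtok(): each '\n' is overwritten with '\0', so
 * every stored pointer is an ordinary C string. Stores at most max pointers
 * and returns the number stored. */
static size_t split_lines(char *buf, char **line, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(buf, "\n"); tok != NULL && n < max;
         tok = strtok(NULL, "\n"))
        line[n++] = tok;
    return n;
}
```

One caveat with this design: strtok() treats consecutive delimiters as one, so empty lines are skipped and line numbers in the array will not match line numbers in the file if it contains blank lines.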
David C. Rankin wrote:
Listmates,
Slightly OT, but if I read a text file into a single buffer, I can parse the buffer to find the number of lines by finding and counting the '\n' characters, but then how do I create a set of pointers (array, list, whatever) to the lines of text so I can access them?
For example, I take ~/.bashrc and then its file size fread to read the entire file into a buffer before I work on the text. I want a way to get a pointer to the start of each line so I can then manipulate the lines as if I had read them into individual strings. I can use fmemopen, use the address of the buffer and then get the address of each '\n' and then add 1 to the pointer to get the start of the next line of text, but that just seems whacky.
Anybody have any thoughts on how to make this work?
Hmmm.... If you are reading a simple text format, reading the whole file in and then scanning the imported data initially seems a tad inefficient (you are effectively reading the file data twice, and then apparently using the second read to process the data a third time).

Readline-style operations load the file and split it at each end-of-line as it occurs, so the line pointer structure can be built dynamically, and possibly any processing of the line can be performed at the same time. The contents of each line can of course be copied into an internal buffer. (If something can be done by a particular language, it is usually better to let the language do it rather than try to do it yourself. With GNU C this seems to be implemented by the getline function, which is not part of ISO C (it is a GNU extension, standardized in POSIX.1-2008) and which allocates memory on an 'as required' basis according to the docs.)

Buffer-style operations are usually better with binary data, or in situations where you are attempting to improve performance by reading data in chunks that reflect the underlying physical data organisation, or by reading data asynchronously. The choice of approach really tends to depend on the balance between the speed of the read operation, the speed of the processing operation, and the loop overhead.

Also, in this case you are effectively implementing a variant of a read-line operation on a memory buffer, which may be adding a little unrequired complexity. Error handling is likely to be a bit more manageable as well with non-buffer-based reading.

Consider the following two Perlish pseudocode snippets (the undefined function process can represent any way of getting whatever additional info about the line is needed), which roughly do what you seem to be intending to do...
    my $count = 0;
    my @lines;
    # Perl's open does not expand "~", so spell out the home directory.
    open(my $in, '<', "$ENV{HOME}/.bashrc") or die "open: $!";
    while (my $line = <$in>) {
        my $val1 = process($line);
        $lines[$count++] = { "Contents" => $line, "Value 1" => $val1 };
    }
    close $in;

for an array/list of hashes and

    my $count = 0;
    my %lines;
    open(my $in, '<', "$ENV{HOME}/.bashrc") or die "open: $!";
    while (my $line = <$in>) {
        my $val1 = process($line);
        $lines{$count++} = { "Contents" => $line, "Value 1" => $val1 };
    }
    close $in;

for a hash of hashes... In the first, the format to access a line is $lines[line number]->{Contents}; and in the second, $lines{line number}->{Contents};

This reduces the amount of effort expended on writing the data structure management and allows one to concentrate on the program functionality. When you have things working as required at this level, then one can think about porting to C and C-based data structures.

--
I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone.
Bjarne Stroustrup
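For comparison, the line-at-a-time approach discussed above could be sketched in C with getline() roughly as follows. This is an illustration under assumptions, not a drop-in solution: the function name read_lines and the starting capacity of 16 are made up, and each stored line keeps its trailing '\n' just as getline() returns it.

```c
#define _POSIX_C_SOURCE 200809L   /* getline() is POSIX.1-2008, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read every line of fp into a malloc'd array of
 * malloc'd strings. Stores the number of lines in *count. Returns NULL if
 * there were no lines or if memory ran out. The caller frees each string
 * and then the array. */
static char **read_lines(FILE *fp, size_t *count)
{
    char **lines = NULL;
    size_t n = 0, cap = 0;
    char *line = NULL;       /* getline() allocates when this is NULL */
    size_t linecap = 0;

    *count = 0;
    while (getline(&line, &linecap, fp) != -1) {
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 16;
            char **grown = realloc(lines, new_cap * sizeof *grown);
            if (!grown) {
                /* Out of memory: release everything collected so far. */
                for (size_t i = 0; i < n; ++i)
                    free(lines[i]);
                free(lines);
                free(line);
                return NULL;
            }
            lines = grown;
            cap = new_cap;
        }
        lines[n++] = line;   /* the array now owns getline's buffer */
        line = NULL;         /* make getline allocate a fresh one next time */
        linecap = 0;
    }
    free(line);              /* getline may allocate even when it fails */
    *count = n;
    return lines;
}
```

Since the original question mentioned fmemopen, note that the same function also works on an in-memory buffer opened with fmemopen(buf, size, "r"), at the cost of copying each line out of the buffer.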
participants (4)
- Boyd Stephen Smith Jr.
- David C. Rankin
- G T Smith
- Per Jessen