[opensuse] Bashism - Seeking input on filenames that contain non-printing control characters (examples)
All,

An issue recently came up on StackOverflow.com about using 'ls -opts' to populate a loop in a shell script. Suffice it to say, *in sum*, that it is good practice to avoid this use of ls and to look for an alternative that fits your needs. That being so, there are some cases where the loop needs its data in sorted order, which ls provides easily but which otherwise requires a tortured pipeline of multiple external calls.

The tweenies (those that 'know not' and 'know not' that they 'know not') took the position that simply mentioning ls as a method to feed a loop automatically condemned you to roast in hell for eternity (despite the TLDP BASH HowTo being riddled with examples).

Those that 'know' said: wait a minute, it is a portability issue, and the problems protected against are just those instances where non-printing control characters are embedded in the filename, such as a newline, carriage-return or null-terminating character inside the filename itself. Their position was that if you know the set of filenames at issue does not contain any embedded nonsense (like daily server logs, etc.), then there are no other hidden adverse effects from using ls in this capacity. Certainly you would not be condemned to hell or have your eyes plucked out by dragons if you did.

So that brings up the question:

"Just which files do have embedded control characters (intentionally, and not as the result of some horrible mistake)?"

The only example I have uncovered thus far is the OSX 'icon' file that has a 'carriage-return' intentionally embedded as part of the filename. What others have you run across or can think of?

I could think of no better or more experienced group to consult than the good old "brain-trust". So just how prevalent are these filenames with embedded control characters?

-- 
David C. Rankin, J.D.,P.E.
-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
David C. Rankin wrote:
"Just which files do have embedded control characters (intentionally, and not as the result of some horrible mistake)?"
Generally speaking I'd expect to find nothing more than a 'tab' where someone meant a space. More paranoidly, I might see ^H where someone was trying to overprint, or even CR...

In very special cases, I might see some of those space-control chars used in some sort of "README" type message encoded as the name(s) of files. BUT that'd be more likely on some external storage media. More likely today, I'd use 001 first line, 002 2nd line... etc... so they'd be sorted in text readable order.

Unfortunately most of the control chars just don't do what you might want them to do across a wide variety of today's terminals... (like overprint)...

Maybe "ESC" to embed some color code or other tty code?... so if you had an 'ls' w/o color turned on, you could have a directory "red green blue...etc" with each file in its own color? How useful or widespread that would be would be another matter.
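One way to check how common such names are on a given machine is simply to look for them. The following is a minimal sketch, not something posted in the thread: the helper name list_cntrl_names is made up for illustration, and it assumes a find whose -name pattern understands the [[:cntrl:]] character class (GNU find does). Names are handed to the inner shell as arguments, which is safe because a filename cannot contain a NUL byte or a '/'.

```shell
#!/bin/sh
# list_cntrl_names DIR   (hypothetical helper, not from the thread)
# Prints one line per path under DIR whose name contains an ASCII control
# byte. Each control byte is shown as '?', so the output stays one line
# per file even when a name contains a newline.
list_cntrl_names() {
    LC_ALL=C find "$1" -name '*[[:cntrl:]]*' -exec sh -c '
        for path in "$@"; do
            printf "%s" "$path" | LC_ALL=C tr "[:cntrl:]" "?"
            printf "\n"
        done' sh {} +
}

# Demo in a scratch directory: one ordinary name, one with a tab,
# one with an ESC colour sequence of the kind described above.
demo=$(mktemp -d)
: > "$demo/plain.log"
: > "$demo/$(printf 'tab\there')"
: > "$demo/$(printf 'esc\033[31mred')"
list_cntrl_names "$demo"
rm -rf "$demo"
```

Pointing the helper at a home directory or a log tree gives a rough answer to the "how prevalent" question for that system only; it says nothing about other machines.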
On 08/31/2014 12:34 AM, David C. Rankin wrote:
"Just which files do have embedded control characters (intentionally, and not as the result of some horrible mistake)?"
Unfortunately, you didn't provide a concrete example. "Control characters" may not happen so often (or they do, however), but the point is that you mentioned that the output is processed further in a loop. For that, the shell must do some parsing again, and might fall into traps because of these file names.

As an example, the most evil trap combined with the most usual case is blanks in the file name.

    $ f1=file1; f2='file two'; f3=file3
    $ files="$f1 $f2 $f3"                    # bad example
    $ for f in $files ; do echo $f ; done    # bad example
    file1
    file
    two
    file3

That said, it's not a matter of which characters the script expects, but rather of how you do the quoting.

Have a nice day,
Berny
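The quoting point can be sketched as follows. This is an illustration under assumptions, not Berny's code: the function name print_names is made up, and the idea is only that each name stays a separate, quoted word instead of being flattened into one string that the shell has to split again.

```shell
#!/bin/sh
# print_names NAME...   (hypothetical helper, not from the thread)
# Prints each argument on its own line without re-splitting it.
print_names() {
    for f in "$@"; do
        printf '%s\n' "$f"
    done
}

f1=file1; f2='file two'; f3=file3

# Pass the names as separate quoted words; "$@" preserves them as-is,
# so 'file two' is seen as one name.
print_names "$f1" "$f2" "$f3"
```

With the same three names this prints three lines, with "file two" kept intact. In bash, an array (files=("$f1" "$f2" "$f3") iterated with "${files[@]}") does the same job.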
On 09/01/2014 01:22 AM, Bernhard Voelker wrote:
Unfortunately, you didn't provide a concrete example. "Control characters" may not happen so often (or they do, however), but the point <snip> Berny
I apologize Berny, I am looking for real world examples where idiots NORMALLY intentionally put control chars like '\n' '\r' '\0' inside the file name. Stupid stuff like:

    icon\rfileOSX1    # which apparently Apple does something similar to.

That's what I need examples of. Any other examples where, while common-sense, wisdom, intelligence, and just good ole fair-play says we probably shouldn't play with disaster and put non-printing control codes (ASCII < 0x20) in filenames even though the standard allows us to do it technically -- and then somebody actually does it.

I don't know of any more intentionally done other than the OSX icon file. So I'm looking for any and all other examples of this going on. I don't think there are very many at all (and that is the point). There are probably no more than a couple of corner cases where you would ever see one of these magical creatures, but since they possibly exist (like "in-person voter fraud"), we now have zealots going all over the BashSphere warning against Armageddon if a user makes use of

    `ls -rt | while read -r line; do echo $line; done`

I want to get a handle on how rare these corner cases are. Like "Dodo bird rare" or like "snipe" rare. That's the question. And it seems a modern day "snipe" hunt is now in order...

-- 
David C. Rankin, J.D.,P.E.
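To make the disputed corner case concrete, here is a small sketch that is not from the thread. The helper names count_with_ls and count_with_glob are made up, and it assumes an ls that writes names unmodified when its output is a pipe (GNU ls does this by default). It counts how many items each style of loop sees in the same directory.

```shell
#!/bin/sh
# count_with_ls DIR   (hypothetical helper)
# Number of iterations of the ls-fed loop quoted above.
count_with_ls() {
    ls -rt "$1" | {
        n=0
        while IFS= read -r line; do
            n=$((n + 1))
        done
        echo "$n"
    }
}

# count_with_glob DIR   (hypothetical helper)
# Number of iterations of a plain glob loop over the same directory.
count_with_glob() {
    n=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern in an empty dir
        n=$((n + 1))
    done
    echo "$n"
}

# Demo: two files, one of which has a newline in its name.
demo=$(mktemp -d)
: > "$demo/daily.log"
: > "$demo/$(printf 'two\nlines')"
count_with_ls "$demo"     # 3 when ls passes the newline through unchanged
count_with_glob "$demo"   # 2
rm -rf "$demo"
```

For a directory of ordinary names (spaces included) the two counts agree, which is the situation described for daily server logs. The glob loop is not sorted by time, so it does not replace `ls -rt`; it only shows where an embedded newline would bite.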
participants (3)
- Bernhard Voelker
- David C. Rankin
- Linda Walsh