[opensuse] Need some help with a script
I have a slight problem: I don't know enough about scripting! My daughter and a friend of hers have been "chatting" via a community page for almost 6 months, writing a collaborative story together. The problem with the youngsters is that they top-post... So trying to read the story in its entirety is almost impossible without going crazy. The text is written in short groups of two or three sentences, each preceded by the same kind of header. It's all saved in one long text file (some 300 pages' worth):
<cut>
---------- (This is 10 dashes...)
Kitty9 said the following:
<here is the answer>
----------
Son_of_sam said the following:
<here be text>
</cut>
How would I grep the text and reconstruct it from bottom to top so I get
<here be text>
<here is the answer>
and so forth? Any hints on how to do it, or where to teach myself the answer, are appreciated.
-- /Rikard Johnels
Rikard Johnels wrote:
How would I grep the text and reconstruct it from bottom to top so I can get <here be text> <here is the answer>
and so forth? Any hints as to how, or where to teach myself the answer, are appreciated.
Try this script:

#!/bin/bash
# Read the chat log from stdin, split it at every "----------" line,
# then print the pieces in reverse order.
i=0
while read x; do
  if test "x$x" = "x----------"; then
    i=$((i+1))
    read somebodysaid                   # swallow the "somebody said:" line
    echo -e "$collected" > part$i.tmp   # save the block collected so far
    collected=""
  else
    collected="$collected\n$x"          # echo -e later turns \n into a line break
  fi
done
# save the last block, too
i=$((i+1))
echo -e "$collected" > part$i.tmp

# print the saved blocks from last to first
for j in $(seq $i -1 1); do
  cat part$j.tmp
done

I have (successfully) tested it with this input:

----------
foo said:
asdf1
asdf2
asdf3
----------
bar said:
rlkj1
rlkj2
rklj3
----------
batz said:
last1
last2
last3

The only tricky part is the "read somebodysaid", which simply swallows the "somebody said:" line that follows the dashes. Plus the seq command with three arguments, which makes it count backwards.

I think the whole script could also be replaced with a big pipe of "sed" or "awk" commands and a final "tac" to reverse the order. But that would involve more regexp magic instead of straightforward imperative programming, and it may take longer to get right, even though the code is much shorter.

Further reading:
man test
man echo
man seq
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html

Regards
nordi
--
Spam protection: All mail to me that does not contain the string "suse" goes to /dev/null.
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
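nordi mentions that sed or awk could replace the loop. A minimal sketch of the awk route follows, assuming a POSIX awk (gawk or mawk) and the exact "----------" separator line from the example; the function name reverse_blocks is made up for illustration. Unlike a tr-based one-liner it keeps the original line breaks, it writes no temporary files, and it reverses the blocks itself instead of piping through tac.

```shell
# Hypothetical helper: print the blocks of a top-posted chat log in reverse
# order. Reads the files given as arguments, or stdin if there are none.
reverse_blocks() {
  awk '
    /^----------$/ {            # separator line: start a new block
      n++
      getline                   # consume the "NAME said ..." header line
      next
    }
    { block[n] = block[n] $0 "\n" }   # collect text, keeping line breaks
    END {
      for (i = n; i >= 0; i--)        # last block first, first block last
        printf "%s", block[i]
    }
  ' "$@"
}
```

Usage would be something like `reverse_blocks the_file > story.txt`. Any text before the first separator lands in block 0 and is printed last.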
I wrote:
I think the whole script could also be replaced with a big pipe of "sed"
Yes, it can:

cat the_file | tr '\n' ' ' | sed -r 's/---------- [[:alpha:]]+ said:/EVILSEPERATOR/g; :found; s/EVILSEPERATOR (.+) EVILSEPERATOR (.+)/\2EVILSEPERATOR \1/;tfound;s/EVILSEPERATOR//'

It will work with the example data that I provided in my last post, assuming it is saved as "the_file". Have a lot of fun understanding it *g*

nordi
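As written, the header pattern `[[:alpha:]]+ said:` only matches names made of letters followed directly by "said:", so it would not match headers like "Kitty9 said the following:" or "Son_of_sam said the following:" from the original question (digits, underscores, extra words). A sketch of the same pipeline with a looser header pattern follows, assuming GNU sed and names without spaces; reverse_oneline is a hypothetical name, and the script is split into -e pieces only for readability.

```shell
# Hypothetical helper: same idea as the one-liner, but the header pattern also
# accepts digits/underscores in names and an optional " the following".
# Each pass of the :found loop moves the last block to the front; the loop
# stops when fewer than two markers are left, then the last marker is removed.
reverse_oneline() {
  tr '\n' ' ' | sed -r \
    -e 's/---------- [^ ]+ said( the following)?:/EVILSEPERATOR/g' \
    -e ':found' \
    -e 's/EVILSEPERATOR (.+) EVILSEPERATOR (.+)/\2EVILSEPERATOR \1/' \
    -e 'tfound' \
    -e 's/EVILSEPERATOR//'
}
```

It would be run as `reverse_oneline < the_file`. Like the original, the output is still one long line.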
On Wednesday 04 February 2009 02:25, nordi wrote:
I wrote:
I think the whole script could also be replaced with a big pipe of "sed"
Yes, it can:
cat the_file | tr '\n' ' ' | sed -r 's/---------- [[:alpha:]]+ said:/EVILSEPERATOR/g; :found; s/EVILSEPERATOR (.+) EVILSEPERATOR (.+)/\2EVILSEPERATOR \1/;tfound;s/EVILSEPERATOR//'
It will work with the example data that I provided in the last post, assuming it is saved as "the_file".
Have a lot fun understanding it *g* nordi
This works... (sort of). It removes all CR/LF though, so I get a VERY long text on one line, almost 300,000 characters long.
-- /Rikard Johnels
Rikard Johnels wrote:
This works... (sort of). It removes all CR/LF though, so I get a VERY long text on one line, almost 300,000 characters long.
It looked nice with my example data, but 300k letters in one paragraph is certainly too much. To add nice paragraphs, pipe it through yet another sed command by appending

| sed -r 's/(.{200}[.!?])\ */\1\n\n/g'

to the previous sed command. This inserts a blank line after the first '.', '!' or '?' that is preceded by at least 200 characters since the previous break, so you get paragraphs of a little over 200 characters, each ending at a sentence boundary. To get really _useful_ paragraphs, though, you'd need to do it by hand anyway, because paragraphs are also supposed to be a unit of meaning.

Regards
nordi
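The appended sed expression can be tried on its own as a small function. A sketch, assuming GNU sed (for `\n` in the replacement) and GNU coreutils fmt; make_paragraphs is a hypothetical name, and the extra `fmt -w 72` step is not part of nordi's suggestion: it additionally wraps each paragraph to lines of at most 72 characters so the result reads well in a terminal or mail client.

```shell
# Hypothetical helper: break one very long line into readable paragraphs.
make_paragraphs() {
  # 1. nordi's expression: insert a blank line after the first ".", "!" or "?"
  #    that is preceded by at least 200 characters since the previous break.
  # 2. fmt -w 72: wrap every paragraph to lines of at most 72 characters.
  sed -r 's/(.{200}[.!?]) */\1\n\n/g' | fmt -w 72
}
```

To use it, append `| make_paragraphs > story.txt` to nordi's pipeline instead of the bare sed command.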
On Tuesday 03 February 2009 18:37, nordi wrote:
Try this script:
#!/bin/bash
i=0
while read x; do
  if test "x$x" = "x----------"; then
    i=$((i+1))
    read somebodysaid
    echo -e "$collected" > part$i.tmp
    collected=""
  else
    collected="$collected\n$x"
  fi
done

i=$((i+1))
echo -e "$collected" > part$i.tmp

for j in $(seq $i -1 1); do
  cat part$j.tmp
done
Can't get this to work at all. It starts, but just sits there doing nothing.
-- /Rikard Johnels
Rikard Johnels wrote:
while read x; do
Can't get this to work at all. It starts, but just sits there doing nothing.
OK, I should have added that you need to run "cat the_file | ./my_script", just as with my sed program. The "read x" reads from stdin, and will wait forever if it has to.

Regards
nordi
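Because "read x" blocks until input arrives, a script started without redirected input looks as if it is doing nothing. A small guard can print a usage hint instead; this is a sketch assuming bash or another POSIX shell, need_piped_input is a hypothetical name, and `[ -t 0 ]` is the standard test for "stdin is a terminal".

```shell
# Hypothetical guard: refuse to start when stdin is a terminal, because
# "read" would then just sit waiting for keyboard input.
need_piped_input() {
  if [ -t 0 ]; then
    echo "usage: ./my_script < the_file   (or: cat the_file | ./my_script)" >&2
    return 1
  fi
  return 0
}
```

Placing `need_piped_input || exit 1` near the top of the script, before the while loop, turns the silent hang into an error message.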
On Thursday February 5 2009, nordi wrote:
Rikard Johnels wrote:
while read x; do
Can't get this to work at all. It starts, but just sits there doing nothing.
OK, should have added that you need to run "cat the_file | ./my_script", just as with my sed program. The "read x" reads from stdin, and will wait forever if it has to.
While that works, we really shouldn't be encouraging such roundabout
ways of redirecting a program's standard input. That is what the
shells' less-than (<) operators are for. For the example above, the
canonical way to invoke it is this:
% my_script < the_file

Randall Schulz
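For a standalone script, Randall's redirect with `<` and nordi's `cat` pipe feed the same bytes to the script; the redirect just avoids an extra process. One practical difference appears when the reading happens in a shell function instead: in bash with default settings, every part of a pipeline runs in a subshell, so variables set while reading are lost afterwards, whereas with `<` the function runs in the current shell. A minimal sketch with a hypothetical count_lines function:

```shell
# Hypothetical helper: count the lines on stdin, leaving the result in $lines.
count_lines() {
  lines=0
  while IFS= read -r line; do
    lines=$((lines + 1))
  done
}
```

After `count_lines < the_file`, `$lines` holds the count. After `cat the_file | count_lines`, `$lines` still holds whatever it held before, because the function ran in a subshell.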
Randall R Schulz wrote:
For the example above, the canonical way to invoke it is this:
% my_script < the_file
Indeed, I guess my example qualifies as a useless use of cat ;)
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
I am the only user on this system, yet I still don't want to encourage myself to such dangerous uses of $PATH. I simply do not want one compromised account to be able to gain control over the other accounts. Before you ask: I have five accounts for myself, all for different purposes that require privilege separation. And I also have accounts on machines that I share with hundreds/thousands of other users, so why not keep my shells consistent and never include "./"?

Regards
nordi
On Thursday February 5 2009, nordi wrote:
Randall R Schulz wrote:
For the example above, the canonical way to invoke it is this:
% my_script < the_file
Indeed, I guess my example qualifies as a useless use of cat ;)
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
I am the only user on this system, yet I still don't want to encourage myself to such dangerous uses of $PATH.
And what is that danger?
...
Regards nordi
Randall Schulz
Hello, On Thu, 05 Feb 2009, Randall R Schulz wrote:
On Thursday February 5 2009, nordi wrote:
Randall R Schulz wrote: [..]
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
I am the only user on this system, yet I still don't want to encourage myself to such dangerous uses of $PATH.
And what is that danger?
echo 'echo rm -rf ~' > /tmp/ls && chmod a+x /tmp/ls

(and as the same or a different user, cd to /tmp/ and type 'ls'. Unless you have an alias 'ls' set to an explicit '/bin/ls ...', you'd be screwed).

Having '.' in your PATH is evil. And/or stupid. Put your scripts into ~/bin/ (as symlinks if you like).

HTH & HAND,
-dnh
--
It's so nice to be insane, no one asks you to explain!
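David's example can be reproduced safely. The sketch below, assuming bash and the usual mktemp, plants a harmless fake "ls" in a scratch directory and runs a plain "ls" there under a given PATH; run_ls_with_path is a hypothetical name. It also shows that the position of "." matters: at the end of PATH the real ls is found first, although a command name that exists nowhere else in PATH (a typo, say) would still run a local file of that name.

```shell
# Hypothetical demo: which "ls" runs inside a directory containing a fake
# one? $1 is the PATH to use. Prints whatever that "ls" printed.
run_ls_with_path() {
  local dir
  dir=$(mktemp -d) || return 1
  # A harmless stand-in for a planted binary: it only prints a message.
  printf '#!/bin/sh\necho "fake ls ran"\n' > "$dir/ls"
  chmod +x "$dir/ls"
  # Subshell, so the cd and the PATH change do not leak out.
  ( cd "$dir" && PATH=$1 && ls )
  rm -rf "$dir"
}
```

`run_ls_with_path ".:$PATH"` prints "fake ls ran", while `run_ls_with_path "$PATH:."` uses the real ls and lists the single file, ls.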
On Friday February 6 2009, David Haller wrote:
Hello,
On Thu, 05 Feb 2009, Randall R Schulz wrote:
On Thursday February 5 2009, nordi wrote:
Randall R Schulz wrote:
[..]
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
I am the only user on this system, yet I still don't want to encourage myself to such dangerous uses of $PATH.
And what is that danger?
echo 'echo rm -rf ~' > /tmp/ls && chmod a+x /tmp/ls
And where are these things going to come from? Spontaneous data errors?
(and as the same or a different user, cd to /tmp/ and type 'ls'. Unless you have an alias 'ls' set to an explicit '/bin/ls ...' you'd be screwed).
Having '.' in your PATH is evil. And/or stupid.
And as I said, if that is true, having _anything_ in your PATH is <choose-your-adjective>.
Put your scripts into ~/bin/ (as symlinks if you like).
Which is equally susceptible to manipulation in the way you suggest other directories are.
HTH & HAND,
I'm having a bad day, but it's my own fault 'cause I was careless about something. It'll probably be better from now on (Science Friday starts in an hour). But it does not help. It's still just the same old paranoid fantasy.
-dnh
Randall Schulz
Randall R Schulz wrote:
On Thursday February 5 2009, nordi wrote:
Randall R Schulz wrote:
And what is that danger?
In short: oneself. Only fools or liars claim they have never made a mistake (and I do not think you are either of these), and one bit of minor protection gives you a small chance to reflect on a decision before flushing everything down the toilet... It will not stop you doing it, but it gives the comfort of knowing it is done with intention :-)
...
Regards nordi
Randall Schulz
--
I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone.
Bjarne Stroustrup
G T Smith wrote:
In short oneself, only fools or liars claim they have never made a mistake
Don't remind me. We just emerged from eight long years of winter under a president who never made a mistake...
--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
----- Original Message -----
From: "Randall R Schulz"
On Thursday February 5 2009, nordi wrote:
Rikard Johnels wrote:
while read x; do
Can't get this to work at all. It starts, but just sits there doing nothing.
OK, should have added that you need to run "cat the_file | ./my_script", just as with my sed program. The "read x" reads from stdin, and will wait forever if it has to.
While that works, we really shouldn't be encouraging such roundabout ways of redirecting a program's standard input. That is what the shells' less-than (<) operators are for. For the example above, the canonical way to invoke it is this:
% my_script < the_file
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
Don't teach bad habits. It's always bad to have . in path. You are just begging to get raped. It's not a matter of trusting other users on the system. It's maybe possibly arguable for unprivileged users, but really even that is dumb. *anything* might be written to your current dir any time by any process you run. The extra time I spend typing "./" does not compare to even one time of spinning my wheels wondering why something is behaving so bizarrely, having forgotten to check if maybe there is something in $PWD screwing up basic assumptions that should be safe. That are _supposed_ to be safe. The alternative is having to specify the full explicit path to every binary, and shell built-in, you ever run, not just by hand but in every script and even compiled into many binaries. It's one of those things that you as the owner of your box may do, but it's really irresponsible to ever even suggest to anyone else that they do so. Anyone working with computers should have habits based on being explicit and unambiguous anyway. You should _want_ to be typing "./myscript" instead of "myscript" when you explicitly want to run some transient local temp file. If it's not a transient temp file and will be used more than once, then you can move it into ~/bin, which can be in PATH, and have your convenience just fine. There is absolutely no excuse for . in PATH.
--
Brian K. White brian@aljex.com http://profile.to/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
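Brian's alternative, keeping reusable scripts in ~/bin and putting only that directory in PATH, is usually set up from a shell startup file such as ~/.bashrc. A minimal sketch, assuming bash; add_to_path is a hypothetical helper. It appends rather than prepends, so system commands still win over same-named scripts in ~/bin, and it skips directories already listed, so re-reading the startup file does not keep growing PATH.

```shell
# Hypothetical helper for ~/.bashrc: append directory $1 to PATH once.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present: leave PATH alone
    *) PATH="$PATH:$1" ;;        # append, so system directories come first
  esac
}
```

For example, `add_to_path "$HOME/bin"` in ~/.bashrc, then `ln -s "$PWD/my_script" ~/bin/` to make a script available everywhere via a symlink, as David suggested.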
On Thursday February 5 2009, Brian K. White wrote:
----- Original Message ----- From: "Randall R Schulz"
...
% my_script < the_file
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
Don't teach bad habits. It's always bad to have . in path. You are just begging to get raped. It's not a matter of trusting other users on the system.
That's 99.9% nonsense.

RRS
----- Original Message -----
From: "Randall R Schulz"
On Thursday February 5 2009, Brian K. White wrote:
----- Original Message ----- From: "Randall R Schulz"
...
% my_script < the_file
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
Don't teach bad habits. It's always bad to have . in path. You are just begging to get raped. It's not a matter of trusting other users on the system.
That's 99.9% nonsense.
Why? Because you've been a lucky idiot? So have I. I never bother to lock my car doors at home, work or my usual eating places. In fact in summer I leave the top right off of one. And neither vehicle has any other form of security like an alarm. 20 years and never a problem so far. So I should probably tell my niece it's silly to lock your car when you leave it. In fact I should probably find some fairly populous newsgroup for teenagers and, in the voice of apparent wisdom of a 20-year driver, advise them all that it's silly to lock their cars. It's only partly a matter of trust or security or fear of hackers anyway. It's basic robust habits. The more wildcard variables, the less reliable the system, whatever the system, not just computer operating systems. When things are left to chance, things WILL happen. Some percentage of the time, to some percentage of people, with varying consequences of varying severity. When you advise others what to do, you take on an obligation to give only sound advice, not slap-dash ideas that are full of holes. Or at least to TRY honestly. Anyone can simply be wrong of course, but there's no excuse for anything like "Eh, it'll probably be fine, don't worry about it, it's slightly more convenient this way and nothing ever happened to me, well, that I know of anyway... And hey, if I can't tell if it ever hurt me, it must not have, right? And therefore it will never hurt anyone, and so all that world of systems analysts who determine best practices, well, they are simply all silly." What's that? Only talking about user accounts, not root? Tell that to the business owners who hire small independent "consultants" who learn crap advice like this at home, form lousy sloppy habits like this on their user accounts, and so do the same things as root because that's what they know and like, thanks to you. Then they go around polluting every machine they touch for years. Dot in path is not going to kill babies, no.
But it's one of many examples of unrobust system design. The more people adopt that bad practice, and the more other such error-increasing procedures they adopt, the crappier everything gets. The overall overhead of inefficiency goes up as mistakes and mishaps happen more often because more people are doing things in less robust ways. DOS had dot-in-path, and had the same design philosophy expressed everywhere else in the system, and now every Windows box must waste a significant portion of system resources, not doing any useful or productive work, just to run the antivirus and anti-adware and anti-phishing scanners 24/7 and run every tiny memory and disk transaction through them, or else suffer the even huger waste of getting a virus, and in actuality, STILL getting some virus ANYWAY now and then. And as a society we must all waste significant resources in the form of whole industries of programmers and the entire supporting architecture, all that potential talent soaked up not actually producing anything new or advancing anything, just trying to keep Windows boxes alive, just to tread water and keep the status quo. What a crying shame, what a colossal waste. What if your accountant, lawyer, doctor, etc., using only his user account, not root, who isn't a seasoned Unix admin who knows enough to reject advice like that, loses that important tax data, those test results, etc.? Oh, you're too smart to leave anything important to you in only one place and in someone else's hands? That's nice, what about all their other clients? Cripes, there's no end to the problems with that kind of thinking, but I can't write forever...
On Thursday February 5 2009, Brian K. White wrote:
(I cannot refrain from remarking that it's silly not to put the current working directory in your PATH variable unless you don't trust the other users of your system.)
Don't teach bad habits. It's always bad to have . in path. ...
That's 99.9% nonsense.
Why? Because you've been a lucky idiot?
Apart from the insults, the inane diatribe and the offensive analogies, what is the danger?

Randall Schulz
Randall R Schulz wrote:
what is the danger?
Apart from the obvious security aspects, you also risk having an executable in ./ that will modify the behavior of another program you are running. E.g. SuSEconfig uses the "cat" program; if your ./ happens to contain a cat program (that does something entirely different), then SuSEconfig would mysteriously misbehave. Sure, this is quite unlikely to happen. But, as Brian already pointed out, it is yet another possible failure point, one that is very easy to avoid and thus should always be avoided. Another example: if you run the "postfix" command, you would expect /usr/bin/postfix to be run. But if your $PWD happens to be /etc/init.d/, it may also run /etc/init.d/postfix (or may not, depending on the order in $PATH). Same thing for xfs, xdm, ypbind and probably others.

Regards
nordi
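To see whether a given directory would shadow anything if it were searched via ".", one can compare its executables against the other PATH directories. A sketch, assuming bash; shadowing_names is a hypothetical helper, and it only looks at regular executable files, not at shell builtins, aliases or functions.

```shell
# Hypothetical helper: list executables in directory $1 whose names also
# exist as executables in one of the colon-separated directories in $2.
# $2 should be a PATH without "." (otherwise "." is relative to $PWD).
shadowing_names() {
  local dir=$1 search_path=$2 f name d
  local IFS=:
  for f in "$dir"/*; do
    [ -f "$f" ] && [ -x "$f" ] || continue
    name=${f##*/}
    for d in $search_path; do
      if [ -f "$d/$name" ] && [ -x "$d/$name" ]; then
        echo "$name"
        break
      fi
    done
  done
}
```

On a system where both /etc/init.d/postfix and a postfix binary in one of the searched directories exist, `shadowing_names /etc/init.d "$PATH"` would list postfix, matching nordi's example.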
On Friday February 6 2009, nordi wrote:
Randall R Schulz wrote:
what is the danger?
Apart from the obvious security aspects you also risk having an executable in ./ that will modify the behavior of another program you are running. ...
These endless, arcane and tortured hypothetical scenarios beggar belief. No directory is more vulnerable to such manipulation than any other, so your $HOME/bin, e.g., can equally well be targeted. If you're serious about these so-called threats, you should not have a PATH variable set at all. But by all means, make your life hard, if you think it will help you. I have a lot of special-purpose scripts that live in the directories in which they're used, and I'm not going to be typing ./ all the time.
Regards nordi
Randall Schulz
----- Original Message -----
From: "Randall R Schulz"
On Friday February 6 2009, nordi wrote:
Randall R Schulz wrote:
what is the danger?
Apart from the obvious security aspects you also risk having an executable in ./ that will modify the behavior of another program you are running. ...
These endless, arcane and tortured hypothetical scenarios beggar belief.
hahahha wow, that's exactly what every non-programmer, non-mechanic, non-electrical-engineer, non-db-admin, non-accountant, non-doctor, non-anything-remotely-techy person says about every techy topic once you start to get even slightly detailed. Things don't work by accident. And they don't work because engineers "feel" that this or that isn't "likely". Things only work by dint of lots of people all thinking about exactly those "tortured hypothetical scenarios" because, duh, they aren't hypothetical at all. They ALL happen. It doesn't matter if a thing is likely or unlikely. It only matters if it's possible or not possible. The world, at least today's world, only even works at all because enough of the people that make things work try to always think robustly. Enough so that in fact lots of sloppy thinkers can actually get away with a lot. If a given system involves 4 parts, and 3 of them were designed robustly, the 4th guy can actually get away with turning out some crap. The system as a whole will work well enough because the _other_ guy didn't think in terms of likely or unlikely. He didn't think in terms of "this would be some really tortured hypothetical scenario so why worry about it?" In fact, a fair bit of the time, as long as at least one part of a system is solid and does the sane thing most of the time, that's enough to make the whole system "good enough". So a lot of people get into positions where they are turning out crap and don't even know it. But do you actually want to be that guy? Knowingly? Further, it's irresponsible to go further and actually advise others that they don't have to be careful either just because you've discovered that you can get away with it. The question isn't "What will happen?" It's "What have I allowed?" And "Have I set things up so that WHEN they fall over, they fall in the least harmful direction?" Baby talk programming example. You are counting in a loop from 1 to 10, in increments of 1.
It's a dead simple loop, can't possibly screw up in any weird way. Nevertheless, at the exit point of your loop, you don't check for n=10, you check for n>9 or n>=10. It doesn't matter in the slightest how simple the loop is or that it will never ever mistakenly miss a beat and exceed 10 and thus spin away forever counting up to infinity. It doesn't matter, you still ALWAYS test for greater than or equal to 10, not just equal to 10. Because that is the robust way, and it costs you little or nothing to do it the solid way vs the risky way. If it's not OK for your program to count from 11 to infinity, then you don't allow it. It's not a matter of thinking up tortured hypothetical scenarios. You don't have to say "What if n were 12? OK, I accounted for 12. Now, what if n were... 327? OK, dealt with 327. Now, what if n were... " No, you just think of one thing: is n less than 10 or not? I don't care what it is outside that range. Dot in path is like that. The question isn't who might place a file in a place where you will end up executing it, nor is the question what they might put there. The question is whether anyone can do anything like that. Without dot in path, they can't. You realise of course that "they" is just as much "you" as anyone else. I've watched so many people get tripped up by the simple case of "test". They forget that test is a system binary, usually not actually called because usually the shell builtin gets used, but it still exists as a binary for a reason and sometimes gets used. So they're working on some script and of course they name it, or its sample data, "test", which ends up screwing up all manner of far-reaching things because that's such a basic command peppered throughout so many scripts in the system. It's like replacing the letter "e" with something else throughout. What would that screw up? I don't know exactly either, but I know it's a lot. And I don't have to actually try it to know it.
I don't have to actually have the "dead baby" to point at explicitly. In fact, what I said about everyone else being good so one guy can get away with being a walking disaster would probably kick in here. Everyone _else_ who writes a script that's part of the system generally ensures somehow that the various commands used in the script actually come from some known place. They set an overriding PATH at the top, they explicitly use the shell builtin command, they explicitly call the external commands by full absolute path, sometimes they go so far as to supply their own commands with the script, or implement the equivalents of some commands entirely within the script itself, _something_. They only suffer all that inconvenience because they need to, because until they did, it kept blowing up. Not because that's how they wanted to spend all those nights & weekends. Precisely because everyone else considers those "tortured hypothetical scenarios", you could actually probably get away with not considering them yourself and most parts of the system would still work. I suppose that you interpret this as supporting your argument? That you don't have to be careful because everyone around you is being careful for you? Back to the simple programming example. Two years pass and that dead simple loop has grown into a larger, more complicated routine that does a bit of jumping around, and you are no longer the only programmer futzing with it either. Other people, of questionable skill or powers of observation, have been and will continue to be modifying it. Sooner or later that counter is going to get screwed up and n is going to jump by more than 1 in a single iteration, and it's going to skip right over that magic threshold value you were looking for.
But, no problem: it should have been impossible to ever reach 11, but by testing for >= you don't care _what_ value n jumps to, you still handle it and your program, which might be part of an aircraft autopilot, or a medical monitor or dosimeter, or a parking meter timer, or a web site billing calculator, or _anything_, works. It's not a tortured hypothetical scenario, it's exactly what happens all day every day a zillion times a second in life, in every clock radio and air-bag deployment system and not a few doorknobs. Now maybe you want to say this last scenario doesn't apply, because you don't now and will never have other users on your system? Much flawed logic there. But the two biggest flaws are: 1 - Every time you run any binary or script that you didn't personally write, you are essentially letting other users on your system. 2 - You didn't just do something on your own system but suggested, in a public place frequented by new users who don't know better, that they do the same. To use the programming analogy again, you told other programmers to write poor code, some of whom will use this advice in places where it will cause a disaster. Even if the disaster is just that they show me an example of their code and I don't hire them because of it. That's a pretty big disaster for at least somebody. And I sure wouldn't let any fan of dot-in-path near my boxes, at least until I cured them of that and established whether they really are that much of an idiot or merely fell prey to some bad advice and haven't yet acquired the hard knocks to correct that particular lack of thinking things through. Lastly, "every other dir on the system has the same problem" is absolutely cotton-headed thinking. Without dot-in-path, path is a defined and fixed set of a very few directories, all of which have basic permissions such that users cannot write new files there, nor alter any files that are already there. "."
on the other hand includes every directory on the whole system, including not only ones your current user has permission to write to, but ones that any user, even user "nobody" used by the web server, has permission to write to. What, you never ever once cd'd to /tmp and then ran a command like ls or cat or vi etc. without first making sure that there wasn't a file in /tmp that had the same name? Even if you were that perfect, no one else is, and even if everyone else was, it's still absolutely inexcusably poor design to rely on that. This discussion has of course outgrown "dot in path". Like I tried to express before with the "doesn't kill babies" quip, dot in path is just one item. No more harmful than any other, but no less. What I'm talking about is that the thinking that leads to "dot in path" also leads to a zillion other similar bad ideas, and cumulatively that is very bad.
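Brian's loop argument can be shown in a few lines of shell. A sketch with a hypothetical count_up function: the loop continues only while n is below 10, i.e. it exits as soon as n is 10 or more, so a step that jumps over 10 still terminates, whereas a condition of `[ "$n" -ne 10 ]` would spin forever for a step of 3.

```shell
# Hypothetical demo of a robust exit test. $1 is the step size.
count_up() {
  local n=0
  # Keep looping only while n < 10: any value of 10 or more ends the loop,
  # even one that skipped 10 entirely.
  while [ "$n" -lt 10 ]; do
    n=$((n + $1))
  done
  echo "$n"
}
```

With these definitions, `count_up 1` prints 10 and `count_up 3` prints 12 (0, 3, 6, 9, 12).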
On Friday February 6 2009, Brian K. White wrote:
...
Apart from the obvious security aspects you also risk having an executable in ./ that will modify the behavior of another program you are running. ...
These endless, arcane and tortured hypothetical scenarios beggar belief.
...
Wow. You sure do like to hear yourself talk, don't you? None of the dot-in-PATH doomsayers have explained how their scenarios exclude the directories that _are_ in PATH from being targeted in exactly the same way. In fact, it makes a hell of a lot more sense to use one of the various "bin" directories than some other random directory in which someone may never issue a command for a given hijacked program.
-- Brian K. White
Randall Schulz
On Saturday 07 February 2009 01:18:30 Randall R Schulz wrote:
None of the dot-in-PATH doomsayers have explained how their scenarios exclude the directories that _are_ in PATH from being targeted in exactly the same way. In fact, it makes a hell of a lot more sense to use one of the various "bin" directories than some other random directory in which someone may never issue a command for a given hijacked program.
The directories which normally are in the path require you to be root to write to them. And if someone already is root, it doesn't really matter what you have in your path. At that point you're not talking about targets anymore. This is about what a normal user can do to you, or what you accidentally can do to yourself. Anders
On Friday February 6 2009, Anders Johansson wrote:
On Saturday 07 February 2009 01:18:30 Randall R Schulz wrote:
None of the dot-in-PATH doomsayers have explained how their scenarios exclude the directories that _are_ in PATH from being targeted in exactly the same way. ...
The directories which normally are in the path require you to be root to write to them.
What about $HOME/bin (for non-root users)?
...
This is about what a normal user can do to you, or what you accidentally can do to yourself.
And for that very large portion of us whose Linux systems are essentially personal computers with only one human user? Are there _any_ documented instances of this supposed exploit actually being carried out?
Anders
Randall Schulz
Hello, On Fri, 06 Feb 2009, Randall R Schulz wrote:
None of the dot-in-PATH doomsayers have explained how their scenarios exclude the directories that _are_ in PATH from being targeted in exactly the same way.
What permissions does /bin have? /usr/sbin? ~/bin? Can user foo put a script into /home/bar/bin? Please just refrain from mentioning dot-in-path here or elsewhere. What you do on your machine is your problem, but don't mention it. Thank you. -dnh -- oh yes, I've always been genuine... genuinely impossible, or genuinely nuts, or genuinely daft.... -- T. A. Baetzig
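One way to answer that question for your own PATH: a sketch, assuming bash, that walks the colon-separated list and reports which entries mean "current directory" (an empty entry behaves the same as "."), which are relative, and which the current user can write to. audit_path is a hypothetical name. It only checks the current user: as root, every existing directory shows up as writable, and "not writable by you" says nothing about what other users or groups can write.

```shell
#!/bin/bash
# audit_path (hypothetical name): classify each entry of a colon-separated
# search path (default: $PATH). Empty entries and "." both mean "current
# directory". Uses only shell builtins, so it works even with an odd PATH.
audit_path() {
    local rest="${1-$PATH}:" entry
    while [ -n "$rest" ]; do
        entry=${rest%%:*}               # text up to the first colon
        rest=${rest#*:}                 # drop that entry and its colon
        case $entry in
            ''|.)
                printf '%s\tcurrent directory\n' "${entry:-<empty>}" ;;
            /*)
                if [ ! -d "$entry" ]; then
                    printf '%s\tmissing\n' "$entry"
                elif [ -w "$entry" ]; then
                    printf '%s\twritable by you\n' "$entry"
                else
                    printf '%s\tnot writable by you\n' "$entry"
                fi ;;
            *)
                printf '%s\trelative to current directory\n' "$entry" ;;
        esac
    done
}

audit_path
```

Appending ":" before the loop is what makes a trailing empty entry (PATH ending in a colon) show up, since it then becomes an explicit empty field.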
On Friday February 6 2009, David Haller wrote:
Hello,
...
Please just refrain from mentioning dot-in-path here or elsewhere. What you do on your machine is your problem, but don't mention it.
Let's be clear: You have no authority to declare anything off-topic on this list if it's not already off-topic based on the list's charter. I will say what I choose to say about relevant topics.
Thank you.
For what, exactly?
-dnh
Randall Schulz
Hello, On Fri, 06 Feb 2009, Randall R Schulz wrote:
On Friday February 6 2009, David Haller wrote:
Please just refrain from mentioning dot-in-path here or elsewhere. What you do on your machine is your problem, but don't mention it.
Let's be clear: You have no authority to declare anything off-topic
I have not declared it off-topic. I did ask you to refrain from telling others to put '.' in PATH. For all I care, you can do whatever you like on your box. Shoot yourself in the foot as much as you like, suit yourself. Just don't tell others, here or elsewhere, to do the same. Savvy?
Thank you.
For what, exactly?
For refraining from telling others to put '.' in PATH, here or elsewhere. -dnh PS: About my authority, you'll have to ask others. I haven't claimed any. Anyway, I've been subscribed to this list for many years; I just haven't had the nerve to read much, and even less to write, for the last 6 years or more. -- Life sucks, and then you die. And then it still sucks. -- Georgia 'George' Lass, Dead Like Me
On Friday February 6 2009, David Haller wrote:
Hello,
...
I have not declared it off-topic. I did ask you to refrain from telling others to put '.' in PATH.
For all I care, you can do whatever you like on your box. Shoot yourself in the foot as much as you like, suit yourself. Just don't tell others, here or elsewhere, to do the same. Savvy?
I will give the advice I decide to give. I've had dot in my PATH for 32 years and my foot remains free of bullet holes.
Thank you.
For what, exactly?
For refraining from telling others to put '.' in PATH, here or elsewhere.
I will do no such thing.
-dnh
Randall Schulz
Hello, On Fri, 06 Feb 2009, Randall R Schulz wrote:
On Friday February 6 2009, David Haller wrote:
I have not declared it off-topic. I did ask you to refrain from telling others to put '.' in PATH.
For all I care, you can do whatever you like on your box. Shoot yourself in the foot as much as you like, suit yourself. Just don't tell others, here or elsewhere, to do the same. Savvy?
I will give the advice I decide to give.
I've had dot in my PATH for 32 years and my foot remains free of bullet holes.
You're lucky.
Thank you.
For what, exactly?
For refraining from telling others to put '.' in PATH, here or elsewhere.
I will do no such thing.
*sigh* *shakes head* *frelling ...* -dnh, *wanders off to draw up a "Don't try this!" sign, muttering, then hollering "Anyone got a clue-mallet around here???"* -- This is exactly how the World Wide Web works: the HTML files are the pithy descriptions on paper tape, and your Web browser is Ronald Reagan. -- Neal Stephenson, "In the Beginning was the Command Line"
On Friday February 6 2009, David Haller wrote:
...
I've had dot in my PATH for 32 years and my foot remains free of bullet holes.
You're lucky.
No. I am intelligent.
Thank you.
For what, exactly?
For refraining from telling others to put '.' in PATH, here or elsewhere.
I will do no such thing.
*sigh* *shakes head* *frelling ...*
Since we're asking each other for things, I'd ask you to drop the passive-aggressive stuff.
-dnh, *wanders off to draw up a "Don't try this!" sign, muttering, then hollering "Anyone got a clue-mallet around here???"*
Perhaps you need to find something better to do than perpetuate silly myths with such vehemence? Randall Schulz
I have not declared it off-topic. I did ask you to refrain from telling others to put '.' in PATH.
For all I care, you can do whatever you like on your box. Shoot yourself in the foot as much as you like, suit yourself. Just don't tell others, here or elsewhere, to do the same. Savvy?
I will give the advice I decide to give.
I've had dot in my PATH for 32 years and my foot remains free of bullet holes.
Ahh, so that means it's not wrong? I have dirty underwear too. I am far too willing to solve permissions problems by just putting "umask 0" at the top of a script or as the first part of a system() command. And I'm not as thorough as I really should be about vetting and sanitizing any strings or values that were ultimately supplied by a user or other unpredictable outside source before I use them as part of commands I'm running. I have been doing so since day one and have never been burned either. But I'm still never going to tell anyone else it's a good way to operate, nor am I going to say I "can't help being amused by people who say it's wrong". Brian K. White
Hello, On Sat, 07 Feb 2009, Brian K. White wrote: [>Randall R Schulz wrote:] [>> David Haller wrote:]
I have not declared it off-topic. I did ask you to refrain from telling others to put '.' in PATH.
For all I care, you can do whatever you like on your box. Shoot yourself in the foot as much as you like, suit yourself. Just don't tell others, here or elsewhere, to do the same. Savvy?
I will give the advise I decide to give. ^ *blam*
I've had dot in my PATH for 32 years and my foot remains free of bullet holes.
Ahh so that means it's not wrong?
I have dirty underwear too. I am far too willing to solve permissions problems by just putting "umask 0" at the top of a script or as the first part of a system() command. And I'm not as thorough as I really should be about vetting and sanitizing any strings or values that were ultimately supplied by a user or other unpredictable outside source before I use them as part of commands I'm running.
I have been doing so since day one and have never been burned either.
In a similar vein: up until 2 weeks ago, for over 15 years, I had hard drives lose sectors, but never die "just like that". But what do I know? Two weeks ago, two 500 GB drives died on me almost simultaneously.
From the first one (system+data) I could still copy the data; the data on the second drive I simply lost. The latter drive is still detected at boot (if I reconnect it), but won't respond (the SATA link will not establish)... Go figure.
So, not having been "burned", "shot", or "bitten" by whatever, for however long, doesn't mean shit, unless you have proved[see sig] *and* tried that you cannot be affected. Oh, and have I ever done/grown "dirty" stuff on my (main) system[1]! The positive side (i.e. maintaining the system) of what I do with that system has even been verbed on the suse-linux list. I tend to keep quiet about the "bad" things I do ;) I don't tell other people to do the same. Just that, with a metric buttload of "if"s, things may be possible. And . in PATH is evil. As is anything but 'ROOT_USES_LANG=no' in /etc/rc.config, ahhh, /etc/sysconfig/language nowadays :) And "CWD_IN_ROOT_PATH=..." shouldn't even exist.

-dnh, takes a note to file that last one as a bug

PS: Brian: please attribute quotes

[1] $ rpm -qa --last | tail -n 1
bc-1.04-74 Mon 16 Aug 1999 07:19:26

-- Beware of bugs in the above code; I have only proved it correct, not tried it. - Donald Knuth
On Friday 06 February 2009 08:11:09 pm Randall R Schulz wrote:
On Friday February 6 2009, David Haller wrote: ...
For refraining from telling others to put '.' in PATH, here or elsewhere.
I will do no such thing.
Randall, the dot in PATH is just potentially dangerous. You can be aware of the rules for working with it and alleviate the danger, but many of your readers in a public forum like this lack experience and can hurt themselves. Advocating something that the majority of experienced users consider bad practice, without explaining what you do to diminish its potential negative effects, is not fair to new users. I must admit that, as they are usually coming from something insecure by design, dot in the path is not such a big problem. PS. I use the default, no dot in the path. -- Regards, Rajko
Can we PLEASE let all this drop from THIS particular thread??? Start the war somewhere else! I just wanted a little help with that script, and now we are flying banners of do's and don'ts of system administration?? Come on, guys and girls!! -- /Rikard Johnels
----- Original Message -----
From: "Randall R Schulz"
On Friday February 6 2009, David Haller wrote:
Hello,
...
Please just refrain from mentioning dot-in-path here or elsewhere. What you do on your machine is your problem, but don't mention it.
Let's be clear: You have no authority to declare anything off-topic on this list if it's not already off-topic based on the list's charter. I will say what I choose to say about relevant topics.
It's not that it's off topic, it's that you are telling people to do something wrong. You don't agree? Well, you're wrong. Not because I say so. Hell, I'm nobody. I'm merely smart enough to know this: I'm not easily impressed by most people, some people who make MY head spin worked out basics like this ages ago, and the relevant context hasn't changed since then. So even if I DIDN'T happen to understand the reason for this particular piece of advice, I'd still at least realise that not understanding the problem does not mean there is no problem, nor does it mean I actually know better than they do, and that what I don't know sure can hurt me. In fact, the things I don't know generally turn out to be the things that hurt the worst.

Before granting yourself the right to contest generally accepted wisdom, you have to actually understand the problem so well that you can say exactly how your wisdom is greater: how you have thought of and accounted for things that no one else has yet. So you don't get to ask "What does it hurt?" in this case. Quite the opposite: you are obligated to prove that it doesn't hurt. That your way yields better performance and fewer errors, or at least that it in no way reduces performance or increases errors or the risk of errors. You cannot do that in this case, because it is simply not true.

You really removed yourself from the discussion with the "tortured hypothetical scenarios" line. That proved, in one succinct, elegant, efficient little phrase, that you don't understand the principles you are proposing to overthrow.

-- Brian K. White
----- Original Message -----
From: "nordi"
Rikard Johnels wrote:
while read x; do
Cant get this to work at all. It starts, but just sits there doing nothing.
OK, should have added that you need to run "cat the_file | ./my_script",
Useless use of cat award! http://partmaps.org/era/unix/award.html -- Brian K. White
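The `while read x` loop quoted above reads standard input, so when it is started without any input it simply waits on the terminal, which matches "it just sits there". Redirecting the file (`./my_script < the_file`) feeds it without the extra cat. Below is a sketch of the same idea in bash, assuming separator lines are exactly ten dashes; it keeps the blocks in an array instead of part$i.tmp files and uses `IFS= read -r`, which keeps backslashes and leading spaces that a plain `read x` would eat. reverse_blocks is a hypothetical name, and the carriage-return stripping is an assumption, only in case the file was saved with DOS line endings.

```shell
#!/bin/bash
# reverse_blocks (hypothetical name): read text from standard input, split
# it into blocks at lines consisting of exactly ten dashes, and print the
# blocks in reverse order. The dash lines themselves are dropped; lines
# inside a block keep their original order.
reverse_blocks() {
    local -a blocks=()
    local current="" line i
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes.
    # The "|| [ -n "$line" ]" part also handles a last line without "\n".
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%$'\r'}              # assumption: tolerate DOS line endings
        if [ "$line" = "----------" ]; then
            blocks+=("$current")        # close the block collected so far
            current=""
        else
            current+="$line"$'\n'
        fi
    done
    blocks+=("$current")                # the block after the last dash line
    for (( i = ${#blocks[@]} - 1; i >= 0; i-- )); do
        if [ -n "${blocks[i]}" ]; then  # skip empty blocks, e.g. before line 1
            printf '%s' "${blocks[i]}"
        fi
    done
}

# Usage, reading the file via redirection rather than "cat file |":
#   reverse_blocks < story.txt > story_in_order.txt
```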
----- Original Message -----
From: "Rikard Johnels"
I have a slight problem: I don't know enough about scripting!
My daughter and a friend of hers have been "chatting" via a community page for almost 6 months, writing a collaborative story together. The problem with the youngsters is that they top-post... So trying to read the story in its entirety is almost impossible without going crazy...
The text is written in short two/three sentence groups, all separated by the same kind of header. It's all saved in a long text file (some 300 pages' worth)
<cut>
---------- (This is 10 dashes...) Kitty9 said the following:
<here is the answer>
---------- Son_of_sam said the following:
<here be text>
</cut>
How would I grep the text and reconstruct it from bottom to top so I can get <here be text> <here is the answer>
and so forth..? Any hints as to how, or where to teach myself the answer is appreciated.
I think I would do this in a few stages.

Stage 1, loop through the file line by line and watch for the dashes. Start by opening a file named 999999.txt; then, for every line in the file: if it is dashes, decrement the file name and write the line to create & start a new file; if it's not dashes, write (append) the line to the current file. Loop.

Here is awk code to do that:

split.awk:
---snip---
BEGIN {
    N=999999
    F=N".txt"
    print "---------- "F > F
}
($1=="----------"){
    fflush(F)
    close(F)
    N=N-1
    F=N".txt"
    print $0" "F > F
    next
}
{
    print $0 >> F
}
---snip---

Run it like so:
awk -f split.awk input.txt

Now you have several hundred files with numerical names in the correct order. They don't start at 0 or 1, but the lowest number is the beginning and the highest number is the end.

Stage 2, cat the files back together in order.

Make sure you don't have any other files in the current temp/working directory that would match the name pattern '[0-9]*.txt' (starts with a number and ends with .txt), just the ones that the script generated. Then:

find ./ -name '[0-9]*.txt' |sort -n |xargs cat >output.txt

That's it. output.txt is your final result.

I actually tested all this with a dummy set of 5000 files, like so. Generate the backwards source file:

n=5001 ;while [ $((--n)) -gt 0 ] ;do echo -en "----------\n1 blah\n2 ${n}\n3 blah\n4 blah\n" ;done >src.txt

That generates:
----------
1 blah
2 5000
3 blah
4 blah
----------
1 blah
2 4999
3 blah
4 blah
----------
1 blah
2 4998
3 blah
4 blah
...

This way, when re-assembled, I can tell if it did the right thing. Each block includes text that shows what order the blocks should be in, and each block has a pattern that would show if the lines within a block were moved around. You want the blocks reordered, but you want the lines within a given block preserved as they are. It worked perfectly.

Brian K. White
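Brian's test data makes the result checkable by eye; here is a small sketch that automates the same check. It assumes the four-line blocks ("1 blah", "2 N", "3 blah", "4 blah") from his generator, and that in the reassembled file the N values should come out strictly increasing (oldest block first). check_order is a hypothetical name; header lines starting with ten dashes, including the "---------- NNNNNN.txt" lines split.awk writes, are skipped.

```shell
#!/bin/bash
# check_order (hypothetical name): verify a reassembled test file.
#  - inside each block the lines must start with 1, 2, 3, 4 in that order
#  - the N in the "2 N" lines must strictly increase from block to block
# Prints "order OK" and returns 0 on success, returns 1 otherwise.
check_order() {
    awk '
        /^----------/ { next }                  # separator/header lines
        {
            want = (n++ % 4) + 1                # expect 1,2,3,4,1,2,...
            if ($1 != want) {
                print "line order broken at line " NR; bad = 1; exit
            }
            if ($1 == 2) {
                if ($2 <= prev) {
                    print "block order broken at line " NR; bad = 1; exit
                }
                prev = $2
            }
        }
        END { if (!bad) print "order OK"; exit bad }
    ' "$1"
}

# Usage: check_order output.txt
```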
On Tuesday 03 February 2009 20:00, Brian K. White wrote:
----- Original Message ----- From: "Rikard Johnels"
Sent: Tuesday, February 03, 2009 11:15 AM Subject: [opensuse] Need some help with a script
...
The script just gives one file, 999999.txt, and then exits. -- /Rikard Johnels
Hello, On Tue, 03 Feb 2009, Rikard Johnels wrote:
The text is written in short two/three sentence groups, all separated by the same kind of header. It's all saved in a long text file (some 300 pages' worth)
<cut> ---------- (This is 10 dashes...) Kitty9 said the following:
<here is the answer>
---------- Son_of_sam said the following:
<here be text>
</cut>
How would I grep the text and reconstruct it from bottom to top so I can get <here be text> <here is the answer>
and so forth..? Any hints as to how, or where to teach myself the answer is appreciated.
==== transform.awk ====
#!/usr/bin/gawk -f
BEGIN { RS="----------"; ors=RS; }
{ t[NR]=$0; }
END { for(i=NR; i>0; i--) { print ors t[i]; } }
=====

Usage: gawk -f transform.awk input.txt > output.txt

It's not perfect, but does most of the job.

HTH, -dnh

-- Stef, XP is like democracy and governments. It's the worst Windows version out there, except for all the others. -- A.J. (userfriendly, id=20070928)
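transform.awk relies on a multi-character RS, a gawk extension that some other awks also support; POSIX leaves a multi-character RS unspecified. A sketch that produces similar output (each block with its dash line on top, blocks printed bottom-to-top) while reading ordinary lines, so it should also run under a plain POSIX awk. It assumes separator lines are exactly ten dashes; reverse_blocks_awk is a hypothetical name.

```shell
#!/bin/bash
# reverse_blocks_awk (hypothetical name): same reversal idea as
# transform.awk, but without a multi-character RS. Every line equal to
# ten dashes starts a new block; the dash line stays at the top of it.
reverse_blocks_awk() {
    awk '
        $0 == "----------" { n++ }              # a dash line opens block n
        { block[n] = block[n] $0 "\n" }         # append the line to block n
        END {
            # blocks were stored top-to-bottom; print them bottom-to-top
            for (i = n; i >= 0; i--)
                printf "%s", block[i]
        }
    ' "$@"
}

# Usage: reverse_blocks_awk input.txt > output.txt
```

Block 0 holds whatever comes before the first dash line; it is usually empty, and printing an empty string adds nothing.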
On Wednesday 04 February 2009 06:25, David Haller wrote:
...
==== transform.awk ====
#!/usr/bin/gawk -f
BEGIN { RS="----------"; ors=RS; }
{ t[NR]=$0; }
END { for(i=NR; i>0; i--) { print ors t[i]; } }
=====
Usage: gawk -f transform.awk input.txt > output.txt
It's not perfect, but does most of the job.
HTH, -dnh
This one works. All I have to do is remove the ----'s and the "said the following" lines. But that's fairly easy... -- /Rikard Johnels
Rikard Johnels wrote:
This one works. All i have to do is remove the ----'s and the "said the following" lines. But thats fairly easy...
Yep, sed -e '/---.*/d' -e '/^said\sthe\sfollowing/d' < sedtest > outfile -- David C. Rankin, J.D.,P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
David C. Rankin wrote:
Rikard Johnels wrote:
This one works. All i have to do is remove the ----'s and the "said the following" lines. But thats fairly easy...
Yep,
sed -e '/---.*/d' -e '/^said\sthe\sfollowing/d' < sedtest > outfile
Oops, 'sedtest' above should be 'infile'. 'sedtest' was just the file I created from your email to test my sed script before posting ;-) -- David C. Rankin, J.D.,P.E.
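In Rikard's sample the name comes first ("Kitty9 said the following:"), so `^said\sthe\sfollowing` only matches lines that begin with "said", and `/---.*/d` also deletes any story line that happens to contain three dashes. A sketch anchored on whole-line patterns instead, assuming each separator line starts with ten dashes and each attribution line ends with " said the following:". It uses plain spaces because `\s` is a GNU sed extension. strip_headers is a hypothetical name.

```shell
#!/bin/bash
# strip_headers (hypothetical name): delete separator lines (starting with
# ten dashes) and attribution lines ending in " said the following:".
# Story lines that merely contain "---" are kept.
strip_headers() {
    sed -e '/^----------/d' \
        -e '/ said the following:$/d' "$@"
}

# Usage: strip_headers infile > outfile
```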
Hello, On Thu, 05 Feb 2009, David C. Rankin wrote:
Rikard Johnels wrote:
This one works. All i have to do is remove the ----'s and the "said the following" lines. But thats fairly easy...
Yep,
sed -e '/---.*/d' -e '/^said\sthe\sfollowing/d' < sedtest > outfile
I can't believe it.

a) you don't have to output the '---...' lines from awk
b) you can remove the "said..." stuff in awk

====
#!/usr/bin/gawk -f
BEGIN {
    RS="\n*----------\n+";   # set the Record Separator
    ORS="\n";                # set the Output Record Separator
}
{
    # replace the first "said..." of the block with the ":"
    sub(" said the following:", ":", $0);
    # store the block in the array of blocks
    t[NR]=$0;
}
END {
    # output blocks in reversed order
    for(i=NR; i>0; i--) {
        print ORS t[i];
    }
}
====

HTH, -dnh

-- Door: Something a cat wants to be on the other side of
David Haller wrote:
Hello,
...
I can't believe it.
a) you don't have to output the '---...' lines from awk b) you can remove the "said..." stuff in awk
[..]
HTH, -dnh
It's a political script, it can't all be done in one department, that would be too easy and too efficient ;-) -- David C. Rankin, J.D.,P.E.
Hello, On Thu, 05 Feb 2009, David C. Rankin wrote:
David Haller wrote:
On Thu, 05 Feb 2009, David C. Rankin wrote:
Rikard Johnels wrote:
This one works. All i have to do is remove the ----'s and the "said the following" lines. But thats fairly easy...
Yep,
sed -e '/---.*/d' -e '/^said\sthe\sfollowing/d' < sedtest > outfile
I can't believe it.
a) you don't have to output the '---...' lines from awk b) you can remove the "said..." stuff in awk [..] It's a political script, it can't all be done in one department, that would be too easy and too efficient;-)
Then the other solutions show a distinct lack of using perl. Uselessly (and multiply) using perl in pipelines does wonders for inefficiency. The above could be replaced by:

cat file | perl -ne 'unless(/^-{10}/){print;}' \
  | perl -pe 's/^\s+said\s+the\s+following:/:/' | ...

which might be further pessimized.

$ time { for i in `seq 1 100`; do cat t.file | \
    sed -e '/---.*/d' -e '/^said\sthe\sfollowing/d' > /dev/null; done; }
real 0m1.267s
user 0m0.450s
sys 0m0.650s

$ time { for i in `seq 1 100`; do cat t.file | \
    perl -ne 'unless(/^-{10}/){print;}' | \
    perl -pe 's/^\s+said\s+the\s+following:/:/' >/dev/null; done; }
real 0m5.630s
user 0m3.250s
sys 0m2.230s

Yay!

Ok, seriously: starting a process (in "tight loops") is SLOW:

$ time { for i in `seq 1 1000`; do /[..]/true; done; }
real 0m4.510s
user 0m1.520s
sys 0m2.880s

(this is 'true (GNU coreutils) 6.9')
# ls -l /[..]/true
[..] 35202 Dec 18 03:50 /[..]/true

$ time { for i in `seq 1 1000`; do /bin/true; done; }
real 0m1.825s
user 0m0.300s
sys 0m1.450s

This is the smallest possible /bin/true I know of:
[..] 45 Mar 5 2001 /bin/true
Yes, only 45 bytes of handcrafted assembler/machine code (not by me).

$ time { for i in `seq 1 1000`; do true; done; }
real 0m0.063s
user 0m0.060s
sys 0m0.000s

This is the shell builtin. Note the 'sys' times. The builtin is 71 times faster than coreutils true, and still 28 times faster than what is probably the smallest ELF binary there is[2]; i.e. that comparison shows the minimal difference between "process" and "builtin".

So it is a worthy goal to try to cut down on starting processes (esp. inside 'while read; do ..; done' loops; starting a perl for each line of input wreaks havoc on efficiency). And usually it only takes a little thought to do so. A couple of guidelines might help in recognizing when to think:

* don't use grep after grep other than 'grep ... | grep -v ...' (or 'grep -v .. | grep ..'); think about using sed, awk, perl, ...
* if you use more than one of grep, sed, awk or perl in one pipeline construct (even via 'var=$(..); echo "$var" | ..'), think about replacing as much of the pipelined stuff as possible with just one tool that is more fit for the task (see next point), even if it's (slow-starting) perl. In the end, perl soon starts faster than complicated pipelines, and you can do everything the script does in perl.
* don't combine grep with sed, awk or perl, or sed with awk or perl, in one pipeline. You can do everything that grep does with sed, everything that sed does with awk, ...
  grep < sed < awk < perl[1]
  though it might be "harder" or less obvious using the "next" tool.

I don't know where other scripting languages like python fit. Python probably alongside perl, others a little "left" of it. Replace "perl" in the guidelines with e.g. "python" if applicable.

A further example demonstrating my point is the command above.

grep -v -e '---.*' -e '^said the following'

is equivalent to the above

sed -e '/---.*/d' -e '/^said the following/d'

and sed "can do much more" than that. And it's equivalent to

awk '/---.*|^said the following/ { next; } { print; }'

and awk "can do much more" than that. And it's equivalent to (e.g.)

perl -ne 'unless(/---.*|^said the following/) { print; }'

and perl "can do much more" than that.

It follows that something like 'grep ... | sed ...' can be done more efficiently by basically using sed -n '/GREP STUFF/ { SED STUFF; p; }'.

HTH, -dnh

[1] though you have to "reprogram" some awk features in perl.
[2] see http://www.muppetlabs.com/~breadbox/software/tiny/ from the README of tiny.tar.gz about "true": This one is the runt of the litter: 45 bytes in size. I believe that this is the smallest it is possible for a Linux ELF executable to be. (After all, a complete ELF header alone is 52 bytes long.)

-- Ceci n'est pas une .signature.
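The last guideline, sketched: both functions keep only the attribution lines and shorten them to "NAME:", one with a grep | sed pipeline (two processes) and one with a single sed. The function names are hypothetical, and the sketch assumes attribution lines contain "said the following:". In the single-sed version, -n turns off automatic printing, the address selects the same lines grep would, and "p" prints each one after the substitution.

```shell
#!/bin/bash
# with_pipeline (hypothetical name): grep selects the lines, sed rewrites them.
with_pipeline() {
    grep 'said the following:' "$@" | sed 's/ said the following:/:/'
}

# with_one_sed (hypothetical name): the same work in one sed process.
with_one_sed() {
    sed -n '/said the following:/ { s/ said the following:/:/; p; }' "$@"
}

# Usage: with_one_sed story.txt
```

Without -n (and the explicit p) sed would also print every non-matching line, so it would no longer be equivalent to the grep in front.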
David Haller wrote:
It follows that something like 'grep ... | sed ...' can be done more efficiently by basically using sed -n '/GREP STUFF/ { SED STUFF; p; }'.
HTH, -dnh
Much more than help; that was excellent. It has found its way to the top of the 'programming' basket in basket notepad as a continual reminder of program efficiency. Thank you for the 'perls' of wisdom -- no pun intended ;-) -- David C. Rankin, J.D.,P.E.
----- Original Message -----
From: "David Haller"
Hello,
On Thu, 05 Feb 2009, David C. Rankin wrote:
Rikard Johnels wrote:
This one works. All I have to do is remove the ----'s and the "said the following" lines. But that's fairly easy...
Yep,
sed -e '/---.*/d' -e '/^said\sthe\sfollowing/d' < sedtest > outfile
I can't believe it.
a) you don't have to output the '---...' lines from awk
b) you can remove the "said..." stuff in awk
====
#!/usr/bin/gawk -f
BEGIN {
    RS="\n*----------\n+";  # set the Record Separator
    ORS="\n";               # set the Output Record Separator
}
{
    # replace the first "said..." of the block with the ":"
    sub(" said the following:", ":", $0);
    # store the block in the array of blocks
    t[NR]=$0;
}
END {
    # output blocks in reversed order
    for(i=NR; i>=1; i--) {
        print ORS t[i];
    }
}
====
Nice.

In my (weak) defense, I was making no assumptions about the size of the book or the size of the user's ram. I almost never load whole files into ram like this just for that reason. Same way I use find|xargs instead of using shell filename globbing any time I don't know how large the list will expand. I hate banking on "probably it will always be small enough".

The above will be blazingly, insanely fast, though, and of course a 300-page book will surely fit in anyone's ram.
-- 
Brian K. White brian@aljex.com http://profile.to/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
Hello, On Thu, 05 Feb 2009, Brian K. White wrote:
From: "David Haller"
That single line suffices. Please delete the rest, as I just did for you. [..]
In my (weak) defense, I was making no assumptions about the size of the book or the size of the user's ram.
Well, I did (see below).
I almost never load whole files into ram like this just for that reason.
I try to avoid it.
Same way I use find|xargs instead of using shell filename globbing any time I don't know how large the list will expand. I hate banking on "probably it will always be small enough".
The above will be blazingly, insanely fast, though, and of course a 300-page book will surely fit in anyone's ram.
Exactly. You can fit a LOT of plain-text into a few MB of RAM ;) And circumventing it (e.g. by reversing the lines of each block, writing them out to a file and using 'tac' on the file, or something like that) is just not worth the effort in this case. E.g. the script for "The Life of Brian" is just 87 KB. Let's have a look at what RAM is needed for some bits of plain-text...

$ echo "" | awk '{ t[NR]=$0; } END { print "read " NR " lines"; while ( getline <"/proc/self/status" ) { if( $0 ~ /VmSize|Name/ ) { print; }; }; }'
read 1 lines
Name: awk
VmSize: 1468 kB
$ awk 'same as above' <(zcat Life_of_Brian.txt.gz)
read 2253 lines
Name: awk
VmSize: 1852 kB
$ awk 'same as above' /var/lib/locatedb
read 27493 lines
Name: awk
VmSize: 62144 kB
$ ls -lh /var/lib/locatedb
-rw-r--r-- 1 root root 56M Jan 15 19:52 /var/lib/locatedb

Won't get much bigger than that, I'd guess, as far as plain-text is concerned ;)

And for comparison:

$ perl -ne 'push(@t,$_); END { print "read ", scalar @t, " lines\n"; open(S,"<","/proc/self/status") or die $!; while(<S>) { print if /VmSize|Name/;} close(S); }' /var/lib/locatedb
read 27493 lines
Name: perl
VmSize: 89680 kB

(perl is quite a bit faster than awk)

-dnh
-- 
Life sucks, and then you die. And then it still sucks.
    -- Georgia 'George' Lass, Dead Like Me
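For the record, the 'tac' circumvention dnh mentions can be sketched without any temporary files per block: read bottom-up, each block shows up as its text lines (in reverse), then its header, then its separator, so awk only has to buffer one block and print it back in reading order when it reaches the header. This is a sketch assuming GNU coreutils tac and the header format used in this thread; reverse_blocks is a hypothetical name.

```shell
#!/bin/bash
# Sketch of the "tac" route: reading the file bottom-up, each block shows up
# as its text lines (reversed), then its header, then its separator, so only
# one block has to be buffered at a time.
# Assumes GNU tac and the header format used in this thread;
# reverse_blocks is a hypothetical name.
reverse_blocks() {
    tac -- "$1" | awk '
        /^----------$/ { next }                       # separators are dropped
        / said the following:$/ {                     # header: the block is complete
            sub(/ said the following:$/, ":")         # "Kitty9 said the following:" becomes "Kitty9:"
            print                                     # print the header first,
            for (i = n; i >= 1; i--) print buf[i]     # then the text in reading order
            n = 0
            next
        }
        { buf[++n] = $0 }                             # collect text lines (they arrive reversed)
        END { for (i = n; i >= 1; i--) print buf[i] } # text above the first separator, if any
    '
}
```

Usage would be `reverse_blocks story.txt > story_in_order.txt` (file names hypothetical). As dnh says, for a 300-page story the in-memory gawk version is simpler; this only matters when a single file is too large to hold in RAM.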
participants (9)
- Anders Johansson
- Brian K. White
- David C. Rankin
- David Haller
- G T Smith
- nordi
- Rajko M.
- Randall R Schulz
- Rikard Johnels