[opensuse] grep sed and awk question

lynn

12 Feb 2012

23:05

I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point?

Thanks,
L x

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org

lynn

12 Feb

23:39

On 02/13/2012 12:05 AM, lynn wrote:

...

I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point? Thanks, L x

Thinking out loud:

v1="lynn:*:3000002some other stuff100hellolynn:";echo "${v1//[!0-9]}"

gives: 3000002100

But I may have:

lynn2:*:3000002some other stuff100hellolynn:

which gives: 23000002100

How to select only the 3000002?

Thanks
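One possible way to pick out only the number that follows ":*:" is plain parameter expansion, with no external commands. This is only a sketch: it assumes the first number always comes directly after the literal ":*:", and the variable names (line, rest, tail) are illustrative, not from the post.

```shell
# Sample line from the thread, with a digit in the user name.
line='lynn2:*:3000002some other stuff100hellolynn:'

# Assumption: the first number always follows the literal ":*:".
rest=${line#*:\*:}              # drop the shortest prefix ending in ":*:"
v1=${rest%%[!0-9]*}             # keep only the leading run of digits

tail=${rest#"$v1"}              # what follows the first number
tail=${tail#"${tail%%[0-9]*}"}  # drop the leading non-digits
v2=${tail%%[!0-9]*}             # keep the next run of digits

echo "v1=$v1 v2=$v2"
```

Because the match is anchored on ":*:", the 2 in lynn2 never reaches v1.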

Marko Koski-Vähälä

13 Feb

00:23

* lynn <lynn@steve-ss.com> [120213 00:40]:

...

On 02/13/2012 12:05 AM, lynn wrote:

...
I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point? Thanks, L x

Thinking out loud: v1="lynn:*:3000002some other stuff100hellolynn:";echo "${v1//[!0-9]}" gives: 3000002100

But I may have: lynn2:*:3000002some other stuff100hellolynn: which gives: 23000002100

How to select only the 3000002 Thanks --

Does it have to be grep, sed and awk?

cut -c 9-15

awk -F\: '{print $3}' |cut -c 1-7
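For illustration, the awk and cut pipeline can be tried on both sample lines from the thread. This is a sketch under the assumption that the number is always exactly 7 characters wide; the function name first_num is hypothetical.

```shell
# Print the first 7 characters of the third colon-separated field.
# Assumption: the number is always exactly 7 digits wide.
first_num() {
    printf '%s\n' "$1" | awk -F: '{print $3}' | cut -c 1-7
}

a=$(first_num 'lynn:*:3000002some other stuff100hellolynn:')
b=$(first_num 'lynn2:*:3000002some other stuff100hellolynn:')
echo "$a $b"
```

Splitting on the colon first makes the result independent of the length of the name, which a fixed column range on the whole line is not.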

zep

00:33

On 2/12/2012 6:39 PM, lynn wrote:

...

On 02/13/2012 12:05 AM, lynn wrote:

...
I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point? Thanks, L x

Thinking out loud: v1="lynn:*:3000002some other stuff100hellolynn:";echo "${v1//[!0-9]}" gives: 3000002100

But I may have: lynn2:*:3000002some other stuff100hellolynn: which gives: 23000002100

How to select only the 3000002 Thanks

it's not the most efficient, but I think something like:

string="lynn:*:3000002some other stuff100hellolynn:"
v1=`echo $string | sed -e "s/^.*:\*://" -e "s/[a-z ].*//"`
v2=`echo $string | sed -e "s/^.*:\*:[a-z0-9]* [a-z ]*//" \
 -e "s/[a-z]*://"`
echo v1 $v1
echo v2 $v2

might be somewhat usable. assuming high performance isn't the rule of the day.
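The same two sed pipelines can also be written with quoted expansions, so that the * inside the string can never be treated by the shell as a filename pattern. This is a sketch; using printf and $(...) instead of echo and backticks is an editorial choice, not part of the post.

```shell
string='lynn:*:3000002some other stuff100hellolynn:'

# First number: strip everything through ":*:", then everything
# from the first lowercase letter or space onwards.
v1=$(printf '%s\n' "$string" | sed -e 's/^.*:\*://' -e 's/[a-z ].*//')

# Second number: strip the prefix up to the second number,
# then the trailing word and colon.
v2=$(printf '%s\n' "$string" |
     sed -e 's/^.*:\*:[a-z0-9]* [a-z ]*//' -e 's/[a-z]*://')

echo "v1 $v1"
echo "v2 $v2"
```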

Per Jessen

07:27

lynn wrote:

...

I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point?

Assuming you've got a file with such lines, this might get you started:

sed -r -e 's/^[^0-9]*([0-9]+)[^0-9]+([0-9]+).*$/\1 \2/' file

--
Per Jessen, Zürich (-9.9°C)
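A variation on this idea anchors the match on the literal ":*:" so that digits in the name are skipped, and reads the two numbers into variables. This is a sketch under assumptions: the temporary file stands in for "file", and -E is used as the portable spelling of the extended-regex option.

```shell
# Hypothetical sample file standing in for "file" in the post.
file=$(mktemp)
printf '%s\n' \
    'lynn:*:3000002some other stuff100hellolynn:' \
    'lynn2:*:3000002some other stuff100hellolynn:' > "$file"

# Anchor on the literal ":*:" so digits in the name are ignored,
# then read the two captured numbers into v1 and v2 per line.
out=$(sed -E 's/^[^:]*:\*:([0-9]+)[^0-9]+([0-9]+).*$/\1 \2/' "$file" |
    while read -r v1 v2; do
        echo "v1=$v1 v2=$v2"
    done)
echo "$out"
rm -f "$file"
```

Note that v1 and v2 exist only inside the loop, because the loop runs at the end of a pipeline.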

lynn

10:03

On 02/13/2012 08:27 AM, Per Jessen wrote:

...

lynn wrote:

...
I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point? Assuming you've got a file with such lines, this might get you started:

sed -r -e 's/^[^0-9]*([0-9]+)[^0-9]+([0-9]+).*$/\1 \2/' file

Hi
Thanks for the input everyone. It's helped me get started. I should have been more specific. I've narrowed down the task to getting just the first number in a string _but_ the output comes from the wbinfo command e.g.

wbinfo -i lynn
CACTUS\lynn:*:3000002:100::/home/CACTUS/lynn2:/bin/bash

I want to extract the 3000002

I've narrowed it down to this:

#!/bin/bash
str=$(wbinfo -i $1)
echo $str | sed -r 's/^([^.]+).*$/\1/; s/^[^0-9]*([0-9]+).*$/\1/'

which gives 3000004. Good.

But if the user is called lynn2, it gives 2

So the problem comes down to: how to get the first number in the string _after_ the *: sequence

(This would work for wbinfo --group-info too as it is the same format)

Thanks,
L x
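One way to express "the first number after the *: sequence" in sed is to require the literal "*:" before the captured digits. This is a sketch only: the string below is a stand-in for real wbinfo output (wbinfo is not run here), and the variable names are illustrative.

```shell
# Stand-in for the output of "wbinfo -i lynn2"; wbinfo itself is not run.
str='CACTUS\lynn2:*:3000002:100::/home/CACTUS/lynn2:/bin/bash'

# Skip everything up to the first colon, require the literal "*:",
# then capture the run of digits that follows.
uid=$(printf '%s\n' "$str" | sed -E 's/^[^:]*:\*:([0-9]+).*$/\1/')
echo "$uid"
```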

Tim Hempstead

10:40

On Mon, Feb 13, 2012 at 10:03 AM, lynn <lynn@steve-ss.com> wrote:

...

Hi Thanks for the input everyone. It's helped me get started. I should have been more specific. I've narrowed down the task to getting just the first number in a string _but_ the output comes from the wbinfo command e.g.

wbinfo -i lynn CACTUS\lynn:*:3000002:100::/home/CACTUS/lynn2:/bin/bash

I want to extract the 3000002

I've narrowed it down to this:

#!/bin/bash str=$(wbinfo -i $1) echo $str | sed -r 's/^([^.]+).*$/\1/; s/^[^0-9]*([0-9]+).*$/\1/'

which gives 3000004. Good.

But if the user is called lynn2, it gives 2

So the problem comes down to: how to get the first number in the string _after_ the *: sequence

(This would work for wbinfo --group-info too as it is the same format) Thanks, L x

It would appear that the field in the wbinfo output you want to extract has a colon before and after it? Hence can't you use ...

wbinfo -i lynn | awk -F":" '{print $3}'

Regards

Tim
--
Tim Hempstead
thempstead@gmail.com
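As a quick check of the field-based approach, the same awk call can be run on a fixed sample line. This is a sketch: the line is a stand-in for wbinfo output, and reading field 4 as the group id assumes the passwd-like layout shown in the post.

```shell
# Stand-in for "wbinfo -i lynn2" output; wbinfo itself is not run.
line='CACTUS\lynn2:*:3000002:100::/home/CACTUS/lynn2:/bin/bash'

# With ":" as the separator, the name is field 1, "*" is field 2,
# and the number wanted in the thread is field 3.
uid=$(printf '%s\n' "$line" | awk -F: '{print $3}')
gid=$(printf '%s\n' "$line" | awk -F: '{print $4}')  # assumed group id
echo "uid=$uid gid=$gid"
```

A digit in the user name no longer matters, because the name is confined to field 1.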

lynn

11:55

New subject: [opensuse] grep sed and awk question[solved]

On 02/13/2012 11:40 AM, Tim Hempstead wrote:

...

On Mon, Feb 13, 2012 at 10:03 AM, lynn<lynn@steve-ss.com> wrote:

...
Hi Thanks for the input everyone. It's helped me get started. I should have been more specific. I've narrowed down the task to getting just the first number in a string _but_ the output comes from the wbinfo command e.g.

wbinfo -i lynn CACTUS\lynn:*:3000002:100::/home/CACTUS/lynn2:/bin/bash

I want to extract the 3000002

I've narrowed it down to this:

#!/bin/bash str=$(wbinfo -i $1) echo $str | sed -r 's/^([^.]+).*$/\1/; s/^[^0-9]*([0-9]+).*$/\1/'

which gives 3000004. Good.

But if the user is called lynn2, it gives 2

So the problem comes down to: how to get the first number in the string _after_ the *: sequence

(This would work for wbinfo --group-info too as it is the same format) Thanks, L x

It would appear that the field in the wbinfo output you want to extract has a colon before and after it? Hence can't you use ...

wbinfo -i lynn | awk -F":" '{print $3}'

Regards

Tim

Hi Tim, hi everyone
Yes. The key was that lifesaving colon. In the end I used cut:

strgid=$(wbinfo --group-info=$1)
gid=$(echo $strgid | cut -d ":" -f 3)

Thanks to everyone who helped.
L x
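For comparison, the shell can split on the colon itself, without cut. This is a sketch: the sample string is the group line quoted elsewhere in the thread, standing in for real wbinfo output, and the names name, pass and rest are illustrative.

```shell
# Stand-in for the output of "wbinfo --group-info=suseusers".
strgid='suseusers:*:3000028:'

# IFS is set to ":" only for this one read; -r keeps backslashes literal.
IFS=: read -r name pass gid rest <<EOF
$strgid
EOF
echo "$gid"
```

This gives all the fields in one step, which may be handy when more than the third one is needed.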

Brian K. White

14 Feb

02:59

On 2/12/2012 6:05 PM, lynn wrote:

...

I'm regex'd out of it at the moment. Given a string like this:

lynn:*:3000002some other stuff100hellolynn:

Is there a way to get the 3000002 into a variable v1 and the 100 into a variable v2? bash? Any recommended starting point? Thanks, L x

You haven't defined the problem clearly enough for a correct answer.

Are there really no other field delimiters (":" in this case) between the * and the end of the line?

Is "some other stuff" always the same length? _always_?

Can "some other stuff" contain numbers?

Basically I have to doubt that the file you are reading really looks like this, or if it does, what is generating it? The only way such a file would be useful is if the fields were all fixed length, because there are no delimiters for most of the line, but then it's odd that there are any delimiters at all in that case.

Please describe more about what is creating this file, and provide some more sample records, and don't feel free to modify the line to hide sensitive data, rather create some junk records in the generating application that aren't sensitive in the first place, then supply those without changing them at all.

There are a few different ways to do what you asked, but no way to say if they will work on any other input except that specific line above. That isn't very useful.

Best guess until you say otherwise is that the fields that are delimited by :'s are variable length, but the 3rd field

DONE=false
until $DONE ;do
  IFS=: read F1 F2 F3 junk || DONE=true
  case "$F1" in ""|\#*) continue ;; esac
  F3_1=${F3:0:7}
  F3_2=${F3:7:16}
  F3_3=${F3:23:3}
  echo -e "Name:\t\"${F1}\""
  echo -e " Number A:\t\"${F3_1}\""
  echo -e " Description:\t\"${F3_2}\""
  echo -e " Number B:\t\"${F3_3}\""
done < file.txt

Given this line of input:

lynn:*:3000002some other stuff100hellolynn:

the "IFS=: read F1 F2 F3 junk" command will read the line and treat ":" as the word separator, so F1 will be "lynn", F2 will be "*", F3 will be everything up to the 3rd ":", and junk will be anything after the 3rd colon.

Then inside the "until loop", it counts bytes to extract chunks of $F3 into $F3_1 $F3_2 etc.

${F3:0:7} means to output 7 bytes starting at byte 0 of $F3
${F3:7:16} means output 16 bytes starting at byte 7 (counting from 0) of $F3
${F3:23:3} means output 3 bytes starting at byte 23 (counting from 0) of $F3

Then I stuck in stuff to ignore empty lines and lines that begin with #, and the use of the $DONE variable ensures that the last line is processed even if the file ends right at the end of the line with no trailing newline.

So given this sample input:

----
lynn:*:3000002some other stuff100hellolynn:
foob:*:6100013some stuff......201hellofoob:
snac:*:3000562stuff goes here.111hellosnac:
higb:*:5000001gibberish.......007hellohigb:

# a comment
blah:*:7500022blah blah blah..911helloblah:
----

You get:

Name: "lynn"
 Number A: "3000002"
 Description: "some other stuff"
 Number B: "100"
Name: "foob"
 Number A: "6100013"
 Description: "some stuff......"
 Number B: "201"
Name: "snac"
 Number A: "3000562"
 Description: "stuff goes here."
 Number B: "111"
Name: "higb"
 Number A: "5000001"
 Description: "gibberish......."
 Number B: "007"
Name: "blah"
 Number A: "7500022"
 Description: "blah blah blah.."
 Number B: "911"

Which is nice and all, but only works if the lengths of the number and comment fields are always exactly the same length on every record.

--
bkw
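The offsets used in the loop can be cross-checked with cut -c, which counts from 1 and takes inclusive ranges, where the ${F3:offset:length} form counts from 0. A sketch, assuming the fixed widths of 7, 16 and 3 characters from the post; the variable names are illustrative.

```shell
# Third colon-separated field of the first sample record.
f3='3000002some other stuff100hellolynn'

# ${F3:offset:length} uses 0-based offsets; cut -c uses 1-based
# inclusive ranges, so offset 23 is the 24th character.
num_a=$(printf '%s\n' "$f3" | cut -c 1-7)    # same span as ${F3:0:7}
desc=$(printf '%s\n' "$f3" | cut -c 8-23)    # same span as ${F3:7:16}
num_b=$(printf '%s\n' "$f3" | cut -c 24-26)  # same span as ${F3:23:3}

printf '%s|%s|%s\n' "$num_a" "$desc" "$num_b"
```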

lynn

09:00

On 02/14/2012 03:59 AM, Brian K. White wrote:

...

Hi Brian
Thank you so much for your effort. This thread has given me insight into the hidden power of Linux. You are right. I had not provided enough info. The strings are from the output of

wbinfo -i user

and

wbinfo --group-info=group

and so yes, there are : delimiters and they come at the same place for both user and group. This must have been designed with people like me in mind!

cut did it, e.g. for groups:

strgid=$(wbinfo --group-info=$1)
gid=$(echo $strgid | cut -d ":" -f 3)
echo $gid

I had marked the thread as [solved] previously and I hope that you don't feel you've wasted your time with your (excellent) explanation here. On the contrary. I have learned a load of new stuff from it. More, I have learned how to ask smarter questions.

L x

lynn

15 Feb

00:00

New subject: the cut command was:[opensuse] grep sed and awk question

Hi
Just a bit confused about what the cut command is actually saying. e.g.

strgid="suseusers:*:3000028:"
gid=$(echo $strgid | cut -d ":" -f 3)

$gid=3000028
I count 3 : delimiters. I think I understand. But then:

strsid="S-1-5-21-2395500911-3560017633-4088823418-1134"
pgrp=$(echo $strsid | cut -d "-" -f 8)

$pgrp=1134
I count 7 - delimiters. I don't understand any more.

What's the -f n saying? 'I go to the n-1'th delimiter and give you whatever is between that and the n'th delimiter'?

Can I assume that the n'th delimiter could either be a delimiter or an end of line character? If so, what is the end of line delimiter? Ahhggh!

Thanks,
L x

Anders Johansson

00:03

New subject: the cut command was:[opensuse] grep sed and awk question

On Wednesday 15 February 2012 01:00:38 lynn wrote:

...

strsid="S-1-5-21-2395500911-3560017633-4088823418-1134" pgrp=$(echo $strsid | cut -d "-" -f 8)

$pgrp=1134 I count 7 - delimiters I don't understand any more.

What's the -f n saying? 'I go to the n-1'th delimiter and give you whatever is between that and the n'th delimiter'?

Can I assume that the n'th delimiter could either be a delimiter or an end of line character? If so, what is the end of line delimiter?

-f means "field". S is the first field, 1 the second, 21 the third, and so on. 1134 is the 8th field.

Anders

Anders Johansson

00:09

New subject: the cut command was:[opensuse] grep sed and awk question

On Wednesday 15 February 2012 01:03:56 Anders Johansson wrote:

...

On Wednesday 15 February 2012 01:00:38 lynn wrote:

...
strsid="S-1-5-21-2395500911-3560017633-4088823418-1134" pgrp=$(echo $strsid | cut -d "-" -f 8)

$pgrp=1134 I count 7 - delimiters I don't understand any more.

What's the -f n saying? 'I go to the n-1'th delimiter and give you whatever is between that and the n'th delimiter'?

Can I assume that the n'th delimiter could either be a delimiter or an end of line character? If so, what is the end of line delimiter?

-f means "field". S is the first field, 1 the second, 21 the third, and so

Sorry, 5 is the third field, 21 the fourth. But you get the point

Anders
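The counting can be checked directly: N delimiters split a line into N+1 fields, and the end of the line closes the last field without needing a delimiter of its own. A small sketch using the two strings from the thread; the variable names are illustrative.

```shell
strsid='S-1-5-21-2395500911-3560017633-4088823418-1134'
strgid='suseusers:*:3000028:'

# awk's NF is the number of fields for the given separator.
sid_fields=$(printf '%s\n' "$strsid" | awk -F '-' '{print NF}')
gid_fields=$(printf '%s\n' "$strgid" | awk -F ':' '{print NF}')

third=$(printf '%s\n' "$strsid" | cut -d '-' -f 3)
fourth=$(printf '%s\n' "$strsid" | cut -d '-' -f 4)
eighth=$(printf '%s\n' "$strsid" | cut -d '-' -f 8)

echo "sid_fields=$sid_fields gid_fields=$gid_fields"
echo "third=$third fourth=$fourth eighth=$eighth"
```

The group string has 3 colons and therefore 4 fields; the fourth is simply empty because the line ends right after the last colon.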

lynn

00:26

New subject: the cut command was:[opensuse] grep sed and awk question[solved]

On 15/02/12 01:09, Anders Johansson wrote:

...

On Wednesday 15 February 2012 01:03:56 Anders Johansson wrote:

...
On Wednesday 15 February 2012 01:00:38 lynn wrote:

...
strsid="S-1-5-21-2395500911-3560017633-4088823418-1134" pgrp=$(echo $strsid | cut -d "-" -f 8)

$pgrp=1134 I count 7 - delimiters I don't understand any more.

What's the -f n saying? 'I go to the n-1'th delimiter and give you whatever is between that and the n'th delimiter'?

Can I assume that the n'th delimiter could either be a delimiter or an end of line character? If so, what is the end of line delimiter?

-f means "field". S is the first field, 1 the second, 21 the third, and so

Sorry, 5 is the third field, 21 the fourth. But you get the point

Anders

Yes I get it now. Sorry folks.
Thanks,
Lynn
