Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

sed a substring out of a string

Status
Not open for further replies.

akaballa (Computer, CA), May 29, 2011
Hi, I am not sure if I am in the right forum.

I have a quick question about the sed command.

I am trying to extract a substring from a string in a shell script.

This is my code so far:

Code:
cmdRun="-vs LAST_BUS_DAY=20111005"
time_stamp=`echo "$cmdRun" | sed 's/.*([0-9])+.*/\\\\1/'`
echo $time_stamp

I just need to extract 20111005. In this case the digits are at the end, but in other cases they might be in the middle of the string. That is why I chose to use sed.

However, when I run this code, it matches the entire $cmdRun string. Please advise!

Regards,
akaballa
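A minimal sketch of a working version of the sed command above, assuming a POSIX sed. In sed's basic regular expressions, `(`, `)` and `+` are literal characters, so the original pattern never matched and sed printed the line unchanged. Grouping has to be written `\(...\)`, and `[0-9][0-9]*` stands in for "one or more digits" (GNU sed also accepts `\+` as an extension). The leading `.*` is replaced by `[^0-9]*` because `.*` is greedy and would leave only the last digit for the group. `$(...)` avoids the extra layer of backslash processing that backticks apply.

```shell
#!/bin/sh
cmdRun="-vs LAST_BUS_DAY=20111005"

# [^0-9]* skips everything up to the first digit, \([0-9][0-9]*\) captures the
# first run of digits, and .* swallows the rest of the line.
# -n plus the p flag prints only lines where the substitution succeeded.
# printf is used instead of echo because the value starts with "-".
time_stamp=$(printf '%s\n' "$cmdRun" | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')
echo "$time_stamp"

# For comparison: with a greedy .* in front, the group only gets the last digit.
greedy=$(printf '%s\n' "$cmdRun" | sed -n 's/.*\([0-9][0-9]*\).*/\1/p')
echo "$greedy"
```

The first echo prints 20111005 and the second prints only 5, which shows why the greedy `.*` in front of the group is a trap.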
 

akaballa,

This is a good forum.

I used sed in a makefile a long time ago to replace footer text on a website. I forget how I did it; I read the man page at the time. I think there is one of those "In a Nutshell" books on sed, about an inch thick.

I use perl to solve problems like this. Its s/// (substitute) operator is an extremely powerful search-and-replace tool.

Your operation above is looking for a consecutive series of digits, right?



JHG
 
akaballa,

So, you are searching for sequences of eight digits, and rejecting any sequence with more or fewer than eight of them?

Your command line above selects one character on the basis of its being from 0 to 9. I don't understand sed very well, but I see nothing that forces an eight-character sequence.
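A minimal sed sketch of the eight-digit requirement, assuming a POSIX sed: the interval `\{8\}` demands exactly eight digits, and the surrounding `[^0-9]` stop a longer number from matching. Padding the input with a space on each side (an assumption of this sketch, not something from the thread) makes sure a digit run at the very start or end of the string still has a non-digit neighbour. The function name `extract8` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints an eight-digit run that is not part of a longer number.
extract8() {
  # Pad with spaces so every digit run is bounded by non-digits, then keep group 1.
  printf ' %s \n' "$1" | sed -n 's/.*[^0-9]\([0-9]\{8\}\)[^0-9].*/\1/p'
}

extract8 "-vs LAST_BUS_DAY=20111005"          # eight digits at the end
extract8 "LAST_BUS_DAY=20111005 -vs other"    # eight digits in the middle
extract8 "ID=123456789"                       # nine digits: prints nothing
```

Because the leading `.*` is greedy, a line containing several eight-digit runs yields the last one.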

You have a programming problem, easily solved in C. It is probably also easily solved in perl, but I would have to read my book on it.


JHG
 
Hi,
with sed I think it is a little more difficult, because the substitution has to match the complete line and replace it with the part you want to keep. Why not use awk? Here is an example:


cmdRun="-vs LAST_BUS_DAY=20111005"
time_stamp=`echo "$cmdRun" | awk '{print (substr($0,match($0,"(19|20)[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])"),8))}'`
echo $time_stamp

The regexp matches a year 19xx or 20xx, followed by a month 01-09, 10, 11 or 12, followed by a day 01-09, 1x, 2x, 30 or 31.
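To see what that pattern accepts, here is a small sketch that runs the same regular expression through `grep -E`, anchored with `^` and `$` (the anchors and the sample dates are additions for illustration). It checks the ranges of the month and day fields, but not the number of days in each month.

```shell
#!/bin/sh
re='^(19|20)[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])$'

# 20111305 has month 13, 20110230 is February 30, 18991231 has an 18xx year.
for d in 20111005 20111305 20110230 18991231; do
  if printf '%s\n' "$d" | grep -Eq "$re"; then
    echo "$d matches"
  else
    echo "$d does not match"
  fi
done
```

20111005 and 20110230 match, while 20111305 and 18991231 do not, so February 30 slips through.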

The substr function extracts the 8 characters beginning at the position where match() found the date.

The output from my terminal was:

$ cmdRun="-vs LAST_BUS_DAY=20111005"

$ time_stamp=`echo "$cmdRun" | awk '{print (substr($0,match($0,"(19|20)[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])"),8))}'`

$ echo $time_stamp
20111005

The time_stamp assignment is all on a single line.
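One caveat with the one-liner above: when a line contains no date, match() returns 0, and substr($0, 0, 8) then returns characters from the start of the line instead of an empty string. A slightly more defensive sketch tests the return value and uses the RSTART and RLENGTH variables that match() sets (the variable name `prog` and the second test string are illustrative assumptions).

```shell
#!/bin/sh
# awk program: match() returns the start of the match (0 if none) and sets
# RSTART/RLENGTH, so the date is printed only when one was actually found.
prog='{
  if (match($0, /(19|20)[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])/))
    print substr($0, RSTART, RLENGTH)
}'

cmdRun="-vs LAST_BUS_DAY=20111005"
time_stamp=$(printf '%s\n' "$cmdRun" | awk "$prog")
echo "$time_stamp"

# A line without a date produces no output at all.
none=$(printf '%s\n' "-vs NO_DATE_HERE" | awk "$prog")
echo "none=[$none]"
```

Using RLENGTH instead of a hard-coded 8 also keeps the substr() call correct if the pattern is later changed to match a different length.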

Hope this helps.
Frank.
 
FrankMalone,

I am actually not sure about sed, but I know there is an "In a Nutshell" book on awk, and it is very thick indeed. If the OP has any programming skills, they should know C. Perl is worth learning. I cannot see myself trying to figure out awk.

JHG
 
Hi JHG, I understand you, but if you need to work with filters in an easy way, awk is a very good choice.

The awk man page is only two or three pages long, and there is also a port to Windows.

But in any case it is up to you; if you don't like it, don't use it.

Regards
Frank.
 
awk is not as hard to learn as sed.
A prerequisite for either is a grasp of regex (which I am just a novice at).
sed is much more regex-dependent than awk.
Regular expressions for just about anything can be found using Google.

For example this:
Code:
^([2-9]\d{3}((0[1-9]|1[012])(0[1-9]|1\d|2[0-8])|(0[13456789]|1[012])(29|30)|(0[13578]|1[02])31)|(([2-9]\d)(0[48]|[2468][048]|[13579][26])|(([2468][048]|[3579][26])00))0229)$
from regexlib matches dates such as 20000101 and rejects February 29 in non-leap years.
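As a quick check of that pattern, here is a sketch that runs it through `grep -E`. POSIX extended regular expressions do not define `\d`, so this version replaces every `\d` with `[0-9]` (a substitution made for this sketch; the pattern is otherwise unchanged). Because the first digit of the year must be 2 to 9, the pattern only accepts years 2000 to 9999.

```shell
#!/bin/sh
# regexlib pattern with \d rewritten as [0-9] for POSIX grep -E.
re='^([2-9][0-9]{3}((0[1-9]|1[012])(0[1-9]|1[0-9]|2[0-8])|(0[13456789]|1[012])(29|30)|(0[13578]|1[02])31)|(([2-9][0-9])(0[48]|[2468][048]|[13579][26])|(([2468][048]|[3579][26])00))0229)$'

for d in 20000101 20000229 20040229 20030229 21000229 20110230; do
  if printf '%s\n' "$d" | grep -Eq "$re"; then
    echo "$d valid"
  else
    echo "$d invalid"
  fi
done
```

20000101, 20000229 and 20040229 are reported valid; 20030229, 21000229 (2100 is not a leap year) and 20110230 are reported invalid.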


TOP
CSWP, BSSE
Phenom IIx6 1100T = 8GB = FX1400 = XP64SP2 = SW2009SP3
"Node news is good news."
 