awk

Misc Tricks

printf to output thousand separator

kent$  echo "1234567890"|awk '{printf "%'\''d\n",$0}'
1,234,567,890
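
The `'` flag is locale dependent (in the C locale there is no grouping character) and not every awk implements it. A minimal sketch of a fallback that groups the digits by hand; `commify` is a hypothetical helper name, not part of the original notes, and it assumes each input line is a plain integer (other lines are printed unchanged):

```shell
# Hypothetical helper: add thousands separators without the printf flag.
commify() {
  awk '{
    s = $0
    # Only touch plain integers; anything else is printed as is.
    if (s ~ /^-?[0-9]+$/)
      # While a run of four or more digits remains, put a comma in front
      # of the leftmost three-digit group that ends at a comma or at the end.
      while (s ~ /[0-9][0-9][0-9][0-9]/)
        sub(/[0-9][0-9][0-9]($|,)/, ",&", s)
    print s
  }'
}

echo "1234567890" | commify
```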

character <=> ascii codes

echo a | awk 'BEGIN{for(n=0;n<256;n++)ord[sprintf("%c",n)]=n}{print ord[$1]}'
97

echo 97 | awk 'BEGIN{for(n=0;n<256;n++)chr[n]=sprintf("%c",n)}{print chr[$1]}'
a
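
The same lookup table can be applied to every character of a line. A small sketch, assuming ASCII input; `str2codes` is a made-up name for illustration:

```shell
# Hypothetical helper: print the code of every character on each input line.
str2codes() {
  awk 'BEGIN { for (n = 0; n < 256; n++) ord[sprintf("%c", n)] = n }
  {
    out = ""
    for (i = 1; i <= length($0); i++) {
      code = ord[substr($0, i, 1)]            # look up one character
      out = (i == 1) ? code : out " " code    # join the codes with spaces
    }
    print out
  }'
}

echo abc | str2codes
```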

getline with pipe

The command should be closed after getline, e.g. cmd1 | getline result; close(cmd1). Source: http://www.gnu.org/software/gawk/manual/html_node/Getline_002fPipe.html#Getline_002fPipe

To see the problem, create a file f containing 3 identical lines, each just "1", then run:

kent$  awk  '{m="";cmd=sprintf("echo -n \"%s\"|md5sum",$1);(cmd | getline m) ;print m;}' f 
c4ca4238a0b923820dcc509a6f75849b  -
<3 empty lines>

kent$  awk  '{m="";cmd=sprintf("echo -n \"%s\"|md5sum",$1);(cmd | getline m); close(cmd);print m;}' f
c4ca4238a0b923820dcc509a6f75849b  -
c4ca4238a0b923820dcc509a6f75849b  -
c4ca4238a0b923820dcc509a6f75849b  -
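
A sketch of the same pattern that also checks the return value of getline, so a command that produced nothing is easy to spot. It uses tr instead of md5sum only to keep the output short; `upper_each` is a hypothetical name, and the code assumes the first field is a simple word that is safe to pass to the shell unquoted:

```shell
# Hypothetical helper: run one shell command per input line and close it each time.
upper_each() {
  awk '{
    cmd = "echo " $1 " | tr a-z A-Z"   # identical lines give an identical cmd string
    result = ""
    if ((cmd | getline result) > 0)    # a value above 0 means one line was read
      print result
    else
      print "(no output)"
    close(cmd)                         # lets the next identical cmd start afresh
  }'
}

printf 'a\na\na\n' | upper_each
```

With the close(cmd) line removed, only the first line would print A and the other two would print "(no output)", which mirrors the md5sum run above.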

GNU Sed

"e" flag s/../../ge

Example One

  
a  yao.com sina.com
b  kongu.com
c  polm.com unee.net 21cn.com iop.com
d  kinge.net

Given the text above, I want the final result to look like this:

a  yao.com 
a  sina.com
b  kongu.com
c  polm.com 
c  unee.net 
c  21cn.com 
c   iop.com
d  kinge.net

Assume the original file is saved as sample.txt.

kent$ sed -r 's:(^[a-z]+) (.*):echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d":ge' sample.txt  

That does it. Explanation:

s/pattern/ shell command pipe command/e

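The pattern part of s captures the information to be processed, especially through regex groups. The matched text is then passed as arguments to the shell or to external tools, whose output becomes the replacement string that completes the substitution.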
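
A minimal illustration of that mechanism, assuming GNU sed: the two captured numbers are pasted into a shell arithmetic expression, and the output of the command replaces the line. `sum_pairs` is a hypothetical name. Note that with the e flag GNU sed executes the whole pattern space after the substitution, so the pattern here matches the entire line.

```shell
# Hypothetical helper: replace each "A B" line by the sum computed in the shell.
sum_pairs() {
  sed -r 's/^([0-9]+) ([0-9]+)$/echo $((\1+\2))/e'
}

printf '3 4\n10 20\n' | sum_pairs
```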

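Things to note:

- The external command or tool must not contain the delimiter used by the outermost sed s command, so it is best to pick an unusual delimiter for sed, such as : or #.
- If the command contains a pipe, escape it as \|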

Explanation of the specific command: sed -r 's:(^[a-z]+) (.*):echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d":ge' sample.txt
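
To make this concrete, here is a sketch of the command the outer sed builds for the first input line, run by hand. It assumes the line is "a", two spaces, then the domains, so \1 is "a" and \2 is " yao.com sina.com" with a leading space; `expand_line_a` is a name made up for this illustration, and the \n in the replacement relies on GNU sed.

```shell
# What gets executed for the first line of sample.txt, spelled out by hand.
expand_line_a() {
  # 1. echo prints \2 (note the leading space)
  # 2. the inner sed turns every space into a newline followed by "a "
  # 3. the last sed drops the empty first line produced by the leading space
  echo " yao.com sina.com" | sed "s# #\na #g" | sed "/^$/d"
}

expand_line_a
```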

Example Two

http://stackoverflow.com/questions/7812497/add-filename-to-beginning-of-file-using-find-and-sed

using the following I add the file name to the front of each line and send the output to a single file.

ls | while read file; do sed -e "s/^/$file/g" $file > out; done

I want to perform the same sed replacement but using a find and exec or xargs command -

find . -type f -exec sed "s/^/{}/g" {} > out +

but I get an error -

find: Only one instance of {} is supported with -exec ... +

Input files are like this -

fileA.txt

A1
A2

fileB.txt

B1
B2

Desired output -

fileA.txt A1
fileA.txt A2
fileB.txt B1
fileB.txt B2

I know how to do this with awk, but I'd like to do it with sed, find and exec or xargs.

Solution

find . -type f | xargs -i echo {}|sed -r 's#(.\/)(.*)#cat &\|sed  "s:^:file \2 :g"#ge'

test

kent$  head *.txt
==> a.txt <==
A1
A2

==> b.txt <==
B1
B2

kent$  find . -type f | xargs -i echo {}|sed -r 's#(.\/)(.*)#cat &\|sed  "s:^:file \2 :g"#ge'
file b.txt B1
file b.txt B2
file a.txt A1
file a.txt A2

Short explanation

Finally, the output of executing MAGIC (the command built in the replacement part) becomes the replacement text of the outer sed.
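
As a sketch of what that means for a single input line: for "./a.txt" the match & is "./a.txt" and \2 is "a.txt", so the shell ends up running the pipeline below. The temporary directory and the name `demo_prefix` are assumptions made only for this illustration.

```shell
# Run by hand the command that the outer sed generates for "./a.txt".
demo_prefix() {
  dir=$(mktemp -d) || return 1
  printf 'A1\nA2\n' > "$dir/a.txt"
  # cat & | sed "s:^:file \2 :g" with & = ./a.txt and \2 = a.txt
  ( cd "$dir" && cat ./a.txt | sed "s:^:file a.txt :g" )
  rm -rf "$dir"
}

demo_prefix
```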

Example Three

http://stackoverflow.com/questions/14102504/replace-strings-with-evaluated-string-based-on-matched-group-elegant-way-not-u

I want to replace the timestamps (timestamp + duration) in this file

1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
...

to

Do 3. Jan 15:15:00 CET 2013 - Do 3. Jan 16:15:00 CET 2013
Maybe intermediate strings...
Do 3. Jan 16:15:00 CET 2013 - Do 3. Jan 17:15:00 CET 2013
Maybe intermediate strings...
...

Solution:

sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge' file
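
A sketch of the command that this replacement generates for one matching line, wrapped in a function so it can be tried on its own. It assumes GNU date; `span_utc` is a hypothetical name, and TZ and LC_ALL are pinned here only to make the result reproducible (the output in the question above was produced in a German locale with CET).

```shell
# Hypothetical helper: $1 = start time in epoch seconds, $2 = duration in seconds.
# The body runs in a subshell so the exported variables do not leak out.
span_utc() (
  export TZ=UTC LC_ALL=C
  # Same shape as the generated command: start, " - ", start plus duration.
  echo $(date --date=@"$1")" - "$(date --date=@$(($1 + $2)))
)

span_utc 1357222500 3600
```

Because the command substitutions are unquoted, the shell collapses the padding that date puts in front of single-digit days.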
