awk

Misc Tricks

Add a thousands separator to numbers (watch the quoting!): printf("%'"'"'d",$i)
Remove the last x columns: NF=NF-x
Set the printf width or precision dynamically with *, e.g. %*s, as in printf "%.*f",2,100/3
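As a quick check of the last two tricks, here is a minimal sketch. The sample input and the variable names are made up for illustration, and it assumes an awk such as gawk or mawk, where assigning to NF rebuilds the record and printf accepts *.

```shell
# Drop the last 2 columns: assigning to NF makes awk rebuild $0.
trimmed=$(echo "a b c d e" | awk '{ NF = NF - 2; print }')
echo "$trimmed"     # a b c

# Take the precision from an argument instead of hard-coding it in the format.
rounded=$(awk 'BEGIN { printf "%.*f\n", 2, 100 / 3 }')
echo "$rounded"     # 33.33
```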

printf to output thousand separator

kent$  echo "1234567890"|awk '{printf "%'\''d\n",$0}'
1,234,567,890
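The ' flag relies on the current locale (in the C/POSIX locale no separator is printed), and some awk implementations may not accept it. As a fallback, the sketch below groups the digits by hand. The helper name add_commas is hypothetical, and it only handles non-negative integers.

```shell
# Hypothetical helper: group the digits of each input line in threes, right to left.
add_commas() {
    awk '{
        n = $0; out = ""
        while (length(n) > 3) {
            # Move the last three digits of n to the front of out.
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        print n out
    }'
}

grouped=$(echo "1234567890" | add_commas)
echo "$grouped"     # 1,234,567,890
```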

character <=> ascii codes

echo a | awk 'BEGIN{for(n=0;n<256;n++)ord[sprintf("%c",n)]=n}{print ord[$1]}'
97

echo 97 | awk 'BEGIN{for(n=0;n<256;n++)chr[n]=sprintf("%c",n)}{print chr[$1]}'
a
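The same lookup table can translate a whole word instead of a single character. This is only a sketch: the variable name codes is made up, and the table is limited to codes 1 to 127 so it does not depend on how the awk in use handles multibyte locales.

```shell
# Print the ASCII code of every character in each input line.
codes=$(printf 'awk\n' | awk '
BEGIN { for (n = 1; n < 128; n++) ord[sprintf("%c", n)] = n }
{
    out = ""
    for (i = 1; i <= length($0); i++)
        out = out (i > 1 ? " " : "") ord[substr($0, i, 1)]
    print out
}')
echo "$codes"     # 97 119 107
```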

getline with pipe

The command should be closed after getline, like cmd1 | getline result; close(cmd1). Source: http://www.gnu.org/software/gawk/manual/html_node/Getline_002fPipe.html#Getline_002fPipe

To see the problem, create a file f with three identical lines, each containing just 1, then run:

kent$  awk  '{m="";cmd=sprintf("echo -n \"%s\"|md5sum",$1);(cmd | getline m) ;print m;}' f 
c4ca4238a0b923820dcc509a6f75849b  -
<2 empty lines>

kent$  awk  '{m="";cmd=sprintf("echo -n \"%s\"|md5sum",$1);(cmd | getline m); close(cmd);print m;}' f
c4ca4238a0b923820dcc509a6f75849b  -
c4ca4238a0b923820dcc509a6f75849b  -
c4ca4238a0b923820dcc509a6f75849b  -
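The return value of getline makes the difference visible without md5sum. In the sketch below, demo_getline and the doclose variable are hypothetical names: the same command string is run for three input lines, once without and once with close().

```shell
# Run "echo hello" through getline for three input lines.
# $1 = 1 closes the command after each read, $1 = 0 leaves it open.
demo_getline() {
    printf '1\n1\n1\n' | awk -v doclose="$1" '{
        cmd = "echo hello"
        m = ""
        rc = (cmd | getline m)   # 1 = got a line, 0 = end of output, -1 = error
        if (doclose) close(cmd)  # lets the next record start the command again
        print rc ":" m
    }'
}

without_close=$(demo_getline 0)
with_close=$(demo_getline 1)
echo "$without_close"   # 1:hello, then 0: twice
echo "$with_close"      # 1:hello three times
```

Without close(), awk keeps reading from the already exhausted pipe, so getline returns 0 and m stays empty from the second record on.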

GNU Sed

"e" flag s/../../ge

Example One

  
a  yao.com sina.com
b  kongu.com
c  polm.com unee.net 21cn.com iop.com
d  kinge.net

Given the text above, I want the final result to look like this:

a  yao.com 
a  sina.com
b  kongu.com
c  polm.com 
c  unee.net 
c  21cn.com 
c  iop.com
d  kinge.net

Assume the original file is saved as sample.txt.

kent$ sed -r 's:(^[a-z]+) (.*):echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d":ge' sample.txt

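That does it. Explanation:

s/pattern/shell command \| another command/e

The pattern part of s captures the information to process, typically through regex groups. The matched text is then passed as arguments to the shell or an external tool, and the output of that command becomes the replacement. (GNU sed executes the whole pattern space after the substitution, so the pattern should match the entire line.)

Things to watch out for:

- The external command must not contain the delimiter of the outer sed s command unescaped, so it is best to pick an unusual delimiter such as : or #.
- Pipes are written escaped as \| in these examples. A plain | is not special in the replacement part, so the unescaped form works as well.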
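A minimal sketch of the e flag on its own, assuming GNU sed (the flag is a GNU extension) and made-up input: each line "A B" is rewritten into a shell command, and the command's output replaces the line. The colon serves as delimiter because the command does not contain one.

```shell
# Each input line "A B" becomes the command: echo $((A + B)); its output replaces the line.
sums=$(printf '3 4\n10 20\n' | sed -r 's:^([0-9]+) ([0-9]+)$:echo $((\1 + \2)):e')
echo "$sums"
# 7
# 30
```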

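Explanation of the specific command: sed -r 's:(^[a-z]+) (.*):echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d":ge' sample.txt

The outermost structure is sed -r 's:(^[a-z]+) (.*): ... :ge', with the colon as delimiter. The pattern has two groups: (the leading letters), one space, (everything else). \2 starts with a space because key and value are separated by two spaces.
Everything else (\2) is then sent via echo to another sed: sed "s# #\\n\1 #g" replaces every space with a newline + \1 + a space. Its delimiter is #.
Because \2 starts with a space, that replacement leaves an empty first line, so the result of the previous step is piped to one more sed that deletes empty lines.
Finally, the output replaces the original line, which completes the outer substitution.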
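The steps above can be replayed by hand for the first input line, where \1 is a and \2 is " yao.com sina.com". This is only a sketch of what the generated command does: the variable names are made up, single quotes are used for readability, and GNU sed is assumed for \n in the replacement.

```shell
# Replace every space with newline + "a "; the leading space of \2 yields an empty first line.
step1=$(printf '%s\n' " yao.com sina.com" | sed 's# #\na #g')

# Delete the empty line left over from the leading space.
step2=$(printf '%s\n' "$step1" | sed '/^$/d')
echo "$step2"
# a yao.com
# a sina.com
```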

Example Two

http://stackoverflow.com/questions/7812497/add-filename-to-beginning-of-file-using-find-and-sed

using the following I add the file name to the front of each line and send the output to a single file.

ls | while read file; do sed -e "s/^/$file/g" $file > out; done

I want to perform the same sed replacement but using a find and exec or xargs command -

find . -type f -exec sed "s/^/{}/g" {} > out +

but I get an error -

find: Only one instance of {} is supported with -exec ... +

Input files are like this -

fileA.txt

A1
A2

fileB.txt

B1
B2

Desired output -

fileA.txt A1
fileA.txt A2
fileB.txt B1
fileB.txt B2

I know how to do this with awk, but I'd like to do it with sed, find and exec or xargs.

Solution

find . -type f | xargs -i echo {}|sed -r 's#(.\/)(.*)#cat &\|sed  "s:^:file \2 :g"#ge'

test

kent$  head *.txt
==> a.txt <==
A1
A2

==> b.txt <==
B1
B2

kent$  find . -type f | xargs -i echo {}|sed -r 's#(.\/)(.*)#cat &\|sed  "s:^:file \2 :g"#ge'
file b.txt B1
file b.txt B2
file a.txt A1
file a.txt A2

Short explanation

find ....|xargs -i echo {} needs little explanation: it just prints one filename per line (with a leading "./").

The filenames are then passed to a sed line like sed -r 's#(.\/)(.*)# MAGIC #ge'

Remember that the line above has two groups: \1 is "./" and \2 is the filename, e.g. "a.txt".

Since there is an e at the end of the sed line, the MAGIC part is executed as a shell command (GNU sed needed).

MAGIC: cat &\|sed "s:^:file \2 :g" Here cat & just outputs the file content and pipes it to another sed, which does the replacement (s:..:..:g).

Finally, the output of MAGIC becomes the replacement of the outer sed.
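For a self-contained run, the sketch below builds two throwaway files in a temporary directory and applies a slightly simplified variant: the inner sed reads the file directly instead of going through cat, and the output of find is sorted so the result is stable. It assumes GNU sed and filenames without spaces or shell metacharacters. The variable names are hypothetical.

```shell
# Create two sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'A1\nA2\n' > "$tmpdir/a.txt"
printf 'B1\nB2\n' > "$tmpdir/b.txt"

# For "./a.txt" the outer sed builds and runs: sed "s:^:a.txt :" ./a.txt
prefixed=$(cd "$tmpdir" && find . -type f | sort |
    sed -r 's#^\./(.*)$#sed "s:^:\1 :" &#e')
rm -rf "$tmpdir"

echo "$prefixed"
# a.txt A1
# a.txt A2
# b.txt B1
# b.txt B2
```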

Example Three

http://stackoverflow.com/questions/14102504/replace-strings-with-evaluated-string-based-on-matched-group-elegant-way-not-u

I want to replace the timestamps (timestamp + duration) in this file

1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
...

to

Do 3. Jan 15:15:00 CET 2013 - Do 3. Jan 16:15:00 CET 2013
Maybe intermediate strings...
Do 3. Jan 16:15:00 CET 2013 - Do 3. Jan 17:15:00 CET 2013
Maybe intermediate strings...
...

Solution:

sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge' file
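To try the idea with output that does not depend on the local timezone or language, the sketch below uses the same pattern but asks date for UTC and an explicit format. The -u option and the format string are choices made here, not part of the original answer, and GNU sed and GNU date are assumed.

```shell
# Lines that start with "timestamp duration" are replaced by "start - end" in UTC.
# Other lines do not match, so they are passed through and nothing is executed for them.
converted=$(printf '1357222500 3600 ...\nMaybe intermediate strings...\n' |
    sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo "$(date -u -d @\1 +%Y-%m-%dT%H:%M:%SZ) - $(date -u -d @$((\1+\2)) +%Y-%m-%dT%H:%M:%SZ)"#e')
echo "$converted"
# 2013-01-03T14:15:00Z - 2013-01-03T15:15:00Z
# Maybe intermediate strings...
```

As in the solution above, whatever follows the duration on a matching line (\3) is dropped.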
