awk
Misc Tricks
-
Add a thousands separator to numbers (watch the quotes! It only works in a locale that defines a separator):
printf("%'"'"'d",$i)
-
Remove the last x columns:
NF=NF-x
-
Set the printf width or precision dynamically with *, e.g.
%*s
like printf "%.*f",2,100/3
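A minimal sketch of the last two tricks, assuming an awk that rebuilds $0 when NF is assigned and accepts * in printf formats (gawk does; other implementations may differ). The function names drop_last_cols and fmt_dynamic are made up for illustration, and x is assumed to be no larger than NF.

```shell
# Drop the last x fields of every input line (x is passed in with -v).
drop_last_cols() {
    awk -v x="$1" '{ NF = NF - x; print }'
}

# Print a number with a precision chosen at run time:
# the "*" in the format takes the precision from the argument list.
fmt_dynamic() {
    awk -v prec="$1" -v val="$2" 'BEGIN { printf "%.*f\n", prec, val }'
}

echo "a b c d e" | drop_last_cols 2    # a b c
fmt_dynamic 2 33.3333                  # 33.33
```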
printf to output thousand separator
kent$ echo "1234567890"|awk '{printf "%'\''d\n",$0}'
1,234,567,890
character <=> ascii codes
echo a | awk 'BEGIN{for(n=0;n<256;n++)ord[sprintf("%c",n)]=n}{print ord[$1]}'
97
echo 97 | awk 'BEGIN{for(n=0;n<256;n++)chr[n]=sprintf("%c",n)}{print chr[$1]}'
a
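The same lookup table can be applied to a whole string instead of a single character. This is only a sketch: str_to_codes is a hypothetical name, and single-byte (ASCII) input is assumed.

```shell
# Print the code of every character of each input line, separated by spaces.
# Assumes single-byte (ASCII) input.
str_to_codes() {
    awk 'BEGIN { for (n = 0; n < 256; n++) ord[sprintf("%c", n)] = n }
         {
             out = ""
             for (i = 1; i <= length($0); i++)
                 out = out (i > 1 ? " " : "") ord[substr($0, i, 1)]
             print out
         }'
}

echo "abc" | str_to_codes    # 97 98 99
```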
getline with pipe
the command should be closed after getline; otherwise the same command string is not run again
like cmd1 | getline result; close(cmd1)
source:http://www.gnu.org/software/gawk/manual/html_node/Getline_002fPipe.html#Getline_002fPipe
To see the problem, create a file f with three identical lines, each containing just 1, then:
kent$ awk '{m="";cmd=sprintf("echo -n \"%s\"|md5sum",$1);(cmd | getline m) ;print m;}' f
c4ca4238a0b923820dcc509a6f75849b  -
<2 empty lines>
kent$ awk '{m="";cmd=sprintf("echo -n \"%s\"|md5sum",$1);(cmd | getline m); close(cmd);print m;}' f
c4ca4238a0b923820dcc509a6f75849b  -
c4ca4238a0b923820dcc509a6f75849b  -
c4ca4238a0b923820dcc509a6f75849b  -
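The same pattern can be sketched without md5sum, checking the return value of getline before using the result. The name upper_each is hypothetical, and the input is assumed to contain no shell metacharacters, since $1 is pasted into the command line unquoted.

```shell
# Run a shell command for every input line and print the first line of its output.
# close(cmd) lets an identical command string be started again for the next line.
upper_each() {
    awk '{
        cmd = "echo " $1 " | tr a-z A-Z"   # assumes $1 has no shell metacharacters
        result = ""
        if ((cmd | getline result) > 0)
            print result
        else
            print "(no output)"
        close(cmd)
    }'
}

printf 'abc\nabc\nabc\n' | upper_each
```

With close(cmd) in place, each of the three identical lines gets its own run of the command. Without it, only the first line would produce output.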
GNU Sed
"e" flag s/../../ge
Example One
a  yao.com sina.com
b  kongu.com
c  polm.com unee.net 21cn.com iop.com
d  kinge.net
Given the text above, I want this final result:
a yao.com
a sina.com
b kongu.com
c polm.com
c unee.net
c 21cn.com
c iop.com
d kinge.net
Assume the original file is saved as sample.txt
kent$ sed -r 's:(^[a-z]+) (.*):echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d":ge' sample.txt
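does the job. Explanation:
s/pattern/ shell command pipe command/e
The pattern part of s captures the information to process, especially through regex groups. The matched text is then passed as arguments to the shell or an external tool, whose output becomes the replacement string.
Things to watch out for:
- The external command or tool must not contain the delimiter of the outermost sed s command, so it is best to give sed an unusual delimiter such as :, # or something else.
- If there is a pipe, escape it: \|
Explanation of the command itself: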
sed -r 's:(^[a-z]+) (.*):echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d":ge' sample.txt
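- The outermost layer has the structure sed -r 's:(^[a-z]+) (.*) : :ge', with the colon as delimiter. The pattern has two groups: (the leading letters), one space, (everything else). Note that \2 starts with a space, because key and values are separated by 2 spaces.
- Everything else (\2) is then sent through echo to another sed: sed "s# #\\n\1 #g" replaces every space with newline + \1 + space. The delimiter is #.
- The leading space of \2 also turns into a newline, so the result of step 2 starts with an empty line. It is therefore piped to one more sed that deletes empty lines.
- Finally the substitution from the first step completes.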
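A stripped-down illustration of the e flag may make the steps above easier to follow. It is only a sketch, not part of the original solution: add_pairs is a hypothetical name, and GNU sed is assumed. After a successful substitution, the e flag runs the resulting pattern space as a shell command and replaces the line with the command's output.

```shell
# Each input line "A B" is rewritten to the command "expr A + B";
# the e flag then runs that command and replaces the line with its output.
add_pairs() {
    sed -r 's:^([0-9]+) ([0-9]+)$:expr \1 + \2:e'
}

printf '3 4\n10 20\n' | add_pairs
# 7
# 30
```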
Example Two
http://stackoverflow.com/questions/7812497/add-filename-to-beginning-of-file-using-find-and-sed
Using the following, I add the file name to the front of each line and send the output to a single file.
ls | while read file; do sed -e "s/^/$file/g" $file > out; done
I want to perform the same sed replacement but using a find and exec or xargs command -
find . -type f -exec sed "s/^/{}/g" {} > out +
but I get an error -
find: Only one instance of {} is supported with -exec ... +
Input files are like this -
fileA.txt
A1
A2
fileB.txt
B1
B2
desired output
fileA.txt A1
fileA.txt A2
fileB.txt B1
fileB.txt B2
I know how to do this with awk, but I'd like to do it with sed, find and exec or xargs.
Solution
find . -type f | xargs -i echo {}|sed -r 's#(.\/)(.*)#cat &\|sed "s:^:file \2 :g"#ge'
test
kent$ head *.txt
==> a.txt <==
A1
A2

==> b.txt <==
B1
B2
kent$ find . -type f | xargs -i echo {}|sed -r 's#(.\/)(.*)#cat &\|sed "s:^:file \2 :g"#ge'
file b.txt B1
file b.txt B2
file a.txt A1
file a.txt A2
Short explanation
-
find ....|xargs -i echo {}
nothing to explain, just print the filename per line (with leading "./")
-
then pass the filename to a sed line like
sed -r 's#(.\/)(.*)# MAGIC #ge'
-
remember that in the above line, we have two groups
\1: "./" and \2 "a.txt"(filename)
- since we have e at the end of the sed line, the MAGIC part is executed as a shell command (GNU sed needed)
-
MAGIC:
cat &\|sed "s:^:file \2 :g"
cat & just outputs the file content and pipes it to another sed, which does the replacement (s:..:..:g)
finally, the execution result of MAGIC becomes the replacement of the outer sed.
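One way to check what MAGIC expands to is to drop the e flag, so that sed prints the generated command instead of running it. A small sketch, assuming GNU sed (preview_cmd is a hypothetical name):

```shell
# Same substitution as in the solution, but without the e flag:
# the generated shell command is printed, not executed.
preview_cmd() {
    sed -r 's#(.\/)(.*)#cat &\|sed "s:^:file \2 :g"#g'
}

echo "./a.txt" | preview_cmd
# cat ./a.txt|sed "s:^:file a.txt :g"
```

Here & expands to the whole match (./a.txt) and \2 to the bare file name, which is what the explanation above describes.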
Example Three
I want to replace the timestamps (timestamp + duration) in this file
1357222500 3600 ...
Maybe intermediate strings...
1357226100 3600 ...
Maybe intermediate strings...
...
to
Do 3. Jan 15:15:00 CET 2013 - Do 3. Jan 16:15:00 CET 2013
Maybe intermediate strings...
Do 3. Jan 16:15:00 CET 2013 - Do 3. Jan 17:15:00 CET 2013
Maybe intermediate strings...
...
Solution:
sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge' file
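The output of date depends on the time zone and locale, so a reproducible run needs both pinned. The sketch below assumes GNU sed and GNU date. convert_timestamps is a hypothetical wrapper name, and TZ=UTC with LC_ALL=C is chosen only to get stable output (the example above used CET and a German locale).

```shell
# Convert "<epoch> <duration>" lines to "start - end"; other lines pass through unchanged.
# TZ and LC_ALL are pinned here only to make the output reproducible.
convert_timestamps() {
    TZ=UTC LC_ALL=C sed -r 's#^([0-9]{1,10}) ([0-9]{1,4})(.*$)#echo $(date --date=@\1 )" - "$(date --date=@$((\1+\2)))#ge'
}

printf '1357222500 3600\nMaybe intermediate strings...\n' | convert_timestamps
# Thu Jan 3 14:15:00 UTC 2013 - Thu Jan 3 15:15:00 UTC 2013
# Maybe intermediate strings...
```

Lines that do not match the pattern are left alone, because the e flag only runs a command when a substitution was actually made.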