
I know you’re not asking for awk protips but you can prefix the block with a match condition for processing.

... | grep foo | awk '{print $6}' | ...

becomes

... | awk '/foo/{print $6}' | ...

If you start working this into your awk habits you’ll find delightful little edge cases that you can handle with other expressions before the block (you can, for example, match specific fields).
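
A quick sketch of what "other expressions before the block" can look like. The sample data and the sample() helper are made up for illustration; the patterns themselves are plain POSIX awk.

```shell
# Hypothetical sample data: name, shell, status
sample() {
  printf '%s\n' 'alice bash active' 'bob foo idle' 'foo zsh active'
}

# Whole-line match: any line containing "foo"
sample | awk '/foo/ { print $1 }'

# Field match: only lines whose second field contains "foo"
sample | awk '$2 ~ /foo/ { print $1 }'

# A field comparison combined with a regex on the whole line
sample | awk '$3 == "active" && /sh/ { print $1 }'
```

The first command prints bob and foo, the second only bob, since the regex is tested against $2 rather than the whole line.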



No one has mentioned changing the default field separator, e.g.,

  awk FS=:   '{print $1}' instead of cut -d: -f1

  awk FS="<" '{print $2}' instead of cut -d'<' -f2

  awk FS=">" '{print $1}' instead of cut -d'>' -f1


No need to explicitly set FS! Just use:

echo test,123 | awk -F, '{print $1}'


Yikes. The syntax I had was wrong anyway. Should have been

   awk 'BEGIN {FS=":"};{print $1}'
One benefit of the FS variable over -F, at least in original awk, is that by using FS the delimiter can be more than one character. I guess that's why I remember FS before I remember -F. More flexible.


-F does allow multicharacter separators (at least true for me on bash shell and gawk)

    $ echo 'Sample123string42with777numbers' | awk -F'[0-9]+' '{print $2}'
    string


you were close! the following works as well

  awk -v FS="\t"
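
For comparison, a small sketch of the three spellings mentioned so far, applied to the same tab-separated line. The row() helper and its contents are invented for the example; as far as I know all three forms behave the same in gawk, mawk and the BSD awk.

```shell
# Hypothetical tab-separated record
row() { printf 'one\ttwo\tthree\n'; }

# Option form
row | awk -F'\t' '{ print $2 }'

# Variable assigned before the program starts
row | awk -v FS='\t' '{ print $2 }'

# Variable assigned inside a BEGIN block
row | awk 'BEGIN { FS = "\t" } { print $2 }'
```

Each command prints two.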


If I am not mistaken, -v is GAWK only.


Every contemporary AWK supports -v. Real AWK from UNIX®️ has supported -v since at least the '80s.


True. But there are differences when -v is used, as opposed to a trailing FS assignment. Try this, where "nawk" is the Lucent awk used by BSD:

     cat > 1.awk << eof

     { print $ARGC }

     eof

     echo|nawk -f 1.awk FS=":"
     
     echo|gawk -f 1.awk FS=":"     
     
     echo|nawk -f 1.awk -v FS=":"

     echo|gawk -f 1.awk -v FS=":"


That is not how FS is set; it's set with -F. And there is actually no need to use -v: passing variables at the end works consistently across all AWKs and always has:

  echo "" | awk '{print Bla;}' Bla="Hello."
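
One visible difference between the two styles, sketched with a made-up variable x: a -v assignment happens before BEGIN runs, while an assignment given as a trailing operand is only processed when awk reaches it in the argument list, which is after BEGIN.

```shell
# Same program for both runs; x is a hypothetical variable name
prog='BEGIN { print "BEGIN sees [" x "]" } { print "main sees [" x "]" }'

# -v: x is already set when BEGIN runs
echo | awk -v x=1 "$prog"

# trailing assignment: x is still empty in BEGIN, set before input is read
echo | awk "$prog" x=1
```

The first run prints [1] on both lines; the second prints an empty [] from BEGIN and [1] from the main block. Whether that matters depends on whether the script touches the variable in BEGIN.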


What if you set FS with -F but then later in the script want to change FS to something else?


The results will be unpredictable at best; either set it with -F, or use 'BEGIN {FS = "...";}', but not both.
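
If the real goal is to take one field apart with a second delimiter, a sketch that avoids touching FS at all is to call split() with its own separator. The input line and the colors array name are invented for the example.

```shell
# Hypothetical record: colon-separated, second field is a comma-separated list
record() { printf 'alice:red,green,blue\n'; }

record | awk -F: '{
  # split() takes an explicit separator, so FS stays ":" for every record
  n = split($2, colors, ",")
  print $1, n, colors[n]
}'
```

This prints alice 3 blue, and the outer field splitting never changes mid-script.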


So is -F, IIRC.


-F has always been supported by real UNIX®️ AWK; that's where -v and -F come from.


BUGS The -F option is not necessary given the command line variable assignment feature; it remains only for backwards compatibility.

EXAMPLES Print and sort the login names of all users:

            BEGIN     { FS = ":" }
                 { print $1 | "sort" }
The above is from the GAWK manpage. FWIW, the first example under EXAMPLES uses FS not -F.

There is nothing wrong with using FS instead of -F.


GAWK is not a real AWK!!! When will you people learn that GNU is not UNIX®️?

FS is not used on the command line and doing so is asking for trouble. FS is a built-in variable and as such is treated specially.


To pile on :-) you often want -w (match word) flag to grep.

In awk, I couldn't find how to do this. I tried /\bfoo\b/ and /\<foo\>/ but neither worked. I don't know why and don't care enough which brings me to my major awk irritation ...

It doesn't use extended or perl REs, which makes it quite different to ruby, perl, python, java. Now, according to the man page it does; at least on OSX (man re_format) but as mentioned it didn't work for me.

Details

   $ echo fish | awk  '/\bfish\b/' 
gets nothing, vs

   $ echo fish | perl -ne  '/\bfish\b/ && print' 
fish


UGH! Found the problem; it simply doesn't work. Assuming the OSX awk is the same as the freebsd awk there is a very old open bug on this:

awk(1) does not support word-boundary metacharacters https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171725


GNU awk supports \< and \> for start and end of word anchors, which works for GNU grep/sed as well

GNU awk also supports \y which is same as \b as well as \B for opposite (same as GNU grep/sed)

Interestingly, there's a difference between the three types of word anchors:

    $ # \b matches both start and end of word boundaries
    $ # 1st and 3rd line have space as second character
    $ echo 'I have 12, he has 2!' | grep -o '\b..\b'
    I 
    12
    , 
    he
     2

    $ # \< and \> strictly match only start and end word boundaries respectively
    $ echo 'I have 12, he has 2!' | grep -o '\<..\>'
    12
    he

    $ # -w ensures there are no word characters around the matching text
    $ # same as: grep -oP '(?<!\w)..(?!\w)'
    $ echo 'I have 12, he has 2!' | grep -ow '..'
    12
    he
    2!


Sure, but a fair bit of the value of the tool is its consistency across platforms.

There's no point in awk if perl etc are ubiquitous and more consistent.


\< and \> work with GNU's awk:

  $ printf "fishstick\nfish\ngoldfish\n" | awk '/\<fish\>/' 
  fish


\b is Perl RE, not ERE. AWK not only supports EREs, but POSIX REs as well.
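
Since \b, \< and \> all sit outside plain ERE, a rough portable stand-in is to spell the boundaries out by hand. This is only a sketch: it treats ASCII letters, digits and underscore as word characters, and the words() helper is made up for the demo.

```shell
# Hypothetical test lines
words() { printf 'fishstick\nfish\ngoldfish\na fish!\n'; }

# "fish" must be preceded by start-of-line or a non-word character,
# and followed by a non-word character or end-of-line
words | awk '/(^|[^A-Za-z0-9_])fish([^A-Za-z0-9_]|$)/'
```

This prints fish and a fish! and skips fishstick and goldfish. It is clumsier than grep -w, but it uses nothing beyond alternation and bracket expressions.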


On the other hand, grep can be far faster than awk for searching alone. I almost always use an initial grep for the string that will most reduce the input to the rest of the pipeline. Later in the pipeline, it feels idiomatic to mix in awk with matches like you suggested.


Depends on the awk. mawk is surprisingly fast.


Right. I don't consider that particular exhaustive at all and this has helped me when I wanted to do quick searches.


I always forget about that, and I should try more to remember it. Thank you for the tip!



