I know you’re not asking for awk protips but you can prefix the block with a mat...

3xblah · on Jan 21, 2020

No one has mentioned changing the default field separator, e.g.,

  awk FS=:   '{print $1}' instead of cut -d: -f1

  awk FS="<" '{print $2}' instead of cut -d'<' -f2

  awk FS=">" '{print $1}' instead of cut -d'>' -f1

chaps · on Jan 22, 2020

No need to explicitly set FS! Just use:

echo test,123 | awk -F, '{print $1}'

3xblah · on Jan 22, 2020

Yikes. The syntax I had was wrong anyway. Should have been

   awk 'BEGIN {FS=":"};{print $1}'

One benefit of the FS variable over -F, at least in original awk, is that by using FS the delimiter can be more than one character. I guess that's why I remember FS before I remember -F. More flexible.
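For reference, a minimal sketch of the spellings discussed so far, run against a made-up passwd-style record. The sample data and the function name `first_field_variants` are illustrative and not from the thread. On a POSIX-conforming awk, all four forms should print the same first field.

```shell
# Hypothetical sample record; not taken from the thread.
line='root:x:0:0:root:/root:/bin/sh'

first_field_variants() {
    # 1. -F option
    printf '%s\n' "$line" | awk -F: '{ print $1 }'
    # 2. FS assigned in a BEGIN block, before any input is read
    printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1 }'
    # 3. -v assignment, applied before BEGIN runs
    printf '%s\n' "$line" | awk -v FS=: '{ print $1 }'
    # 4. assignment operand placed AFTER the program text
    printf '%s\n' "$line" | awk '{ print $1 }' FS=:
}

first_field_variants
```

Note that form 4 only works with the assignment after the program text; placed before it, awk would treat `FS=:` as the program itself.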

asicsp · on Jan 22, 2020

-F does allow multicharacter separators (at least true for me on bash shell and gawk)

    $ echo 'Sample123string42with777numbers' | awk -F'[0-9]+' '{print $2}'
    string

cauthon · on Jan 22, 2020

you were close! the following works as well

  awk -v FS="\t"

3xblah · on Jan 22, 2020

If I am not mistaken, -v is GAWK only.

Annatar · on Jan 22, 2020

Every contemporary AWK supports -v. Real AWK from UNIX®️ has supported -v since at least the '80s.

3xblah · on Jan 22, 2020

True. But there are differences when -v is used, as opposed to FS. Try this, where "nawk" is Lucent awk used by BSD

     cat > 1.awk << 'eof'
     { print $ARGC }
     eof

     echo|nawk -f 1.awk FS=":"
     
     echo|gawk -f 1.awk FS=":"     
     
     echo|nawk -f 1.awk -v FS=":"

     echo|gawk -f 1.awk -v FS=":"
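The difference this test probes can be made visible by printing `ARGC` itself rather than `$ARGC`. A small sketch, assuming any POSIX awk is installed as `awk` (the function names are hypothetical): an assignment passed as an operand takes a slot in `ARGV`, while one passed with `-v` is an option and does not.

```shell
# Assignment given as an operand: ARGV is ("awk", "FS=:"), so ARGC is 2.
argc_operand() { echo | awk '{ print ARGC }' FS=:; }

# Assignment given with -v: options are not stored in ARGV, so ARGC is 1.
argc_option() { echo | awk -v FS=: '{ print ARGC }'; }

argc_operand
argc_option
```

So `$ARGC` refers to `$2` in the operand form and to `$1` in the `-v` form.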

Annatar · on Jan 23, 2020

That is not how FS is set; it's set with -F. And there is actually no need to use -v: passing variables at the end works consistently across all AWKs and always has:

  echo "" | awk '{print Bla;}' Bla="Hello."
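One practical difference between the two styles is timing. A rough sketch with hypothetical function names: an operand assignment is applied only when awk reaches it in the argument list, which is after `BEGIN` has run, whereas a `-v` assignment is applied before `BEGIN`.

```shell
# Operand assignment: BEGIN runs first, so Bla is still empty there.
begin_operand() { awk 'BEGIN { print "[" Bla "]" }' Bla=Hello; }

# -v assignment: the value is already set when BEGIN runs.
begin_option() { awk -v Bla=Hello 'BEGIN { print "[" Bla "]" }'; }

begin_operand
begin_option
```

The one-liner above the sketch works because it reads `Bla` in a main rule, not in `BEGIN`.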

3xblah · on Jan 23, 2020

What if you set FS with -F but then later in the script want to change FS to something else?

Annatar · on Jan 28, 2020

The results will be unpredictable at best; either set it with -F, or use 'BEGIN {FS = "...";}', but not both.
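As a point of comparison, here is a small sketch of what happens when both are combined, under the assumption that the awk in use follows the POSIX rule that a new FS applies from the next record onward. The two-line input and the name `switch_fs` are made up for illustration.

```shell
# Hypothetical input: the first record uses ':', the second uses ','.
switch_fs() {
    # $1 is read with the -F separator, then FS is changed for later records.
    printf 'a:b\nc,d\n' | awk -F: '{ print $1; FS = "," }'
}

switch_fs
```

Here the first record is split on `:` and the second on `,`. Very old implementations may differ when FS is changed before any field of the current record has been referenced.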

jabl · on Jan 22, 2020

So is -F, IIRC.

Annatar · on Jan 22, 2020

-F has always been supported by real UNIX®️ AWK; that's where -v and -F come from.

3xblah · on Jan 24, 2020

BUGS The -F option is not necessary given the command line variable assignment feature; it remains only for backwards compatibility.

EXAMPLES Print and sort the login names of all users:

            BEGIN     { FS = ":" }
                 { print $1 | "sort" }

The above is from the GAWK manpage. FWIW, the first example under EXAMPLES uses FS not -F.

There is nothing wrong with using FS instead of -F.

Annatar · on Jan 28, 2020

GAWK is not a real AWK!!! When will you people learn that GNU is not UNIX®️?

FS is not used on the command line and doing so is asking for trouble. FS is a built-in variable and as such is treated specially.

emmelaich · on Jan 21, 2020

To pile on :-) you often want -w (match word) flag to grep.

In awk, I couldn't find how to do this. I tried /\bfoo\b/ and /\<foo\>/ but neither worked. I don't know why and don't care enough which brings me to my major awk irritation ...

It doesn't use extended or perl REs, which makes it quite different to ruby, perl, python, java. Now, according to the man page it does; at least on OSX (man re_format) but as mentioned it didn't work for me.

Details

   $ echo fish | awk  '/\bfish\b/'

gets nothing, vs

   $ echo fish | perl -ne  '/\bfish\b/ && print'

fish

emmelaich · on Jan 21, 2020

UGH! Found the problem; it simply doesn't work. Assuming the OSX awk is the same as the freebsd awk there is a very old open bug on this:

awk(1) does not support word-boundary metacharacters https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171725
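A possible workaround that avoids word-boundary escapes altogether, sketched here with plain ERE syntax only: require a non-word character or a line edge on each side of the word. The function name `match_word` and the sample lines are hypothetical, and the bracket expressions assume ASCII word characters.

```shell
# Match "fish" only when it is not surrounded by letters, digits, or "_".
match_word() {
    awk '/(^|[^A-Za-z0-9_])fish([^A-Za-z0-9_]|$)/'
}

printf 'fishstick\nfish\ngoldfish\none fish, two\n' | match_word
```

This is more verbose than `grep -w`, but it does not depend on `\b`, `\<`, `\>`, or `\y` being available.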

asicsp · on Jan 22, 2020

GNU awk supports \< and \> for start and end of word anchors, which works for GNU grep/sed as well

GNU awk also supports \y which is same as \b as well as \B for opposite (same as GNU grep/sed)

Interestingly, there's a difference between the three types of word anchors:

    $ # \b matches both start and end of word boundaries
    $ # 1st and 3rd line have space as second character
    $ echo 'I have 12, he has 2!' | grep -o '\b..\b'
    I 
    12
    , 
    he
     2

    $ # \< and \> strictly match only start and end word boundaries respectively
    $ echo 'I have 12, he has 2!' | grep -o '\<..\>'
    12
    he

    $ # -w ensures there are no word characters around the matching text
    $ # same as: grep -oP '(?<!\w)..(?!\w)'
    $ echo 'I have 12, he has 2!' | grep -ow '..'
    12
    he
    2!

emmelaich · on Jan 22, 2020

Sure, but a fair bit of the value of the tool is its consistency across platforms.

There's no point in awk if perl etc are ubiquitous and more consistent.

jolmg · on Jan 22, 2020

\< and \> work with GNU's awk:

  $ printf "fishstick\nfish\ngoldfish\n" | awk '/\<fish\>/' 
  fish

Annatar · on Jan 22, 2020

\b is Perl RE, not ERE. AWK not only supports EREs, but POSIX REs as well.

mistahenry · on Jan 21, 2020

On the other hand, grep can be far faster for searching alone than awk. I almost always use an initial grep for the string that will most reduce the input to the rest of the pipeline. Later, it feels idiomatic to mix in awk with matches like you suggested
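A sketch of that pattern, using an invented log format (`<level> <service> <message>`) and hypothetical function names: `grep -F` does the cheap fixed-string cut first, and awk only sees the lines that survive.

```shell
# Hypothetical log lines; not from the thread.
sample_log() {
    printf '%s\n' \
        'INFO  web  started' \
        'ERROR db   timeout' \
        'ERROR web  bad gateway' \
        'ERROR db   deadlock'
}

# Filter with grep, count per service in awk, sort for a stable order.
errors_by_service() {
    sample_log | grep -F 'ERROR' |
        awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' |
        sort
}

errors_by_service
```

Whether the extra grep actually pays off depends on the input size and on which awk is installed, so it is worth timing on real data.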

davidgould · on Jan 21, 2020

Depends on the awk. mawk is surprisingly fast.

just_myles · on Jan 21, 2020

Right. I don't consider that particular exhaustive at all and this has helped me when I wanted to do quick searches.

bloopernova · on Jan 21, 2020

I always forget about that, and I should try more to remember it. Thank you for the tip!