AWK

From ArchWiki


AWK is a small programming language designed for text processing. Its name is derived from the surnames of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. The language is standardized and widely available on Unix-like systems.

Installation

On Arch, the awk(1p) command is provided by gawk, which is installed by default. It has native Unicode support and many extra features.

Note: gawk can gain additional language features at runtime by dynamically loading extensions. Extensions bundled with the software distribution work out of the box, but the separately maintained gawkextlibAUR-* plugins require manual installation.
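
As a minimal sketch of loading a bundled extension, the snippet below assumes that gawk is installed and that the ordchr extension, which provides the ord() and chr() functions, is among the extensions shipped with it. The -l option loads the extension before the program runs; the fallback message and the variable name are only illustrative.

```shell
#!/bin/sh
# Load the "ordchr" extension at startup with -l (assumed to be bundled with gawk).
# ord() returns the numeric value of the first character of its argument.
if command -v gawk >/dev/null 2>&1; then
    code=$(gawk -l ordchr 'BEGIN { print ord("A") }')
else
    # Hypothetical fallback so the script still runs where gawk is absent.
    code="gawk not installed"
fi
echo "$code"
```

The same extension can also be loaded from within a program by placing @load "ordchr" at the top of the script instead of passing -l on the command line.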

Alternative implementations

As with many other core utilities, several more-or-less compliant implementations are available:

  • BusyBox — The BusyBox implementation is not particularly fast but has a smaller footprint, which makes it suitable for memory-constrained environments.
https://git.busybox.net/busybox/tree/editors/awk.c || busybox
  • GoAWK — An AWK implementation written in Go.
https://github.com/benhoyt/goawk || goawkAUR
  • nawk — The "new" AWK as described in The AWK Programming Language, also known as BWK AWK or the One True AWK. It is now co-maintained by Arnold Robbins and Brian Kernighan and features UTF-8 and CSV support.
https://awk.dev/ || nawk
  • mawk — A rather performant AWK implementation.
https://invisible-island.net/mawk/ || mawkAUR

Troubleshooting

Assignment to the ARGC variable via the -v option is not preserved at runtime

Although this is undocumented, many implementations appear to reset the ARGC variable internally after processing the -v variable assignments given on the command line. To get the desired value of ARGC at runtime (e.g. in BEGIN blocks), set the variable directly in the code block:

BEGIN {
  ARGC=1;
  ...
}
Note: An upcoming update to POSIX has documented this issue specifically.
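
The difference can be checked with a small sketch like the one below. The operands a, b and c are dummy names that are never opened, because both programs consist of a BEGIN block only; the shell variable names are illustrative. The value printed for the -v case depends on the implementation, so no particular result is assumed for it here.

```shell
#!/bin/sh
# ARGC assigned on the command line: many implementations overwrite it
# when they set up ARGV/ARGC from the remaining operands.
via_option=$(awk -v ARGC=1 'BEGIN { print ARGC }' a b c)

# ARGC assigned inside BEGIN: the program sees the value it just set.
via_begin=$(awk 'BEGIN { ARGC = 1; print ARGC }' a b c)

echo "ARGC via -v:    $via_option"
echo "ARGC via BEGIN: $via_begin"
```

If the two lines differ, the installed awk resets ARGC after handling -v, and the assignment belongs in the BEGIN block.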

See also