Home Posix basics
Post
Cancel

Posix basics

Note: I will be using CentOS 9 Stream which came out in Dec 2021. I will not be covering UNIX OS.

We will dive in POSIX standardization, on the best practices for system interfaces, command interpreter and utilities. Our goal is to understand why POSIX exist, and how to apply their best practices.

Definition

POSIX stands for Portable Operating System Interface and defines a set of standards to provide compatibility between different computing platforms.

The latest version is available here

Not all operating systems are POSIX certified (such as Solaris or macOS), but they can be fully or partly POSIX compatible, as such, most OS try to be mostly POSIX-compliant.

A few examples of POSIX-compliant:

  • Android
  • FreeBSD
  • Linux
  • VMWare ESXi
  • Cygwin

When writing scripts/programs to rely on POSIX standards, you ensure to port them among a large family of Unix derivatives.

Shell language

The shell is a command language interpreter, which includes a syntax used by the sh utility.

The difference between #!/bin/bash and #!/bin/sh is slim, as It spawns from the same binary:

image

image

Using sh is simply having bash with --posix option after startup files are read. These startup files are the ones read and executed from the expanded ENV variable.

It is not the case on every OS though, busybox can link sh to a different shell, but one common point they all have is having a /bin/sh. This is why in our case, It is better to use sh as a shell basis for our scripts.

Variable expansion

It is possible to manipulate variable with prefix, suffix, default, fallback and message in portable shell syntax.

Consider this shell script:

image

It will print the variable value depending on its parameters

image

While there is a few more, for example to search and replace, It is interesting to know It is POSIX compliant to do so.

Be careful with unset as It is not compliant on arrays (only variables and functions) and declare

Environment variables

Environment variables should always use UPPERCASE and underscore, like so:

image

Only way to set an environment variable is through export, but they will only be available in the current session.

Program exit status

Standard exit code is 0 for success, any number from 1 to 255 is to denote something else.

To access the last exit code, you can use $?

image

It is possible to suppress the exit code by nullifying STDERR and adding an OR, but this is not recommended

image

Command line utility API conventions

Standard POSIX utilities includes an argument syntax to help them process.

Unless otherwise noted, all utility descriptions use this notation, which is illustrated by this command:

image

The utility is always named first, then comes the option-arguments, and includes two optionals modes : short options (single -) and long options (double --).

In long options, parameters are added with --long_option=<PARAMETER>, while short options are one character-long and requires a between their parameters, e.g. `-s `.

Optional arguments are then followed by mandatory arguments (called operands). You can include -- in-between to specify the ends of the optional-arguments, and helps in specific directory like systemd, where some filenames begin with -

A few guidelines (or design pattern) on these utilities:

  • Thou shalt not use more than nine characters or capital letters for their name
  • Option-arguments should not be optional

While nothing is said about -h or -v, they are usually kept for help and version section.

Filenames

Filenames cannot contain “/” nor ASCII NUL “\0”. While this is flexible, a few more tiny limitations is necessary to be added upon the existing set of limitations.

The character - is one to look out for, as It can lead to disaster.

Let’s create a file named -rf

image

And see what happens when someone try to remove all files in the directory with rm *, It should not destroy directories right ?

image

Well.. not only did It destroy the directory, It also kept the -rf file. This could lead to bigger disaster.

If you need to remove it, use --

image

There is many more examples, but this is the reason why you need to ensure your file names should not start with - or control chars (such as \n\t).

Directory structure

Most linux distribution conform to FHS (Filesystem Hierarchy Standard), which defines a stricter set of rules to define the directory structure.

POSIX defines a few guidelines on this structure:

  • Applications should not be writing files in / or /dev
  • /tmp should be made available for applications in need of temporary files creation.
  • /dev/null is an infinite data sink, data written to or reads from this path should always return EOF
  • /dev/tty Synonym for controlling terminal associated with the process group of that process

Regular expression

POSIX defines two regular expression syntax, called BRE (Basic) and ERE (Extended).

BRE provides extensions to achieve consistency between utility programs such as grep or sed.

In BRE, It defines the following syntax:

MetacharacterDescription
.Matches any single character
[ ]Matches a single character that is contained within the brackets
[^ ]Matches a single character that is not contained within the brackets
^Matches the starting position, if It is the first character of the regex
$Matches the ending position, if It is the last character of the regex
*Matches the preceding element zero or more times
\{m\}Matches the preceding element exactly m times
\( \)Defines a capturing group, and treated as a single element

While defining character classes that are used within brackets

POSIX classsimilar tomeaning
[:upper:][A-Z]uppercase letters
[:lower:][a-z]lowercase letters
[:alpha:][A-Za-z]upper- and lowercase letters
[:digit:][0-9]digits
[:xdigit:][0-9A-Fa-f]hexadecimal digits
[:alnum:][A-Za-z0-9]digits, upper- and lowercase letters
[:punct:] punctuation (all graphic characters except letters and digits)
[:blank:][ \t]space and TAB characters only
[:space:][ \t\n\r\f\v]blank (whitespace) characters

And the more advanced extended regular expressions can sometimes be used with Unix utilities (grep -E, sed -E, or default in awk), the main difference is that some backlashes are removed, and non-greedy quantifiers (?)

Shell syntax

Let’s use Shellcheck and test some commands to see how to write POSIX compliant code. It assumes you are somewhat familiar with shell scripting.

The script:

image

The execution:

image

Shellsheck:

image

In the end, some rules to remember are:

  • Use test or single bracket for comparison. Use gt/lt for numbers, and avoid strings comparison with ==.
  • Do not use arrays, and do not use declare let typeset or blank spaces when declaring variables
  • Always quote strings, and use printf instead of echo

Conclusion

While this ends POSIX basics, there is a lot to review in order to be POSIX compliant across all your shell scripts. The most difficult part is to know which tool is part of the core package, and which one requires an installation check.

A few tools exist to check your syntax, like Checkbashims or Shellcheck. One tool, not really POSIX related, is useful for checking your bash commands is explainshell

image

Being POSIX compliant can be a pain (specially on grep, awk and sed implementations). It should serve as a common set of best practices for your scripts.

Credits

https://riptutorial.com/posix

https://pubs.opengroup.org/onlinepubs/9699919799/toc.htm

https://www.baeldung.com/linux/posix

https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions

https://betterprogramming.pub/24-bashism-to-avoid-for-posix-compliant-shell-scripts-8e7c09e0f49a

This post is licensed under CC BY 4.0 by the author.