OTA for Embedded Linux Devices: A practical introduction

09 May 2023 by Thomas Sarlandie

A core belief of Memfault is that we can ship faster when we have good infrastructure in place. An essential piece of this infrastructure is tools to send firmware updates over the air. It enables the team to ship more often and spend more time building features.

In this article, we look specifically at what is required to ship over-the-air firmware updates for Linux systems.

A good OTA setup should allow you to quickly prepare updates and ship them with confidence. In particular:

It should be predictable: systems should be identical (ideally, exact byte-level copies) regardless of whether they are a fresh install or have been updated 10 times in their lifespan.
It should be reliable: interruptions of network or power should not risk damaging the device.
It should be resilient so it can still update devices even when they are in a non-nominal state.

📺 Watch the webinar

This topic was also the subject of a Memfault webinar: OTA for embedded Linux Devices.

The Big Picture
Diving into the details
Conclusion
Further Reading

As you will see, many components collaborate to update your embedded Linux systems. Our goal in this blog post is to walk you through a complete update cycle, introducing the different components and showing how they interact with each other.

For today’s discussion, we will focus on a system built with Yocto, booting on an A/B partition scheme, with U-Boot as the bootloader and SWUpdate as the updater. This is an extremely common setup, completely open-source (free as beer and speech), and the one we use in Memfault Linux SDK examples. The concepts shown here will apply to a lot of other OTA strategies as well and it’s important to point out that Memfault for Linux is not limited to this setup.

Understanding Yocto and the A/B partitioning scheme

Yocto is a Linux system image builder. It uses configuration files to identify which components are required in the system, builds all the packages, and prepares the file system (both a .tar.gz and a .ext3 raw-image).

An A/B partition scheme allocates two system partitions on the device’s storage. One is the version that is running. The other one starts empty and will be filled with a new system image when we install an update.

This configuration guarantees that we always have a working copy of the system because we do not touch the active partition while we prepare the new one. It will still be there for a possible rollback after rebooting in the other partition.

Doing this protects us from damages that could result from partially installed updates: loss of power or connectivity could result in a system where some packages have been updated but not all and the system is not usable.

With Yocto and the A/B partition scheme, we can build very predictable systems. Firmware engineers love determinism.

The Big Picture

Let’s start with a general overview of the update process.

A firmware engineer, using a build system and some form of continuous integration system builds the update. The result is an artifact that can be used to update a system to the latest version.
The update is uploaded to an OTA backend which will announce its availability and serve it to eligible devices. This can be as simple as a firmware-latest file hosted on S3 but having the ability to serve different updates to different hardware versions, and serve different firmware versions to different groups of customers is a must-have in our opinion.
Devices will regularly contact the update server to download the update. We typically use a daemon or cron job to do this.
The same agent will also install the update on the inactive system partition.
Once the update has been installed and verified, the agent can reboot into the freshly updated system partition. This requires some coordination between the old system partition, the new one, and the bootloader.
Finally, the system running from the updated partition will reach out to the OTA backend to confirm that the update has been installed.

Now that we understand the different steps, let’s see how this is implemented in practice.

Diving into the details

🪛🔬 Tag Along!

We encourage you to follow along on an emulator or a real device. The Memfault Linux SDK example is one way to quickly get a system with all these components configured and you can use Memfault backend to host and distribute your update file. It’s free under the evaluation license.

Nothing we show here is specific to Memfault though and you can replicate the system using a different backend. Follow our QEMU Quickstart guide to get started in an emulator, or the Raspberry Pi Quickstart guide to do the same thing with real hardware.

Building the update

The first step to distributing an update over the air is to build and package it. Your build system should be fully automated (CI) and configured to always provide, for each revision of your source code, both a complete image to flash on a new device and an update package.

The update package is the input we provide to the updater. It contains enough information to update a device running an older version of the firmware to the new version. With our A/B partitioning scheme, this is easily achieved by shipping a new system image as the update.

📦 Delta updates are also possible with swupdate but we will not discuss them in this article. Refer to Delta Update with SWUpdate for more information.

A SWUpdate update package is just a cpio archive in which the first file is an update descriptor. The meta-swupdate layer for Yocto provides the swupdate-image class to do this easily and the Memfault example includes a swupdate-image target.

All that is required to build a new update package is to run bitbake swupdate-image.

$ bitbake swupdate-image
...

You can inspect the resulting .swu archive. As discussed, it contains the update descriptor (we will look at it in detail in a minute) and the new filesystem image.

$ cpio -vt < tmp/deploy/images/qemuarm64/swupdate-image-qemuarm64.swu
-rw-r--r--   1 build    users        1111 Apr 25 00:53 sw-description
-rw-r--r--   1 build    users    48618608 Apr 25 00:53 base-image-qemuarm64.ext4.gz

✒ ️ Conventions for this blog post

Whenever we are showing outputs and commands typed on the developer machine used to prepare the image, we will show a $ shell prompt. When we are showing commands and output typed directly on the target device, we will use the # prompt.

Distributing updates

To distribute this new update over the air we will need to make it available to our fleet of devices. This typically requires two steps:

Upload the update file to a CDN so it can be downloaded quickly by a large number of devices at a reasonable cost.
Indicate that the update is available and provide the link to it on an “update server” endpoint that our devices will contact.

📢 Distributing updates is one of the core features of Memfault.

You can follow along with a free trial account: create a new OTA version and activate it.

Memfault OTA offers a lot more than just serve the latest update files to your customers:

Control which devices receive which updates so you can have multiple groups of devices (we call them cohorts) “on” different versions at the same time: internal users, beta users, production users, etc.

Progressively roll out updates to devices so you have time to detect issues with the new build before too many devices have been updated

Serve different versions of the update for different hardware revisions

Fetching an update

In the SWUpdate world, we use a daemon mode called Suricatta to communicate with the update server and detect when a new version is available. This daemon mode needs to be enabled at build time and it adds a few options to the swupdate command.

Looking at the swupdate configuration, you will find that we provide the daemon with information about the device’s identity (current image version, hardware version) and how to fetch updates (server URL, authentication token):

# cat /tmp/swupdate.cfg
...
suricatta :
{
  ...
  url = "https://device.memfault.com/api/v0/hawkbit";
  gatewaytoken = "1gfiixxx";
};
identify = (
  {
    name = "memfault__current_version";
    value = "0.0.2";
  },
  {
    name = "memfault__hardware_version";
    value = "qemuarm64";
  },
  {
    name = "memfault__software_type";
    value = "main";
  } );

This daemon runs in a SystemD unit and one way to monitor its activity is to use the system journals:

root@qemuarm64:~# journalctl -u swupdate -f
... SWUPDATE running :  [start_suricatta] : Suricatta awakened.
... SWUPDATE running :  [channel_log_effective_url] : Channel's effective URL resolved to https://device.memfault.com/api/v0/hawkbit/default/controller/v1/mf14
... SWUPDATE running :  [server_get_deployment_info] : No pending action on server.

As shown here, it regularly calls the server to see if an update is available.

Installing updates

When the update becomes available, swupdate will download it and write it directly to the inactive system partition. Writing the update directly in place directly is an important characteristic of many update systems. It offers the benefit of not requiring large temporary storage to store the update before installation.

An interesting question here is how does swupdate know where to install the update? As we have said before there are two system partitions on our device and we could be running from any of them.

Looking inside the update descriptor (the first file in the update package), we find that it defines two updates strategy named copy1 and copy2:

$ cat swupdate-description
software = {
    version = "0.0.2";

    qemuarm64 = {
        hardware-compatibility: [ "1.0" ];
        stable: {
            copy1: {
                images: (
                    {
                        filename = "base-image-qemuarm64.ext4.gz";
                        type = "raw";
                        compressed = "zlib";
                        device = "/dev/vda2";
                    }
                );
                ...
            }

            copy2: {
                images: (
                    {
                        filename = "base-image-qemuarm64.ext4.gz";
                        type = "raw";
                        compressed = "zlib";
                        device = "/dev/vda3";
                    }
                );
                ...
            }

And if you run ps |grep swupdate you will see that swupdate was started with the -e stable,copy2parameter. This parameter is set by /usr/lib/swupdate/conf.d/09-swupdate-args which considers which device is mounted as/ to decide if we will need to update copy1 (in /dev/vda2) or copy2 (in /dev/vda3).

The partition currently mounted is active. The other one is inactive (it’s not even mounted) and will be overwritten at a raw byte level with the new image.

Once the update is written to disk, a crucial step is needed to activate it. The remaining content of the file provides it:

...
  copy1: {
      ...
      images: ({ ... device = "/dev/vda2" ... })
      uboot: (
          {
              name = "rootpart";
              value = "2";
          }
      );
  }
  copy2: {
      ...
      images: ({ ... device = "/dev/vda3" ... })
      uboot: (
          {
              name = "rootpart";
              value = "3";
          }
      );
  }

Here we tell SWUpdate to write a configuration variable for the bootloader. This configuration variable will indicate the partition number that will be used for the next boot. And as you can see when we write the image to /dev/vda3 we will set the variable rootpart to the value 3. More on U-Boot variables in just a second!

Finally, we need to restart the device. This can be as simple as running the reboot command.

In the case of this Memfault-equipped example device, we want to record why the device rebooted so we execute memfaultctl reboot --reason 3 (3 is the Firmware Update reason in Memfault nomenclature).

More complex update scenarios can easily be implemented. For example by setting a flag on the system to indicate that an update is installed and ready. The user will reboot when they are ready.

Again, this is defined in SWUpdate’s configuration:

# cat /tmp/swupdate.cfg
globals :
{
  postupdatecmd = "memfaultctl reboot --reason 3";
};

Bootloader

The bootloader is the crucial piece of software that runs first when your device starts. It plays an essential role in our support for OTA updates because it provides the switch to run the device on the system partition A or partition B.

U-Boot is a surprisingly capable piece of software and you may be surprised to learn that the A/B switch is implemented in the form of a script run on every boot. This script is typically specific to your hardware because it also sets some essential boot parameters and passes them to the kernel.

Let’s look at the RaspberryPi version in our embedded system example (the QEMU version is here):

saveenv
fdt addr ${fdt_addr} && fdt get value bootargs /chosen bootargs
if env exists rootpart;then echo Booting from mmcblk0p${rootpart};else setenv rootpart 2;echo rootpart not set, default to ${rootpart};fi
load mmc 0:${rootpart} ${kernel_addr_r} boot/@@KERNEL_IMAGETYPE@@
setenv bootargs "${bootargs} root=/dev/mmcblk0p${rootpart}"
@@KERNEL_BOOTCMD@@ ${kernel_addr_r} - ${fdt_addr}

The U-Boot command line allows you to run commands and interact with the environment variables. It’s a great tool to debug OTA and the boot process in general:

U-Boot 2022.01 (Jan 10 2022 - 18:46:34 +0000)

DRAM:  512 MiB
Flash: 64 MiB
In:    pl011@9000000
Out:   pl011@9000000
Err:   pl011@9000000
Net:   eth0: virtio-net#32
Loading Environment from FAT... OK
Hit any key to stop autoboot:  0

=> if env exists rootpart; then echo "rootpart: ${rootpart}"; else echo "NO";fi
rootpart: 3
=> env print -a
... prints all the environment variables

First, let’s talk about where the environment variables are stored. U-Boot can be configured to persist the environment in various locations and this is something that will most likely be defined by your board vendor in the BSP package. For example, the meta-raspberrypi BSP specifies that U-Boot should store the environment in a file called /boot/uboot.env. On some other systems, the environment may be stored at a fixed location in Flash, or on an EEPROM.

Second, note that the boot script helps with three essential components:

It loads the kernel from the new root partition. This means we can update the kernel, and it’s important to remark here that it works because the kernel is part of the system image instead of being in a /boot partition ;
It tells the kernel which system partition to mount as root: root=/dev/mmcblk0p${rootpart} ;
Finally, it loads the device tree and passes it to the kernel (with the fdt addr ${fdt_addr} command). The device-tree is usually appended after the kernel. This little trick is highly specific to the Raspberry Pi and allows dynamically loading device tree fragments (we will refer the interested reader to this article for more details). Again, this is something that is specific to your board and should be received with your board support package.

Notifying the server that the update has been received

Our device reboots with the new kernel and the new filesystem. The final step of our update cycle is to notify the update server that the update has been installed successfully.

Looking once again at /usr/lib/swupdate/conf.d/09-swupdate-args, we notice the following lines:

state=$(fw_printenv ustate | cut -f 2 -d'=')
if [ "$state" == 1 ]; then
 # Confirm update Success to server
        SWUPDATE_SURICATTA_ARGS="-c 2"
else
        SWUPDATE_SURICATTA_ARGS=" "
fi

The first interesting command here is fw_printenv which allows us to read the U-Boot environment file from our system. For example, you can run fw_printenv rootpart to see on which partition you are currently running.

The second interesting element here is the variable ustate that is read by swupdate. This variable was set to 1 by swupdate after installing the update. It is used here to notify the update server that we have successfully installed the update. It can also be used by more complex bootloader scripts to identify that an image is “under test” and can be reverted if it does not boot well.

Conclusion

We have walked through all the steps required to perform OTA on an embedded Linux device. All the mysteries and tricks have been revealed. Following the strong principles of the Unix world, we use simple parts and basic communication primitives to build a much more complex system.

Knowing now what are the parts and the channels you are equipped to debug issues and tackle some additional challenges like delta updates and secure boot.

Let us know in the comment below what else you would like to read about in the world of embedded Linux devices!

Interrupt

by Memfault