A toolkit for secure signing and encryption of Xilinx images

Xilinx UltraScale+ is a popular chip choice for implementing Secure Boot, and its successor Versal ACAP is gaining interest. Prototyping signed and encrypted boot images is quite easy with the Xilinx-provided bootgen tool. But when going into production, things get more difficult. This writeup covers those challenges and how they can be solved. Because the same challenges apply to multiple customers, the Wapice security team has developed an additional toolkit that makes secure release signing and encryption easy.

Protecting the keys

It is good practice to keep long-lived release keys out of the build system. After all, these keys are persistent and are not meant to be rotated. Recovering from a leak of a long-lived signing or encryption key may be impossible.

Xilinx has also thought about this - bootgen can be used in "HSM mode", where signing can be done on a remote server. This removes the need to give private signing keys to the build system. However, an HSM mode build is quite different from a normal build. Instead of a single invocation, HSM mode requires quite a number of bootgen invocations, each with a different configuration file. This is error-prone and difficult to maintain.

When it comes to encryption, bootgen does not support any way to encrypt without secret keys present, not even in HSM mode.

The only way to keep encryption keys out of the build host is to execute bootgen on a secure server and give it only compiled binaries as input. This splits a part of the build away from the build servers, and thus out of reach of the developers who are able to fix problems. When this is combined with the use of HSM mode, creating the proper bootgen configurations can be downright difficult. If there is a need to support multiple bootgen versions, things get ugly.

Avoiding problems with release builds

In order to make reliable release builds, one should minimize the differences between a test build and a release build. Using HSM mode only in releases introduces a big difference. Moving bootgen execution to a secure server does not help reliability either.

In HSM mode, bootgen doesn't do all of the checks it normally does. This means that test boot images can be built reliably, but release boot images cannot!

Bootgen suffers from encryption related issues even if the HSM mode is not used:

  • Bootgen does not warn if a known insecure encryption configuration is used for encrypting the PMU binary.
  • Bootgen requires keys and nonces to be in an encryption input file. This makes it easy to accidentally use the same AES-GCM encryption keys and nonces in multiple boot image builds. A normal solution is to generate new nonces in every build. The same files also contain the keys, so just deleting the files is not a solution.
  • It is possible, though not very easy, to accidentally configure encryption to re-use nonces in different partitions within a boot image.

A solution with custom tooling

Better tooling must separate release signing and encryption from general build steps and tools. A good way to do this is to apply signing and encryption to an already built boot image. So the release boot image is initially signed and encrypted using insecure keys in the build server. This ensures that all processing needed to prepare for encryption is properly done. Then, after a normal build process, the image can be re-signed and re-encrypted with secure release keys. This re-signing and re-encryption could happen on a secure server or using a signing and encryption service remotely. The re-signing and re-encryption does not need complex configuration or processing, as all this has already been done when the boot image was initially created.

Our Python-based tool implements re-signing and re-encrypting the boot image after it has been built. Test and release builds are built in a similar way up to the re-signing/re-encryption step. The re-signing/re-encryption only changes signatures and encrypted data, not any other attributes of the boot image.

We have added additional warnings and automatic generation of nonces and keys that are not supposed to be long living. This makes it easier to implement the signing/encryption step securely.

Python example: re-signing a boot image

The following example script uses our tool xilinx_remote_sign to change signing keys in a boot image. It assumes that the current working directory contains an unencrypted boot image signed with "dummy" keys named BOOT_with_dummy_keys.bin, and two 4096-bit RSA keys primary.pem and secondary.pem, similar to what the following generates:

openssl genrsa -out filename.pem 4096 

The script produces a BOOT.bin that uses the new keys instead of the dummy keys.

from pathlib import Path
from Crypto.PublicKey.RSA import RsaKey, import_key
from Crypto.Signature import pkcs1_15
from xilinx_remote_sign import CryptoProvider, Hash, PrimaryKeyInfo, SecondaryKeyInfo, change_keys

secret_primary_key = import_key(Path("primary.pem").read_text())
secret_secondary_key = import_key(Path("secondary.pem").read_text())

class MyCryptoProvider(CryptoProvider):
    # Xilinx naming: PPK/SPK are the primary/secondary public keys,
    # PSK/SSK are the matching primary/secondary secret keys.
    def get_signing_ppk(self, key_info: PrimaryKeyInfo) -> RsaKey:
        return secret_primary_key.public_key()
    def get_signing_spk(self, key_info: SecondaryKeyInfo) -> RsaKey:
        return secret_secondary_key.public_key()
    def sign_with_psk(self, key_info: PrimaryKeyInfo, hash: Hash) -> bytes:
        return pkcs1_15.new(secret_primary_key).sign(hash)
    def sign_with_ssk(self, key_info: SecondaryKeyInfo, hash: Hash) -> bytes:
        return pkcs1_15.new(secret_secondary_key).sign(hash)

old_boot_image = Path("BOOT_with_dummy_keys.bin").read_bytes()
new_boot_image = change_keys(old_boot_image, MyCryptoProvider())
Path("BOOT.bin").write_bytes(new_boot_image)
print("Wrote BOOT.bin")

So with just a few lines of Python code, it is possible to change the signing keys of an existing boot image. The current official Xilinx tooling can't do this at all. Furthermore, the script only needs to handle public keys locally and sign with the private keys. So instead of storing the private keys in files, you can put the private keys on a secure server and invoke whatever API it provides in your sign_with_psk() and sign_with_ssk() methods.
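
As a rough illustration of moving the private keys behind a service, the sketch below shows one way the signing methods of a CryptoProvider subclass could delegate to a remote signer. Everything in it is an assumption made for illustration: the RemoteSigner class, the JSON request layout, the "key", "digest" and "signature" field names, and the transport callable are hypothetical and are not part of our tool or of any particular HSM product. The fake transport only exists so the example runs without a network and does not produce a real signature.

```python
import base64
import json
from typing import Callable

# Hypothetical transport: sends a JSON request body, returns a JSON response body.
# In production this would wrap an authenticated call to the signing service.
Transport = Callable[[bytes], bytes]


class RemoteSigner:
    """Asks a signing service to sign a digest; the private key stays in the service."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def sign(self, key_name: str, digest: bytes) -> bytes:
        # Only the key name and the digest leave the build host.
        request = json.dumps({
            "key": key_name,
            "digest": base64.b64encode(digest).decode("ascii"),
        }).encode("utf-8")
        response = json.loads(self._transport(request))
        return base64.b64decode(response["signature"])


def fake_transport(request: bytes) -> bytes:
    """Local stand-in for the service. It does NOT compute a real signature."""
    payload = json.loads(request)
    digest = base64.b64decode(payload["digest"])
    fake_signature = payload["key"].encode("ascii") + b":" + digest
    encoded = base64.b64encode(fake_signature).decode("ascii")
    return json.dumps({"signature": encoded}).encode("utf-8")


signer = RemoteSigner(fake_transport)
print(signer.sign("primary", b"\x01\x02\x03"))
```

A signing method would then return something like signer.sign("primary", digest_bytes) instead of signing with a key loaded from a file. Injecting the transport keeps the provider testable without access to the real service.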

To allow using different secondary keys for each partition, SecondaryKeyInfo contains information about what partition is currently being signed.

Calling change_keys() also checks for known insecure configurations. If you want to run the checks without changing keys, our tool supports that too. Run this on the command line:

python3 -m xilinx_remote_sign verify BOOT.bin

Nonce issues

Encrypted Xilinx UltraScale+ boot images are more complicated than you would expect. They can have multiple encrypted partitions, but they contain only one nonce. To prevent reusing the nonce for different partitions, which would break the encryption, the partition ID is added to the last byte of the nonce before using it. So if the nonce stored in the boot image is 112233445566778899aabbcc (in hex), then partition 0 is encrypted with ...aabbcc, partition 1 is encrypted with ...aabbcd, partition 2 is encrypted with ...aabbce, and so on. This is how Xilinx's bootloader (FSBL) works, so changes to this would require refactoring the FSBL.
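
The per-partition nonce derivation described above can be sketched in a few lines of Python. This is an independent illustration, not the code of our tool or of the FSBL: the function name partition_nonce is made up for this example, and the wrap-around of the last byte modulo 256 is an assumption, since the text above only covers small partition IDs.

```python
def partition_nonce(base_nonce: bytes, partition_id: int) -> bytes:
    """Return the nonce for one partition, derived from the nonce stored in the image."""
    if len(base_nonce) != 12:
        raise ValueError("AES-GCM nonce must be 96 bits (12 bytes)")
    # Assumption: only the last byte changes, and it wraps modulo 256.
    last_byte = (base_nonce[-1] + partition_id) % 256
    return base_nonce[:-1] + bytes([last_byte])


base = bytes.fromhex("112233445566778899aabbcc")
nonces = [partition_nonce(base, partition_id) for partition_id in range(3)]
for partition_id, nonce in enumerate(nonces):
    print(partition_id, nonce.hex())

# Reusing a nonce under the same key breaks AES-GCM, so all nonces must differ.
assert len(set(nonces)) == len(nonces)
```

With the example nonce from above, partitions 0, 1 and 2 get nonces ending in cc, cd and ce.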

This makes re-encrypting a bit inconvenient, and not all encryption APIs can be used. If the encryption API selects a random nonce every time you encrypt something, it's impossible to encrypt multiple partitions with nonces that differ from each other in the right way. To be clear, this isn't a limitation of our tool, it's a limitation of how the boot image file format works.

Adding re-encryption support to CryptoProvider for our tool looks like this:

import secrets
from Crypto.Cipher import AES
from xilinx_remote_sign import CryptoProvider, add_to_last_byte

class MyCryptoProvider(CryptoProvider):
    def encrypt_without_opt_key(self, datas: dict[int, bytes]) -> tuple[dict[int, bytes], bytes]:
        base_iv = secrets.token_bytes(96 // 8)  # 96 random bits
        result = {}
        for partition_id, data in datas.items():
            # Encrypt with 256-bit AES-GCM
            cipher = AES.new(secret_aes_key, AES.MODE_GCM, nonce=add_to_last_byte(base_iv, partition_id))
            encrypted, tag = cipher.encrypt_and_digest(data)
            # Tag can be used to check if data was corrupted. Append to end of encrypted data.
            result[partition_id] = encrypted + tag
        return (result, base_iv)

Here secret_aes_key is the key needed to decrypt the resulting BOOT.bin. It is typically written to an eFuse and stored in a secure server that provides an encryption API. The base_iv is the nonce that will be stored in the boot image (in AES-GCM, IV and nonce mean the same 96-bit value).

Xilinx's newer Versal ACAP architecture solves this problem cleanly, and does what we believe Xilinx should have done in the first place: a Versal boot image simply contains a nonce for each partition.

The method name encrypt_without_opt_key() means that this mode of encryption doesn't use an Operational Key. That would be a second key that is also stored inside the secure server, and written onto the boot image encrypted with the "main" key. This is supported by our tool as well.

Python example: re-encrypting a boot image

Above we explained how to subclass CryptoProvider for re-encrypting boot images. We need a couple of other changes to the re-signing script too.

The dummy encryption key used during the build needs to be passed when calling change_keys(), so that the script can decrypt the data in the boot image. Bootgen stores the key in a .nky file that it autogenerates when you pass in -p foo to bootgen. The .nky file is a text file that looks something like this:

Device       foo;

Key 0        7E98D6058C680A06C690882706CE0FA664BDE5776DF0D0A09C22F63BBB42601A;
IV 0         1EAB17FC284CAD3887C24384;

Key 1        EBE1B8E12FC0647FF59A732437F7A2CA0B82CA7550A5890F1D9F73470D538303;
IV 1         CE00773E94D8EF0549AD7548;

The name after Device is what you provide after -p on the bootgen command line, and it isn't actually used anywhere besides storing it in the .nky file.
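
To make the file layout concrete, here is a minimal sketch of reading such a file with only the Python standard library. It is not the parser that ships with our tool: the function name parse_nky_text and the regular expression are assumptions based solely on the sample above, and real .nky files may contain entries that this sketch does not handle.

```python
import re

# Matches lines such as "Key 0   <hex>;" or "IV 1   <hex>;" from the sample above.
NKY_LINE = re.compile(r"^(Key|IV)\s+(\S+)\s+([0-9A-Fa-f]+);$")


def parse_nky_text(text: str) -> dict[str, bytes]:
    """Map names like "Key 0" and "IV 0" to their raw byte values."""
    entries: dict[str, bytes] = {}
    for raw_line in text.splitlines():
        match = NKY_LINE.match(raw_line.strip())
        if match is None:
            continue  # Blank lines and the Device line are skipped.
        kind, slot, hex_value = match.groups()
        entries[f"{kind} {slot}"] = bytes.fromhex(hex_value)
    return entries


SAMPLE = """\
Device       foo;

Key 0        7E98D6058C680A06C690882706CE0FA664BDE5776DF0D0A09C22F63BBB42601A;
IV 0         1EAB17FC284CAD3887C24384;

Key 1        EBE1B8E12FC0647FF59A732437F7A2CA0B82CA7550A5890F1D9F73470D538303;
IV 1         CE00773E94D8EF0549AD7548;
"""

keys = parse_nky_text(SAMPLE)
for name, value in keys.items():
    print(name, len(value) * 8, "bits")
```

Each key in the sample is 256 bits and each IV is 96 bits, which matches 256-bit AES-GCM.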

For decrypting, we only need to know Key 0. All other keys are stored on the boot image, encrypted with Key 0. Without our key changing tool, you would typically write Key 0 to an eFuse. Our goal is to enable storing the production Key 0 in a secure system, separate from the system that builds the images, maybe even separate from the system that runs the re-encryption script.

To decrypt the dummy-keys boot image, our tool comes with a utility function parse_nky_file() that takes in the contents of a .nky file and returns a Python dict of keys. So our re-encryption script looks like this:

import secrets
from pathlib import Path
from Crypto.Cipher import AES
from xilinx_remote_sign import CryptoProvider, add_to_last_byte, change_keys, parse_nky_file

class MyCryptoProvider(CryptoProvider):
    ...

old_key = parse_nky_file(Path("dummy.nky").read_text())["Key 0"]
old_boot_image = Path("BOOT_with_dummy_keys.bin").read_bytes()
new_boot_image = change_keys(old_boot_image, MyCryptoProvider(), old_crypt_key0=old_key)
Path("BOOT.bin").write_bytes(new_boot_image)
print("Wrote BOOT.bin")

By default, our tool derives other keys and nonces in a secure and random way. If you need reproducible builds instead, the tool supports that too, but note that a reproducible build reveals whether a partition has changed between two boot images, even to someone who cannot decrypt it. This minor information leak is not an issue in most cases, and it keeps reproducible builds easy to use and understand in practice.
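
The trade-off can be illustrated with a small sketch. It is not how our tool derives its values. It only assumes, for the sake of the example, that a reproducible build derives a nonce deterministically from the partition contents using HMAC-SHA256 under a secret derivation key. The function name deterministic_nonce and the key are hypothetical.

```python
import hashlib
import hmac


def deterministic_nonce(derivation_key: bytes, partition_data: bytes) -> bytes:
    """Derive a 96-bit value from the data, so the same input gives the same output."""
    mac = hmac.new(derivation_key, partition_data, hashlib.sha256).digest()
    return mac[:12]  # Truncate the 256-bit MAC to 96 bits.


derivation_key = b"example-only-derivation-key"  # Placeholder, not a real secret.
unchanged_a = deterministic_nonce(derivation_key, b"partition contents v1")
unchanged_b = deterministic_nonce(derivation_key, b"partition contents v1")
changed = deterministic_nonce(derivation_key, b"partition contents v2")

# Identical data produces identical output, so two builds are bit-for-bit equal.
print(unchanged_a == unchanged_b)
# Different data produces different output, so an observer can tell that the
# partition changed, without learning what it contains.
print(unchanged_a == changed)
```

Random derivation hides even that one bit of information, which is why it is the default.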