What IoT device manufacturers should learn from the "IoT worm"

November 25, 2016

The research paper “IoT Goes Nuclear: Creating a ZigBee Chain Reaction” by Ronen, O’Flynn, Shamir, and Weingarten garnered moderate media attention (here, here, here, etc.) in early November 2016. As I have worked extensively in ZigBee offensive and defensive security, but never specifically on the ZigBee Light Link (ZLL) profile, I was interested to dig in and see what the main techniques and issues were, and what lessons other device manufacturers should take away from this disclosure.

Level-Set / FAQs

So, if you didn’t read the entire paper and don’t trust news articles to give you the technical meat, what’s the bottom line?

What techniques were used?

  • Serial sniffing
    • Revealed which ZigBee stack the lights used; that stack was then analyzed, eventually leading to the ZLL Touchlink bug.
  • Sniffing the SPI bus between the SoC and the external flash
  • Hitting unauthenticated firmware update APIs while pretending to be a device
  • Advanced side-channel power analysis techniques (a combination of DPA and CPA)
    • To extract the master symmetric key used for Hue firmware encryption & authentication.

What were the core issues?

  • Heuristic pairing mechanism for “proximity” (ZLL Touchlink)
    • This is weak by design against an attacker who can build their own hardware (amplifiers, etc.), as it uses received signal strength indicators (RSSI) to approximate “distance”.
    • It was exploitable in the worm scenario (unmodified hardware) because of an implementation bug (see below).
  • Public “master key” for network key provisioning
  • Shared “master key” for firmware encryption/authentication on all devices
    • Symmetric keys are not ideal for this purpose, but they are common in IoT devices due to processor/memory constraints. Atmel BitCloud (29) and TI Crypto-Bootloader (30) also use symmetric keys.
  • Chip security features not used
    • The ATmega2564RFR2 has a lock bit that can prevent application code from reading the boot loader. This was not set, so the researchers’ compromised firmware could dump the boot loader to learn more.

As mentioned earlier, an attacker with a ZigBee sniffing/injection platform such as the ApiMote with an external antenna and amplifier (or another platform such as an SDR) could spoof the “proximity” heuristic used by Touchlink simply by increasing transmit power and receive sensitivity. That approach, however, doesn’t work for the worm use case, where infected bulbs have only their stock radio hardware.
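To make that weakness concrete, here is a minimal Python sketch of an RSSI-threshold proximity check under a textbook log-distance path loss model. The threshold, path loss parameters, and transmit powers are illustrative assumptions, not values from the Touchlink specification or from Hue hardware, and only the transmit side of the attack is modeled.

```python
import math

# Illustrative assumptions only; none of these come from the ZLL spec.
PROXIMITY_THRESHOLD_DBM = -40.0   # hypothetical "close enough" cutoff
PATH_LOSS_AT_1M_DB = 40.0         # rough 2.4 GHz free-space loss at 1 m
PATH_LOSS_EXPONENT = 2.0          # free space; indoors is usually higher


def received_power_dbm(tx_power_dbm: float, distance_m: float,
                       antenna_gain_db: float = 0.0) -> float:
    """Estimate RSSI with the log-distance path loss model."""
    path_loss = PATH_LOSS_AT_1M_DB + 10 * PATH_LOSS_EXPONENT * math.log10(distance_m)
    return tx_power_dbm + antenna_gain_db - path_loss


def looks_close(rssi_dbm: float) -> bool:
    """The proximity heuristic: signal strength is used as a proxy for distance."""
    return rssi_dbm >= PROXIMITY_THRESHOLD_DBM


if __name__ == "__main__":
    stock_near = received_power_dbm(3.0, 0.5)    # assumed stock radio, 0.5 m away
    stock_far = received_power_dbm(3.0, 20.0)    # same radio, 20 m away
    amplified_far = received_power_dbm(20.0, 20.0, antenna_gain_db=15.0)
    for label, rssi in [("stock, 0.5 m", stock_near),
                        ("stock, 20 m", stock_far),
                        ("amplified, 20 m", amplified_far)]:
        print(f"{label}: {rssi:.1f} dBm, looks close: {looks_close(rssi)}")
```

With these assumed numbers, the stock radio passes the check at 0.5 m and fails at 20 m, while roughly 32 dB of added transmit power and antenna gain lets the distant attacker pass as well. Nothing in the heuristic can tell those two cases apart.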

The Atmel stack used in the Hue lights, as revealed by serial logs, had an implementation bug in its Touchlink state machine. I’ll describe it here.

In the Atmel stack, the incoming “transaction ID” is checked against an array of three state structures, all initialized to zero on boot. This is normally fine: zero is not a valid transaction ID, so a zero entry marks that slot as empty and it is ignored. However, the check that the input value (provided by the RF message) is not zero is only done on the “expected” protocol path, when a “Scan Request” message is received. The check is not repeated when the reset message arrives, so a reset message can be sent with a transaction ID of 0, and it will be accepted if any of the three slots is not filled with a legitimate Touchlink state at that time.
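The following is a simplified Python model of the logic described above, written from this description rather than from Atmel’s code. The class and method names are hypothetical, and a real Scan Request handler does much more than claim a slot. It shows why a transaction ID of 0 slips through the reset path while any slot is still empty, and how repeating the zero check on that path would close the hole.

```python
INVALID_TID = 0   # zero is never a legitimate Touchlink transaction ID
NUM_SLOTS = 3     # the stack tracks up to three Touchlink transactions


class TouchlinkSlots:
    """Hypothetical model of the three zero-initialized state slots."""

    def __init__(self) -> None:
        self.slots = [INVALID_TID] * NUM_SLOTS   # all "empty" after boot

    def handle_scan_request(self, tid: int) -> bool:
        # Expected path: a zero transaction ID is rejected up front.
        if tid == INVALID_TID:
            return False
        for i, slot_tid in enumerate(self.slots):
            if slot_tid == INVALID_TID:
                self.slots[i] = tid   # claim an empty slot for this transaction
                return True
        return False                  # no free slot

    def handle_reset_request(self, tid: int) -> bool:
        # Buggy path: no zero check, so tid == 0 matches any empty slot.
        return tid in self.slots

    def handle_reset_request_fixed(self, tid: int) -> bool:
        # Repeating the zero check on this path closes the hole.
        return tid != INVALID_TID and tid in self.slots


if __name__ == "__main__":
    state = TouchlinkSlots()
    print("unsolicited reset, tid=0:", state.handle_reset_request(0))
    print("same reset with fix:", state.handle_reset_request_fixed(0))
```

Note that in this model the bug disappears once all three slots hold real transactions, which matches the “if any of the three slots” condition above.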

Despite this issue on the “reset” path, other checks blocked an attacker from directly using the “join to a new network” command (even though that message is handled similarly to a reset): Atmel’s function for decrypting the network key used to join the new network checked that the transaction ID was not 0. So the zero-ID bug could not be used to make a device join the attacker’s network. It does not, however, prevent the simple reset.
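Extending the same kind of simplified model (again with hypothetical names, and a placeholder where the real stack decrypts the network key), the sketch below shows why the join command was not exploitable with a zero ID while the reset was. In this model both handlers share the same unchecked slot lookup, but only the join path also goes through a key-decryption step that rejects 0.

```python
from typing import Optional

INVALID_TID = 0


class TouchlinkTarget:
    """Hypothetical model of a bulb's reset and join handlers."""

    def __init__(self) -> None:
        self.slots = [INVALID_TID] * 3   # zeroed at boot, i.e. "empty"
        self.network: Optional[str] = None

    def _slot_matches(self, tid: int) -> bool:
        # Shared lookup with no zero check: tid == 0 matches an empty slot.
        return tid in self.slots

    def handle_reset(self, tid: int) -> bool:
        if not self._slot_matches(tid):
            return False
        self.network = None              # factory reset: leave the current network
        return True

    def _decrypt_network_key(self, tid: int, encrypted_key: bytes) -> Optional[bytes]:
        # The key-decryption routine does its own zero check.
        if tid == INVALID_TID:
            return None
        return encrypted_key             # placeholder, not real decryption

    def handle_join(self, tid: int, encrypted_key: bytes, network: str) -> bool:
        if not self._slot_matches(tid):
            return False
        if self._decrypt_network_key(tid, encrypted_key) is None:
            return False                 # join refused: key not decrypted
        self.network = network
        return True
```

Calling handle_reset(0) on a freshly booted target succeeds and drops it off its network, while handle_join(0, ...) fails and leaves it unjoined. The lesson for implementers is that the zero check belongs at the shared entry point, not in one downstream function.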

What about the differential attack on the firmware signing key?

This part of the work was, in my opinion, the most concerted effort by the researchers to overcome a security mechanism, and represents significant effort by an advanced “attacker.” It’s important to note that AES-CCM* as used by ZigBee had already been attacked in previous works, but this research targeted the AES-CCM used for the firmware’s authenticated encryption. As a result, the constraints were different, especially since the number of blocks and the nonce setup were unknown.

I will not attempt a short summary of the different attack techniques (including two side-channel power analysis attacks) combined to extract the key. What I found interesting was the choice to attack the AES-CBC used in the authentication portion of CCM: because they can flash firmware that will be run through verification, they control the ciphertext over which the AES-CBC is computed. The key they recover this way is the same key used for the CTR encryption portion of CCM. (As a reminder, AES-CCM uses the same symmetric key for AES-CTR, which provides confidentiality, and for AES-CBC, which provides integrity by producing an authentication tag.)
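For reference, the CCM generation-encryption process can be written as follows, using the notation of NIST SP 800-38C rather than the paper’s. B_0 through B_r are the formatted blocks encoding the nonce, associated data, and payload P; Ctr_j are the counter blocks; and CIPH_K is AES under key K. The thing to notice is that the same K drives both the CBC-MAC chain that produces the tag T and the CTR keystream S_j, so recovering K from one half of the construction breaks both.

```latex
\begin{aligned}
Y_0 &= \mathrm{CIPH}_K(B_0) \\
Y_i &= \mathrm{CIPH}_K(B_i \oplus Y_{i-1}), \quad i = 1, \dots, r \\
T &= \mathrm{MSB}_{T\mathrm{len}}(Y_r) \\
S_j &= \mathrm{CIPH}_K(\mathrm{Ctr}_j), \quad j = 0, \dots, m \\
C &= \bigl(P \oplus \mathrm{MSB}_{P\mathrm{len}}(S_1 \,\|\, \cdots \,\|\, S_m)\bigr) \,\|\, \bigl(T \oplus \mathrm{MSB}_{T\mathrm{len}}(S_0)\bigr)
\end{aligned}
```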

What did the manufacturer do right?

As someone who has written many vulnerability reports, both private ones for manufacturers and some published ones, I feel we all too often focus only on what was “wrong,” at the expense of acknowledging what was done right. Let’s look at what was done well to secure this device:

  • Use of authenticated encryption (symmetric AES-CCM)
  • The SPI flash has encrypted/authenticated images on it
    • This means that the attacker could not successfully perform a simple binary patch of firmware.
  • Although this was probably not done for security reasons, a Harvard architecture processor was used, which made exploiting vulnerabilities more difficult because code can’t be executed from data memory, forcing an attacker to use a technique like ROP.
  • The debug fuses in the SoC were set
    • This forced the attackers to do a side-channel attack to get the keys needed to be able to get their code into the SoC and thwarted JTAG debugging and memory dumps.
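Why authenticated images defeat a simple binary patch can be illustrated with a short Python sketch. It uses HMAC-SHA256 from the standard library purely as a stand-in for the AES-CCM tag (the Hue images are also encrypted, which this sketch ignores), and the key and image bytes are made up.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32                       # SHA-256 output size in bytes
DEVICE_KEY = bytes(range(32))      # made-up stand-in key, not a real Hue key


def sign_image(image: bytes, key: bytes = DEVICE_KEY) -> bytes:
    """Vendor side: append an authentication tag over the image."""
    return image + hmac.new(key, image, hashlib.sha256).digest()


def verify_image(signed: bytes, key: bytes = DEVICE_KEY) -> Optional[bytes]:
    """Device side: return the image only if its tag verifies."""
    image, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(key, image, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None                # refuse to install a modified image
    return image


def patch_byte(blob: bytes, offset: int, value: int) -> bytes:
    """Attacker side: a 'simple binary patch' made without the key."""
    patched = bytearray(blob)
    patched[offset] = value
    return bytes(patched)


if __name__ == "__main__":
    signed = sign_image(b"\x0c\x94" + b"firmware body")
    print("original verifies:", verify_image(signed) is not None)
    print("patched verifies:", verify_image(patch_byte(signed, 0, 0x00)) is not None)
```

What the sketch leaves out is key distribution: because every device verifies with the same key, anyone who extracts it can call sign_image on their own patched image, which is why recovering the Hue key through side channels mattered so much.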

Even with these things done “right,” including some that I see overlooked in many devices on the market, it wasn’t enough to stop a determined attacker. I would argue, however, that doing them significantly raised the level of effort needed to attack the system.

My two cents on the research

I think this is excellent work, and I’d love to see parts of their tooling published so it can be reused by the security community, specifically (a) the Python code they wrote for handling the ZLL protocol (similar to the Scapy code used with KillerBee) and (b) the integration of the CC2531 transmit/receive code into KillerBee, so that others can use that hardware in a common platform alongside the devices supported today, such as the RZUSBSTICK and ApiMote.

I noticed only one minor potential inaccuracy in the version of the write-up I read, which says on page 5 that “For this encryption, a secret “Master ZLL key” is used. This key is shared by and stored on all ZLL certified products. It is not specified in the standard and is only provided to ZigBee alliance members that are developing ZLL certified products. Unsurprisingly this master key was leaked in 2015 and can be found on-line [10]”. I think it’s important to note that the key referenced in the cited work is included in some ZigBee standards, and the distribution of those documents is not limited to ZigBee Alliance members, so it wasn’t leaked so much as “republished” in the cited work.