IIoT security: protecting specialised edge ML devices
Imagine a picturesque country landscape: sunflower fields to the horizon, narrow forest lines, animals jumping from den to den, birds flying. The fields should be properly maintained to bring a rich harvest. A [REDACTED] innovative company builds industrial IoT devices to monitor crop growth and maintain fields.
The primary purpose of the devices is to collect crop data and analyse it offline on the device, showing immediate results to the operators. When an internet connection is available, devices communicate with the server and receive updated versions of ML models.
The [REDACTED] company works in a highly competitive environment. If any device is stolen, they want to mark it as compromised and prevent competitors from accessing the collected data and ML models.
We’ve designed and implemented a set of security controls to protect the whole ecosystem: devices, data, ML models, and communication lines. IIoT security starts from a device provisioning pipeline and goes through all stages and components.
Industry
IIoT security
AI / ML security
Soil enrichment
Technology stack
Python, Django & PostgreSQL
GCP cloud + on-prem
ML / TensorFlow
Raspberry Pi 4, Raspberry Pi OS
Regulations / standards
GDPR
IEC standards
NIST RMF, SP 800-53, SP 800-57, SP 800-213, NISTIR 8259 series
Challenges
Hardened but maintainable IIoT devices
To make reverse engineering harder, the operating system should be hardened, all unnecessary packages removed, and default access disabled. At the same time, a device should retain enough functionality to allow maintenance and updates.
Thing you have, thing you know
Devices are “linked” to their operators, allowing only authorised personnel to interact with a device and receive data. The protection should go beyond simple account passwords, and it should keep working when operators and device locations change, which happens often.
Heterogeneous ecosystem
Each ecosystem component has its own characteristics and applicable risks: security controls for IIoT devices and for the fleet management backend differ considerably, and the protection of ML models is generally a separate issue. Adding security to such a system should improve its resilience without adding fragility.
Technology requirements
Data and IP protection
Telemetry data, ML models, logs, access credentials: if the device handles data, it should protect that data throughout its whole lifecycle.
Fleet management
There should be a way to provision and operate devices at scale. Assemble, upload and automatically harden firmware, upload applications and data, and verify that the system works. Monitor, update remotely, flag as compromised.
Protection against reverse engineering and tampering
Small devices can be lost or stolen. They should be resilient against attempts to reverse engineer, connect, and dig through data.
Our approach
Following NIST standards
As this project lies at the intersection of regulations and industries, we selected NIST standards as a source of requirements, especially SP 800-37, SP 800-53, SP 800-57, and SP 800-213. While following standards word for word is overkill, using them as a baseline allowed us to build well-rounded security defences.
Full lifecycle, coordinated security controls
The ecosystem contains multiple parts: operations on devices, communication between devices and the centralised server, fleet management, ML model training, etc. Security measures follow the lifecycle of sensitive assets and span all layers.
Securing a pipeline, not just software
When building thousands of IIoT devices, we pay attention both to the security of each particular device and to the security of the production and provisioning pipeline. Taking a bird's-eye view of security helps to mitigate supply chain risks as early as possible.
Solution
We designed, implemented, and validated security protection measures through hardware, firmware, software, ML, data layer, and communication. As a result, the Hive and the Queen perform essential missions in the fields, react to potential compromise, and work only with authenticated personnel.
Note: as the project has a certain level of sensitivity, some technical details are purposely omitted, but the high-level design is presented as is.
Architecture scheme
The Hive of devices communicates with the Queen server: it sends telemetry data and receives firmware and software updates. The Queen stores information about devices: their activity (last time online), status (active, passive, compromised), and usage patterns (audit logs).
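As a rough illustration of the per-device record the Queen might keep, here is a minimal Python sketch. The class, field, and method names (`DeviceRecord`, `DeviceStatus`, `check_in`, `mark_compromised`) are hypothetical and are not taken from the actual backend; only the three statuses, the last-online time, and the audit log come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class DeviceStatus(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    COMPROMISED = "compromised"


@dataclass
class DeviceRecord:
    """Hypothetical record the Queen keeps for one Hive device."""
    device_id: str
    status: DeviceStatus = DeviceStatus.ACTIVE
    last_seen: Optional[datetime] = None
    audit_log: List[str] = field(default_factory=list)

    def check_in(self, when: Optional[datetime] = None) -> None:
        """Record that the device was online and contacted the Queen."""
        self.last_seen = when or datetime.now(timezone.utc)
        self.audit_log.append(f"{self.last_seen.isoformat()} check-in")

    def mark_compromised(self, reason: str) -> None:
        """Flag the device as compromised and keep the reason for auditing."""
        self.status = DeviceStatus.COMPROMISED
        self.audit_log.append(f"compromised: {reason}")
```

In a Django and PostgreSQL backend this would more likely be a database model; a plain dataclass is used here only to keep the sketch self-contained.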
Threat modelling
Designing security controls starts with threat modelling, especially for a large project like this one. During threat modelling, we focused on UX: how can we distinguish an actual operator from a competitor? This is how we arrived at multi-factor authentication with an external “activation” device.
Threat modelling showed risky areas:
- a device-as-a-blackbox (its hardware, firmware, software, data);
- a communication channel between the Hive and the Queen;
- receiving updates of data and software (which could be intercepted, stolen, or corrupted).
SSDLC
As the ecosystem will be further updated and improved, we pushed security practices into each step of the hardware and software life cycle. SSDLC includes security design documentation, threat modelling and intelligence, secure coding, a secure CI/CD pipeline, and generally building a security mindset among developers.
We took extra care with CI/CD security and automated security testing: scanners and linters, dependency and vulnerability management tools, memory and fuzzing tests on each pull request, automatic packaging and cross-version tests, and so on. CI/CD reduces mistakes by reducing the number of manual steps in the pipeline.
IIoT device provisioning pipeline
Devices come pre-assembled (which is out of the scope of this story), and the only things left are: to install and harden their firmware, install and configure software, download the latest ML models, and configure an external activation device (let’s say a USB stick).
The device provisioning pipeline (a set of scripts, to put it simply) takes care of each step.
Linux hardening
The Linux hardening process is part of device provisioning. We configured and hardened the OS, removed unused packages, created and configured LUKS partitions, designed the key lifecycle, and implemented thorough logging and monitoring, anti-brute-force measures, restricted access, and honeypots to raise the bar for tampering.
The goal is to have a device that resists reverse engineering, works correctly day-by-day, and is easy to maintain by the [REDACTED] engineering team.
Fleet management
The Hive devices communicate with the Queen only when an internet connection is present. The main purpose of fleet management is to always know the latest status of devices, have access to their latest logs, and react to unauthorised usage.
Application security
Application security measures cover device and fleet management software. The primary purpose of appsec in this project is to reduce the attack surface and ensure that access control measures cannot be easily bypassed.
Besides dozens of typical appsec measures (see OWASP ASVS), we focused on the security of downloading new ML models (to prevent tampering) and interactions with human operators (to prevent abuse).
Authentication
Operating the Hive should be protected by multi-factor authentication: having physical access to the device alone should not be enough to use it. The users must present other authentication factors: passwords, PIN codes, or USB sticks.
Some events, like data decryption or triggering compromise sequences, are linked to the (in)correct user authentication.
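To make the link between authentication and key material more concrete, here is a minimal sketch that derives an unlock key from two factors and triggers a compromise callback after repeated failures. Everything in it is an assumption for illustration: the function and class names, the choice of HMAC-SHA256 plus PBKDF2 from the Python standard library, the iteration count, and the failure threshold. The real scheme is not disclosed in this story.

```python
import hashlib
import hmac
from typing import Callable, Optional


def derive_unlock_key(pin: str, token_secret: bytes, salt: bytes,
                      iterations: int = 200_000) -> bytes:
    """Combine a PIN (known) with a USB-stick secret (owned) into one key."""
    # Keying the PIN with the token secret means neither factor works alone.
    mixed = hmac.new(token_secret, pin.encode("utf-8"), hashlib.sha256).digest()
    # A slow KDF makes offline PIN guessing more expensive.
    return hashlib.pbkdf2_hmac("sha256", mixed, salt, iterations, dklen=32)


def key_check(key: bytes) -> bytes:
    """Value stored on the device to recognise the key without storing it."""
    return hmac.new(key, b"unlock-check", hashlib.sha256).digest()


class Authenticator:
    """Hypothetical gate: returns the key on success, counts failures."""

    def __init__(self, expected_check: bytes, salt: bytes,
                 on_compromise: Callable[[], None],
                 max_failures: int = 3, iterations: int = 200_000) -> None:
        self._expected_check = expected_check
        self._salt = salt
        self._on_compromise = on_compromise
        self._max_failures = max_failures
        self._iterations = iterations
        self._failures = 0

    def unlock(self, pin: str, token_secret: bytes) -> Optional[bytes]:
        key = derive_unlock_key(pin, token_secret, self._salt, self._iterations)
        if hmac.compare_digest(key_check(key), self._expected_check):
            self._failures = 0
            return key  # e.g. used to unlock encrypted storage
        self._failures += 1
        if self._failures >= self._max_failures:
            self._on_compromise()  # e.g. start the compromise sequence
        return None
```

The point of the design is that a correct login yields key material rather than a boolean, so data decryption depends on both factors being present.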
Machine Learning security
ML model protection starts on a backend, where models are trained. Before ML models appear on devices during provisioning or updates, they are encrypted on a backend using unique encryption keys per device.
On the device, each ML model is re-encrypted for storage using separate device storage keys. Thus, reverse engineering one device won’t give access to keys or models used on other devices.
In addition, the ML model’s weights are encrypted and obfuscated, which makes intercepting them pointless: the model is “distorted” and requires decryption and deobfuscation before use. This approach “assembles” the actual ML model in device memory right before execution.
Data at rest security
We configured LUKS for data-at-rest encryption; the partition is decrypted only after successful authentication. We added application-level encryption for application data: stored telemetry data, logs, and ML models. To prevent tampering, we added integrity checks for all data and signing for data that is critical from a trust or origin perspective.
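An integrity check of this kind can be sketched with keyed BLAKE2b from the Python standard library. The helper names (`seal`, `open_sealed`) and the tag-then-record layout are assumptions made for this example, not the project's format. Note that this is a symmetric integrity tag; signing for origin would need asymmetric signatures from a cryptographic library, which this sketch does not cover.

```python
import hashlib
import hmac

TAG_SIZE = 32  # bytes; an assumed tag length for this sketch


def _tag(record: bytes, integrity_key: bytes) -> bytes:
    """Keyed BLAKE2b over the record (the key must be at most 64 bytes)."""
    return hashlib.blake2b(record, key=integrity_key,
                           digest_size=TAG_SIZE).digest()


def seal(record: bytes, integrity_key: bytes) -> bytes:
    """Prefix the (already encrypted) record with its integrity tag."""
    return _tag(record, integrity_key) + record


def open_sealed(blob: bytes, integrity_key: bytes) -> bytes:
    """Return the record, or raise ValueError if it was modified."""
    tag, record = blob[:TAG_SIZE], blob[TAG_SIZE:]
    # compare_digest avoids leaking where the first mismatching byte is.
    if not hmac.compare_digest(tag, _tag(record, integrity_key)):
        raise ValueError("integrity check failed")
    return record
```

A stored log line or telemetry batch would be passed through `seal` before writing and through `open_sealed` after reading, so any modification on disk is rejected before the data is used.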
Encryption and key management
While using a single design, the encryption system uses unique random long keys for each device. The key management scheme is built so that a functional encryption key appears only in device memory for a short time, being split into pieces in different locations.
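One simple way to keep a key "in pieces" is XOR secret splitting, sketched below. This is an illustrative assumption: the story does not say which splitting scheme is used, and the function names are hypothetical. With XOR splitting, every share is required and any incomplete subset reveals nothing about the key.

```python
import secrets
from functools import reduce
from typing import List


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, parts: int = 3) -> List[bytes]:
    """Split a key into `parts` shares; all of them are needed to rebuild it."""
    if parts < 2:
        raise ValueError("need at least two shares")
    # All but the last share are pure randomness of the same length as the key.
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    # The last share is the key XORed with every random share.
    last = reduce(_xor, shares, key)
    return shares + [last]


def join_key(shares: List[bytes]) -> bytes:
    """Rebuild the key in memory by XORing all shares together."""
    return reduce(_xor, shares)
```

Each share could then live in a different location (for example the device storage, the activation USB stick, and a value derived from the operator's PIN), with `join_key` called only for the short time the functional key is needed. Python cannot reliably erase memory afterwards, so a production version would likely do this step in a lower-level language.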
Hive devices require lightweight cryptography, like AES-SIV, Super ChaCha or BLAKE2, suited for low-power devices. We build a cryptographic layer using a mix of cryptographic functions from Themis and LibSodium. We keep cryptography straightforward, using slightly different parameters on the device and the Queen backend.
We aim to reduce the risks of supply chain attacks, insider threats, and reverse engineering. The issue with IIoT devices is that it’s hard to patch them quickly—thus, the encryption scheme should be resilient and work for years.
Secure communication
Communication security ranges from “just TLS” to mutual authentication, TLS over VPN, and application-level encryption of packets with sensitive data. These particular devices communicate over Wi-Fi or cellular networks.
Devices receive control commands and firmware / data updates from the Queen and send telemetry back. It was important to harden the protocol against active and passive MitM attacks (encryption and service authentication), against replay attacks (sequences and sessions), and against unauthorised “initiate self-destruct sequence” commands, so that a malicious actor cannot brick devices.
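The replay protection idea (sessions plus sequence numbers, with every command authenticated) can be sketched as follows. The packet layout, the field sizes, the names, and the use of a shared HMAC-SHA256 key are assumptions for this example only; the real protocol is not described here and runs inside TLS with additional encryption.

```python
import hashlib
import hmac
import struct

SESSION_LEN, SEQ_LEN, TAG_LEN = 16, 8, 32  # assumed field sizes in bytes


def sign_command(key: bytes, session_id: bytes, seq: int, command: bytes) -> bytes:
    """Queen side: packet = session id | sequence number | command | tag."""
    header = session_id + struct.pack(">Q", seq)
    tag = hmac.new(key, header + command, hashlib.sha256).digest()
    return header + command + tag


class CommandReceiver:
    """Device side: accepts each authenticated sequence number only once."""

    def __init__(self, key: bytes, session_id: bytes) -> None:
        self._key = key
        self._session_id = session_id
        self._last_seq = -1

    def accept(self, packet: bytes) -> bytes:
        if len(packet) < SESSION_LEN + SEQ_LEN + TAG_LEN:
            raise ValueError("packet too short")
        header = packet[:SESSION_LEN + SEQ_LEN]
        command, tag = packet[SESSION_LEN + SEQ_LEN:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(self._key, header + command, hashlib.sha256).digest()
        # Authenticate first, so forged packets cannot move the counter.
        if not hmac.compare_digest(tag, expected):
            raise ValueError("authentication failed")
        session_id = header[:SESSION_LEN]
        (seq,) = struct.unpack(">Q", header[SESSION_LEN:])
        if session_id != self._session_id:
            raise ValueError("wrong session")
        if seq <= self._last_seq:
            raise ValueError("replayed or out-of-order command")
        self._last_seq = seq
        return command
```

A captured “self-destruct” packet replayed later fails the sequence check, and one forged without the key fails authentication, which is the property that keeps a malicious actor from bricking devices.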
Reverse engineering protections and self-destruction
We’ve equipped devices with the ability to detect tampering and execute a self-destruction mechanism. Detection controls are present on several levels:
- Hardware triggers: someone opens a device box.
- On OS level: honeypots and brute force protections.
- On a software level: obfuscation, debugging detection, and honeypots.
The numerous reverse engineering defences are linked to a single reaction control. If a device notices a compromise event, it triggers a self-destruct sequence that overwrites data and models with zeros, wipes them, and sends the “I’m dead” beacon to the Queen.
A compromise event can also be triggered automatically based on device self-health checks, or remotely from the Queen.
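The reaction step can be sketched as a small routine that zero-fills and removes files, then tries to notify the server. The function names, the file list, and the `send_beacon` callback are hypothetical. On flash media with wear levelling, overwriting a file does not guarantee that old blocks are physically erased, so this should be read as an illustration of the sequence rather than as a complete wipe.

```python
import os
from pathlib import Path
from typing import Callable, Iterable


def zero_fill(path: Path, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file in place with zeros and flush it to storage."""
    remaining = path.stat().st_size
    with open(path, "r+b") as fh:
        while remaining > 0:
            n = min(chunk_size, remaining)
            fh.write(b"\x00" * n)
            remaining -= n
        fh.flush()
        os.fsync(fh.fileno())


def self_destruct(paths: Iterable[Path],
                  send_beacon: Callable[[str], None]) -> None:
    """Wipe sensitive files first, then report to the Queen if possible."""
    for path in paths:
        if path.is_file():
            zero_fill(path)
            path.unlink()
    try:
        send_beacon("I'm dead")
    except OSError:
        # The device may be offline; the wipe has already happened.
        pass
```

Wiping before sending the beacon means that blocking the network connection does not stop the data from being destroyed.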
Protection against side-channel attacks
Every working device leaks information about its behaviour: heat, power usage, connection signals, etc. While it’s hard to eliminate side-channel attacks completely, we took specific hardware and software measures to make them less likely to succeed.
One of the examples is using constant-time operations and noise during cryptographic computations.
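As a small illustration of what “constant-time” means in practice, compare an early-exit comparison with the standard library's `hmac.compare_digest`. Both functions below are written only for this example. In Python, timing behaviour is best-effort, so low-level cryptographic code normally relies on the constant-time routines inside a vetted library.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch is
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)
```

An attacker who can measure how long `naive_equal` takes on a secret tag can, in principle, guess it byte by byte; the second form is meant to remove that signal.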
Protecting technology relies on real-world operational controls: we've designed processes for operators to use devices without exposing them to unnecessary risks in the field, because, you know, crop management and soil enrichment are hostile environments in some regions.
Products and services involved
Cryptography engineering
We've designed a cryptographic protocol and key management layout for over-the-air updates of firmware and ML models. The cryptography is based on IoT-friendly crypto-primitives, and the key management layer is kept lightweight.
Security architecture & engineering
We designed & built the whole security layer described above: OS hardening, application security, ML model protection, data security, communication security, and anti-tampering measures.
Themis
Themis is a cross-platform, high-level, open-source cryptographic library. We used Themis as a building block for cryptographic protocols because it has a hard-to-misuse API and works the same way across multiple platforms.
Results and outcomes
The fields are properly fertilised, and sunflowers are growing in peace, monitored by the Hive and the Queen. This project confirms that information security is more than a technology function: for some industries, it is a business enabler.
Now, the [REDACTED] company can securely produce and provision IIoT devices, process field data, and update the firmware and ML models. With a mesh of security controls and reverse engineering protections, the [REDACTED] company is confident that competitors won't be able to steal the IP or abuse devices easily.
Security for innovative industries
Emerging industries don't have established security recipes. We combine years of experience, software, and a creative streak to protect innovations. Talk to us if you are looking to take your data security to the next level.