Article updated in 2019.
Previously in the series...
Previously, we’ve talked about design patterns and best practices in backend security, then about key management goals and techniques.
It is important to understand that database security evolved with system administration techniques and programming demands, with cryptography and access controls being complementary features, rather than cornerstones.
In classic designs, there are two important drawbacks:
Trust tokens:
they rely on storing trust tokens somewhere inside the infrastructure;
trust tokens barely rely on real-world relationships;
these trust tokens are a large attack surface as they open access to many records at once.
Trusting infrastructure:
all these designs suggest that infrastructure exists, works properly, and is not completely compromised;
some of the classic designs rely on the idea that there is a rough "perimeter" between the inside and the outside world.
This, as it turns out, is not the case anymore*.
Goals of a modern backend security system
Apart from the security practices, today’s application architecture and typical engineering patterns have changed significantly. Also, the level of detail that a modern developer is willing to get into is nothing like it was 10 years ago: the developer expects most things to be neatly solved by existing software and frameworks.
When thinking about database/backend security tools, we generally want:
access control with strong compartmentation: authentication, granular CRUD authorization per user/table, similar to the 'grant rights' that exist in databases without encryption;
leakage prevention at rest / in use / in motion;
authenticity and integrity of all data.
When thinking about modern practices, we might add:
The risk model should take an ‘everything will be broken’ threat model as its baseline:
everything is in the cloud and the cloud itself should have very limited trust,
the database, middleware, API providers, and front-end talk over the open internet,
they don’t have a centralised source of trust,
they don’t share verifiable physical factors,
they can be compromised without having an awareness of other talking parties.
Security instrumentation should easily blend into the data representation:
ORM-friendly,
Prepared statements,
Easy management and entity mapping.
Security functions should expose as few cryptographic details as possible to isolate errors and minimise adoption friction.
Also, we want to sacrifice as few database-specific benefits as possible:
Backups, compaction;
Indexing protected data and searching over it;
Using protected data in SQL statements;
Control protection with flexible granularity — from cell to table.
Solutions to some challenges
Unfortunately, no all-encompassing solutions for the mentioned problems exist. However, for each of the problems and goals of the backend security design, there are numerous components and techniques that we might use.
We can divide the new protection solutions into a few classes:
Encryption: searching, indexing, encrypted query databases;
Infrastructure security;
Access control.
Encryption: Searching / Indexing
Searching is a subset of controlling read access cryptographically: allowing processes that hold certain features / keys to read the data without exposing it to potential attackers, while preserving the ability to execute various queries on top of it. (We have written a whole scientific paper on secure searchable encryption, and you can read it on IACR.)
SSE, Searchable Symmetric Encryption
A promising approach is to use symmetric encryption for the text, then challenge the database with specially crafted queries. It works with sequential scanning and indexing, but is rather limited and is more of a theoretical than a practical solution. Still, there is an implementation to try out and build on:
https://people.eecs.berkeley.edu/~dawnsong/papers/se.pdf (paper);
https://github.com/atulmahind/song-wagner-perrig (implementation);
https://eprint.iacr.org/2006/210.pdf (overview paper).
PEKS, Public Key Encryption with Keyword Search
The Public Key Encryption with Keyword Search scheme relies on the data owner generating a number of trust tokens, which are used within the ‘verification’ process. This process allows the server to verify whether the chosen keyword is present in the encrypted data. Although slow and currently mostly theoretical, the potential security of this scheme is very interesting.
https://github.com/atulmahind/PEKS (implementation).
Homomorphic encryption
Homomorphic encryption is a method of performing calculations on encrypted information without decrypting it first. There are fully and partially homomorphic encryption schemes, which provide different sets of operations on protected data. Apart from searching, there are many use-cases (like using the data to perform certain calculations), in which homomorphic encryption is extremely useful.
This solution looks like something that belongs in the future, with no practically usable systems today:
Lattice-based encryption has also attracted attention from theoreticians who talk about its "flexibility for realising powerful tools like fully homomorphic encryption". The latest speed reports for fully homomorphic encryption are—let me use precise technical terminology here since I'm a big fan of careful benchmarking—ludicrously slow, but without ideal lattices, they would be utterly ludicrously slow.
(Source: Daniel Bernstein's blog)
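To make "calculations on encrypted information" concrete, here is a minimal sketch of a partially (additively) homomorphic scheme. It uses a toy version of the Paillier cryptosystem, which is our choice for illustration and is not named in the text above. The primes and all function names are assumptions made for this example, and the parameters are far too small to offer any security.

```python
import math
import secrets

# Toy parameters for illustration only: real deployments use primes
# that are more than a thousand bits long.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                             # standard simple generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)     # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                                  # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ
```

A party that only sees ciphertexts can call `add_encrypted(encrypt(1200), encrypt(34))`, and the key holder decrypts the result to 1234 without either input ever being exposed. Only addition is supported here, which is what makes the scheme partially rather than fully homomorphic.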
Minimal exposure search index
There are more practical approaches, though. You can manually define a list of tokens you’d like to search over, encrypt or hash them, and search accordingly. You can decouple the search IDs and tokens from the actual data before encrypting/hashing them, thus making sure that the known ciphertext attack won’t be useful.
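The approach above can be sketched roughly as follows, assuming a keyed hash (HMAC-SHA256) as the token function and SQLite as a stand-in database. The table layout, the key, and the function names are hypothetical, and the stored blob is only a placeholder for ciphertext that a real encryption library would produce.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key for the example; in practice it would live in a key store
# and be different from the key that encrypts the data itself.
INDEX_KEY = b"demo-only-index-key"


def blind_index(value: str, key: bytes = INDEX_KEY) -> str:
    """Return a keyed hash of a normalised search term."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding search tokens next to opaque blobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email_index TEXT, email_blob BLOB)"
    )
    conn.execute("CREATE INDEX idx_email ON users (email_index)")
    for email in ("alice@example.com", "bob@example.com"):
        # Placeholder only: a real system stores the encrypted email here.
        blob = b"<ciphertext placeholder>"
        conn.execute(
            "INSERT INTO users (email_index, email_blob) VALUES (?, ?)",
            (blind_index(email), blob),
        )
    return conn


def find_user_ids(conn: sqlite3.Connection, email: str) -> list:
    """Search by token using an ordinary prepared statement."""
    rows = conn.execute(
        "SELECT id FROM users WHERE email_index = ?",
        (blind_index(email),),
    ).fetchall()
    return [row[0] for row in rows]
```

The database only ever compares opaque tokens, so regular indexes and prepared statements keep working. The trade-off is that equal values produce equal tokens, so the index still reveals which rows share a value, and only exact-match search is possible.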
Acra Enterprise provides searchable database encryption
Encrypted query databases
CryptDB
CryptDB is a system that provides practical and provable confidentiality for applications backed by SQL databases, even when the database server is attacked. A research project led by MIT, CryptDB carefully balances various encryption techniques against risks and requires the requesting party to craft a special encrypted query to execute over the protected data. Although it looks quite promising and has been adopted by many parties, there are already some known vulnerabilities (https://cs.brown.edu/~seny/pubs/edb.pdf) and weaknesses, which led to the creation of ‘how to use CryptDB securely’ guidelines. Although the dispute is yet to be settled, in most cases we can consider CryptDB to be practically applicable to backend data security problems.
Site: http://css.csail.mit.edu/cryptdb;
Github: https://github.com/CryptDB/cryptdb.
Encrypted BigQuery
Inspired by research in this area, Google has proposed Encrypted BigQuery, an experimental BigQuery client that provides a subset of BigQuery operations in an encrypted fashion:
Client: https://github.com/google/encrypted-bigquery-client.
Tutorial: https://github.com/google/encrypted-bigquery-client/blob/master/tutorial.md.
Cipherbase
Microsoft has suggested its own security system for encrypted queries, Cipherbase, which is the basis for the Always Encrypted database engine.
Infrastructure security
Trust compartmentation
What would you do if you couldn't control the trust of a large database and/or application cluster? You offload critical procedures to a small service running in a well-controlled environment (and, perhaps, powered by hardware separate from the constantly-loaded database cluster).
HSM
There’s an easy classic way to offload trust: use a dedicated piece of hardware for performing all cryptographic operations and managing keys. There are cases where such a solution might feel efficient, but typical HSM throughput becomes a bottleneck when a lot of data is being processed.
HSMs are available for all mainstream commercial databases. With varying levels of effort, they can also be integrated into modern open-source ones.
Integrated security instrumentation
A lot of older database protection techniques rely on the database running in a safe and secure environment, i.e. on trusting the system you run your code on. This is a place for traditional security instrumentation: Host IDS (like Samhain), Mandatory Access Control (like SELinux), and others.
Access control
Most existing database encryption techniques enforce only the read control, preventing the risk of data exposure through requiring a key to access the encrypted data. Some of them verify the authenticity of protected records, thus providing protection against tampering, but we are not aware of existing schemes with write control (apart from the ones we’ve developed ourselves, but more about this later). Apart from read control, the rest is enforced by typical ACL/grant techniques, which rely on trusting that database behaviour is not compromised by an attacker.
Within the previously discussed threat model, we want as little trust put into the backend as possible. This means enforcing access control via non-database techniques (e.g. encryption) and making sure that except for legitimate consumers, the data ‘in process’ is decrypted as little as possible (if it gets decrypted at all).
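As an illustration of verifying the authenticity of protected records without trusting the database, here is a minimal sketch that binds each stored value to its table, row, and column with an HMAC tag. The key is assumed to be held by the legitimate writer and readers, never by the database. The function names and field layout are hypothetical and do not describe any particular product.

```python
import hashlib
import hmac


def record_tag(key: bytes, table: str, row_id: int, column: str,
               value: bytes) -> bytes:
    """Compute a tag over the value and the location it belongs to."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    fields = (table.encode("utf-8"), str(row_id).encode("utf-8"),
              column.encode("utf-8"), value)
    for field in fields:
        # Length-prefix every field so that field boundaries are unambiguous.
        mac.update(len(field).to_bytes(4, "big"))
        mac.update(field)
    return mac.digest()


def verify_record(key: bytes, table: str, row_id: int, column: str,
                  value: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; False means tampering or relocation."""
    expected = record_tag(key, table, row_id, column, value)
    return hmac.compare_digest(expected, tag)
```

Because the location is part of the tag, a compromised database can neither alter a value nor copy a valid value into another row or column without the reader noticing. This detects unauthorised writes rather than preventing them, so it covers tamper detection but not write control.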
Cossack Labs research
At Cossack Labs, we strive to see the problems described in this article in a very different light.
First of all, we believe that the old UNIX proverb of “do 1 thing really well, get inputs and outputs in standardised fashion” doesn’t work well for modern security, because in this case:
developers are still held responsible for making security decisions, including key layout and encryption granularity;
a typical solution suggests strapping several security tools together in one backend infrastructure, which means more chances to break things on integration;
such work requires high-level vision, which is rarely present.
We strive to address these problems differently: by providing specialised tools for specific use cases, which abstract all cryptographic decisions into more user-friendly concepts.
Acra: crypto compartmentation via transparent database encryption
The Acra database security suite is our take on compartmenting trust via a transparent architecture: making sure that the attack surface is very small and is contained within a well-controlled environment. A daemon runs on a separate virtual machine, receives all database queries, executes them, then decrypts the data and supplies it back to the application via a protected channel. Acra’s encryption scheme is built in such a way that the application can write data using a small number of cryptographic tokens, which are insufficient for decrypting that data.
Hermes: granular access
Hermes is our research into a much more ambitious problem: enforcing all the CRUD grant rights via cryptography and providing an infrastructure for building complete end-to-end apps, which rely on cryptography for the implementation of all of their security mechanisms. This is ongoing research, with new implementations and ecosystems being built right now. We presented the proof-of-concept of Hermes with practical sample code and a scientific paper in December 2017.
Ending notes
There are many techniques for protecting data stored within a database / application backend. Intuitively, it feels that by combining a few tools here and there we might achieve a decent level of security. But in reality, we need to understand the threat model and how to limit the attack surface and protect it really well. It is a part of application/infrastructure design, not a 'feature', nor a 'service'.
*2019 UPD: This article is just as valid as on the day it was published. If you're looking for new security-related ideas, this is the right place. If you're looking to implement security, apply for our Customer Success Program or we can train you. If you're looking for ready-made solutions, consider looking into Themis, Acra, or Hermes.