In the previous post we talked about the guarantees we can provide when a user uses our confidential computing platform avato. We touched upon how these guarantees are provided on a high level, and what this means for the data analytics industry.
In this post we give more context on what Confidential Computing is and why it’s becoming a new fundamental building block of data security. We start with an analogy from the recent history of internet security.
A not-so-ancient history of encryption
In the age of the internet and digital data, one of the biggest revolutions in security has been the establishment of Transport Layer Security (TLS). Before TLS, you had to trust the network not to inspect or tamper with your data. The protocol added a whole new layer of safety for data in transit (protecting data while it gets moved around), so much so that Google pushes the internet to use TLS-secured communication exclusively.
Confidential Computing does the same thing, but for the actual computation. With traditional methods you have to trust the remote server to do the computation you want it to do and, in the meantime, not to inspect your data.
As with TLS becoming ubiquitous, maybe in the future all remote computation will use Confidential Computing techniques…
The days of HTTP
Examples always help. Let us draw a parallel with HTTPS, which is HTTP secured by TLS.
Before HTTPS, when you opened a webpage, for example http://notsafeatall.com, your browser would establish a TCP connection to the remote server and send an unencrypted HTTP request. The server would then reply with an unencrypted answer. The problem with this process was that anyone sitting between you and the server could read and modify the unencrypted packets relatively easily, and could even impersonate the website, in a man-in-the-middle attack. In the days of HTTP, you had to trust the network and the remote server.
The days of HTTPS
To fix this, HTTPS was created. It follows essentially the same process but encrypts all the data in transit. No more man-in-the-middle attacks, since any man (or woman) in the middle can only see encrypted gibberish, and any tampering with the packets can be detected. In the (current) days of HTTPS, you do not have to trust the network, but you still have to trust the remote server.
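To make the "you do not have to trust the network" part concrete, here is a minimal sketch using Python's standard ssl module. The function names are ours and purely illustrative. The client refuses to talk to a server whose certificate does not verify, which is what defeats the man in the middle.

```python
import socket
import ssl
from typing import Optional


def make_verifying_context() -> ssl.SSLContext:
    """Return a client-side TLS context that verifies the server's certificate."""
    # The default context requires a valid certificate chain and checks
    # that the certificate matches the hostname we asked for.
    return ssl.create_default_context()


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open a TLS connection to host and report the negotiated protocol version."""
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # The handshake happens here; it raises ssl.SSLError if the
        # certificate cannot be verified, e.g. because of an impersonator.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

Note what this does not give you: once the handshake succeeds, the server on the other end sees your data in the clear.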
Guess what comes next?
The days of Confidential Computing
Confidential Computing removes the need to trust the remote server. In traditional computing, even though you know that data in transit is safe (thanks to encryption such as TLS in the previous example), you cannot be sure what program is running on the remote server when you have neither physical access to the CPU nor control over the operating system. So even if you know that your data arrives safely at the destination server, you have zero control over what happens to it there. Of course, in many cases this is fine, because you inherently trust the server or because the data you transmit is not sensitive. But what happens if you want to run computations on data that you don’t trust the server to handle? This is the void that Confidential Computing fills. You can now be sure that only the computation you intended runs on this data, and that no one (not even someone with physical access to the CPU or control of the OS) can see or alter it. In the same way that you no longer need to audit or trust the network (as you did before HTTPS), you no longer need to audit or trust the computer that does the computation for you, since provably nobody can see or change it. With Confidential Computing you can delegate computation on even the most sensitive data and be sure that the results have not been tampered with.
We hope to have convinced you that Confidential Computing is a Big Thing. Now we want to dig a bit deeper and explain on a lower level how we at decentriq use Secure Enclave programs leveraging Intel® Software Guard Extensions (Intel® SGX) technology to enable it.
How does it work?
Delegating any sensitive computation while being sure that nothing gets leaked or altered requires specific characteristics from the Secure Enclave. The most advanced type of enclave with the strongest security is based on Intel SGX. Intel SGX increases the security of data in-use through two features: Enclave-memory encryption (aka encryption in-use) and remote attestation.
Enclave-memory encryption or “How to execute code without trusting that no one will see the memory”
In order to process data, a CPU loads it from memory. In traditional computing, this data must be unencrypted for the CPU to perform computation on it. With Intel SGX, the CPU can read encrypted data from memory because a dedicated encryption/decryption unit inside the CPU handles the memory access of Secure Enclave programs. The encryption/decryption is done on the fly within the CPU itself whenever enclave data or code leaves or enters the processor package. Consequently, the user does not have to trust any code or process outside the enclave, including the operating system. The details of Intel’s Memory Encryption Engine, which performs this, are outlined here.
This enclave-memory encryption is complemented by enclave-memory access control, which prevents all other processes from manipulating memory associated with the enclave. It does so through an additional layer of memory address translation that is not controlled by the operating system.
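To illustrate the idea (and only the idea), here is a toy model in Python. It is not how Intel's Memory Encryption Engine works internally: the class name, the HMAC-based keystream and the tag layout are all our own assumptions, picked so that the sketch runs with the standard library alone. The keys live inside the object that plays the CPU package, while the ram dictionary plays the memory that the OS or an attacker could inspect.

```python
import hashlib
import hmac
import os


class ToyEncryptedMemory:
    """Toy model: RAM only ever holds ciphertext; keys stay in the 'CPU package'."""

    def __init__(self) -> None:
        # Stand-ins for keys that never leave the processor package.
        self._enc_key = os.urandom(32)
        self._mac_key = os.urandom(32)
        # address -> (ciphertext, tag): everything an outside observer can see.
        self.ram = {}

    def _keystream(self, address: int, length: int) -> bytes:
        # Toy keystream derived from the key, the address and a block counter.
        stream = b""
        counter = 0
        while len(stream) < length:
            block_id = address.to_bytes(8, "big") + counter.to_bytes(8, "big")
            stream += hmac.new(self._enc_key, block_id, hashlib.sha256).digest()
            counter += 1
        return stream[:length]

    def _tag(self, address: int, ciphertext: bytes) -> bytes:
        # Integrity tag binding the ciphertext to its address.
        message = address.to_bytes(8, "big") + ciphertext
        return hmac.new(self._mac_key, message, hashlib.sha256).digest()

    def write(self, address: int, plaintext: bytes) -> None:
        # Data is encrypted as it leaves the "CPU" on its way to RAM.
        keystream = self._keystream(address, len(plaintext))
        ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
        self.ram[address] = (ciphertext, self._tag(address, ciphertext))

    def read(self, address: int) -> bytes:
        # Data is checked and decrypted as it enters the "CPU" from RAM.
        ciphertext, tag = self.ram[address]
        if not hmac.compare_digest(tag, self._tag(address, ciphertext)):
            raise ValueError("enclave memory was modified from outside")
        keystream = self._keystream(address, len(ciphertext))
        return bytes(c ^ k for c, k in zip(ciphertext, keystream))
```

The toy reuses the same keystream whenever an address is rewritten and has no protection against replaying old values, so it is an illustration and nothing more. The point is the shape: anyone reading ram sees gibberish, and anyone changing it gets caught on the next read.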
Remote attestation or “How to make sure that all of this is really happening”
Ensuring that nobody can read data from the memory region of a remotely running program is only part of it. Equally important is ensuring that the program is actually what you expect it to be. The verification by a user of a remotely running program’s identity is called remote attestation. In the case of Intel SGX, this means that Intel SGX helps prove to a user that they are communicating with a Secure Enclave program running a specific piece of code. The proof (which we graciously call fatquote) contains three main elements:
1. A hash of the enclave’s identity (program code and data) after initialization (the measurement).
2. A cryptographic signature that certifies the enclave’s genuineness (the fact that it is actually a Secure Enclave program that produced the above hash).
3. A public key that enables the user to establish a secure connection with the enclave (the private key counterpart cannot be accessed outside of the enclave).
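To give a feeling for why a hash can serve as an identity, here is a small sketch. It is a simplification: the real measurement is computed by the CPU while the enclave is being built and covers more detail than this. The function name and the way code and data are combined are our own assumptions for illustration.

```python
import hashlib


def toy_measurement(code: bytes, initial_data: bytes) -> bytes:
    """Hash an enclave's code and initial data into a 32-byte identity."""
    digest = hashlib.sha256()
    for part in (code, initial_data):
        # Length-prefix each part so the boundary between code and data
        # is unambiguous ("ab" + "c" must not collide with "a" + "bc").
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.digest()
```

Changing a single byte of the code or the initial data produces a completely different measurement, so a user who knows the expected value can tell whether the remote enclave runs exactly the program they agreed to.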
Since this is the part with the most “action”, we want to linger a bit and outline the specific steps a user goes through on decentriq’s avato platform to create an enclave and perform remote attestation:
1) The user uses decentriq’s Python API (or the underlying HTTPS REST requests) to authenticate themselves to the platform and request the creation of an enclave.
2) The avato platform launches a process that constructs an enclave.
3) The user requests the attestation information (the fatquote).
4) Our software requests the fatquote from the enclave, which triggers the following actions:
i) A dedicated Intel SGX CPU instruction starts a procedure which produces a quote that contains (1) the enclave's public key and (2) the measurement we introduced above. It is signed by the CPU (actually by another Secure Enclave called the Quoting Enclave, don’t get us started…) with an inaccessible cryptographic key. This signature proves that this actually is a genuine Intel SGX enclave.
ii) The quote is sent over the internet to the Intel Attestation Service (IAS) which inspects the signature (Intel stores the attestation keys created when manufacturing in a root certificate) and upon successful verification will in turn sign over it with a publicly verifiable signature and certificate chain. Now, with this verification (and some metadata), the quote becomes the fatquote. As an equation: fatquote = quote + Intel’s signature = enclave public key + measurement + Intel’s signature.
iii) The avato platform receives the fatquote and sends it to the user.
5) The user verifies the fatquote by verifying the signature and certificate (rooted in Intel’s root certificate) and by comparing the measurement to an expected value. This expected measurement can either be computed by the user (which requires building an identical enclave from open-source code) or be obtained from a trusted third-party auditor.
After the attestation proof has been verified, the user knows the identity (program code and data) of the remote program and can use the enclave’s public key to encrypt their data which can then be sent securely to the enclave.
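The user-side check in step 5 can be sketched as follows. This is not the avato client API: FatQuote, AttestationError and verify_fatquote are hypothetical names, the byte layout of the signed payload is an assumption, and the actual signature and certificate-chain check is passed in as a function because it needs a public-key cryptography library that we leave out here.

```python
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FatQuote:
    """Hypothetical container for the three elements of the proof."""
    enclave_public_key: bytes
    measurement: bytes
    signature: bytes  # Intel's signature over the two fields above

    def signed_payload(self) -> bytes:
        # Assumed layout; the real format is defined by Intel, not by this sketch.
        return self.enclave_public_key + self.measurement


class AttestationError(Exception):
    """Raised when the proof cannot be trusted."""


def verify_fatquote(
    fatquote: FatQuote,
    expected_measurement: bytes,
    signature_is_valid: Callable[[bytes, bytes], bool],
) -> bytes:
    """Return the enclave's public key if, and only if, the proof checks out."""
    # 1. Is this a genuine enclave? The signature must verify against
    #    a certificate chain rooted in Intel's root certificate.
    if not signature_is_valid(fatquote.signed_payload(), fatquote.signature):
        raise AttestationError("signature does not verify against the trusted root")
    # 2. Is it running the code we expect? Compare the measurement
    #    to the value we computed ourselves or got from an auditor.
    if not hmac.compare_digest(fatquote.measurement, expected_measurement):
        raise AttestationError("enclave is not running the expected program")
    # 3. Only now is the public key trustworthy enough to encrypt data to.
    return fatquote.enclave_public_key
```

The order matters: the public key is only handed back after both checks pass, so data can never be encrypted for an enclave whose identity has not been verified.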
We learned that Confidential Computing removes the need to trust the remote server. We implement Confidential Computing using Secure Enclaves powered by Intel SGX. The key properties of Secure Enclaves are remote attestation and enclave-memory encryption. But so what?
The applications of Confidential Computing are endless. You can share your health data with proof that it’s only used for treatment development; you can build data marketplaces where owners can share their valuable datasets for proven purpose and with proof of deletion; you can build VPN services that provably do not log user data… We could go on forever, but these are topics for other blog posts.
At decentriq we believe that such a fundamental change in computing should not be hidden away behind heavy cryptography and software engineering skills. We work hard to abstract away the complexities of the underlying technologies and to enable as many organizations and people as possible to use Confidential Computing and unlock previously impossible applications.