>> We’ll get started. So, this is the session on Blockchains, so welcome to this session on Blockchains. So, we have a bunch of interesting talks in this session. Our first presentation will be on a work that marries traditional databases with Blockchain. The second work would be on exploring how we can use computation verification in the setting of Blockchain. Then finally we have another work that explores how we can use enclaves to kind of achieve better performance and confidentiality in the context of Blockchains. So, the first speaker is going to be Professor Carsten Binnig from Technical University of Darmstadt. He’s also an adjunct professor at Brown University and before that he was an architect at SAP HANA. He’s a recipient of Google Faculty Award and he has a bunch of Best Paper Awards at IEEE, Big Data and VLDB and SIGMOD and he’s been recently exploring the overlap between Blockchains and databases and he’s going to talk about that.
>> Wonderful. Good. Do you hear me? So, the mic should be on. So, today, as Arvind mentioned, I’m talking about a database view on Blockchains. My main research is, in general, in distributed databases, especially distributed databases together with networks. So, I’m a database person, just to give you a quick introduction to where I come from. Why I got interested in Blockchains is because people are not using Blockchains for cryptocurrencies only, but they use Blockchains more and more as general shared databases, to write data into from one trusted domain and access it from another trusted domain. There are many different reasons why people are using it as a platform for data sharing. One is that it keeps the history of all transactions. Meaning, if something is wrong in your database, you can go back and still look at what went wrong. In some states in the USA, they even changed the laws so that data in a Blockchain counts as evidence in court, which is a strong reason if you share data between different trusted domains.
The second reason why it’s interesting is that no tampering is possible after the fact. So, once the transaction is in, you cannot modify the list of transactions you store in your Blockchain. Even more interesting, you don’t need a trusted authority for making all these decisions. So, the machinery in a Blockchain is designed to give you these two properties, that it stores all transactions and it doesn’t allow tampering, without having a trusted authority. And you see that Blockchains are actually used as shared databases, and there are many applications out there that already try to use Blockchains as a shared database. They cover different domains, not just data from health institutions, but also supply chains where you have different entities, the supplier and the manufacturer, requiring access to the same database. But also other use cases where you want to have decentralized copyright management. So, for example, Binded is a copyright management platform for images where you put in the image, it gets a timestamp and thus it has the copyright, and if somebody else at the end wants to upload another similar image, you have to prove that the other image was in the database before.
So, the question is now, are we done now? Is this the end of the talk? So, are existing Blockchain systems good enough for supporting all the different applications that people have in mind when they want to use them as a shared database? So, what I’m going to do in the talk is this. I think most of the people might know the Blockchain basics, but I’ll still go quickly over the Blockchain background and some basic terms, so that I’m not talking about things that are out of context in the rest of the talk. Then I’ll talk about the challenges, why Blockchains, in my eyes, are not sufficient yet to use them as a shared database, and then go towards what we’ve been doing in the past year with BlockchainDB, developing a shared Database System on top of Blockchains that mitigates some of the problems that I am going to discuss first.
So, for Blockchains, just the quick overview, I don’t want to spend too much time. So, Blockchains the basis they use, a tamper-proof ledger to store data, which I already mentioned, it’s a list of transactions, it’s append only. The transactions are appended in blocks, so if new data is written, the new data is appended or mined in blocks and the ledger is fully replicated. Here you see already, if something is mined or appended in blocks and it’s fully replicated, there seems to be a lot of inefficiency in the system itself and this is, again, one of the challenges that you tackle when you want to use it as a general-purpose Database System.
So, why it’s so interesting as a shared database, as I already mentioned, is that if an update happens to the shared database, which is replicated to all the peers in the Blockchain, then everybody, or at least the majority depending on what protocol you’re using, has to agree on the appends, and there are different ways to assure that, no matter whether you’re using a public or private Blockchain. That’s what we will also go over in a few seconds. The last piece of information that I’d like to introduce is the term smart contracts, which are nothing else than trusted procedures that are stored in the Blockchain and can modify data. It’s your custom logic that you deploy in a Blockchain. For a database person, it would be something like a stored procedure that can be called from outside.
But for smart contracts, the specific thing is that all the participants agree on the same code if you’re on the Blockchain. The second slide, just to get the background right, there are two platforms or ways how Blockchains are deployed. There are public Blockchains, means that everybody, every participant, can join the Blockchain and they usually use more expensive consensus protocols. I won’t go into details for those, but just to make sure that we’re on the same side, there are also private Blockchains, which target a different type of application where not everybody is allowed to join the Blockchain but it’s more limited set, targeted towards a limited set of participants that know each other.
Therefore, they use somewhat less expensive consensus protocols, and this already increases the performance a bit when you use Blockchains as a Database System. So, the question is, and that’s where I’d actually like to start my talk, what are the challenges if you want to use Blockchains as a shared Database System? The main challenge, I already mentioned that, is clearly the performance of Blockchains. There was a paper last year at SIGMOD where they analyzed the throughput and the latency of different Blockchain systems.
One is Ethereum, that is pretty well-known, which you can set up as a public but also as a private Blockchain, as well as Hyperledger, which is the IBM version of Blockchain system. But what you see here is that the throughput, for example, for a classical database benchmarks, so if you use the Blockchains as a key-value store, it’s pretty low. So, they reported around, for Ethereum, around 300 transactions per second that can be executed. If you look carefully in the paper, this is not the average transaction rate, that’s the peak transaction rate.
So, the average transaction rate is actually much, much lower. So, clearly here you see that the transaction and the performance that you get, the transaction rate, the throughput and the latency that you get out of these Blockchains is far from what typical applications need today. Even, for example, if you look at what Visa is processing on average, so Visa transactions are around 2,000 transactions per second. If you look into the applications how people want to use Blockchains, they also want to store sensor data in the Blockchain, which even require much higher transaction rates. So, the performance itself is one problem. The second one is, if you today want to use one of the Blockchain systems to set up a shared database, you are confronted with the “zoo” of those systems, and it’s really not clear which one to use. They come, (a) with different programming and execution models, and the question is with which platform are you able to implement your logic that you want to have for the shared database.
In which one can you implement the transactions or the transaction logic that is needed for your use case? Even worse, because there’s so much development going on, you don’t know what is the best Blockchain for your workload and (b) which ones will survive at the end, because there’s so much movement in the sector, people are trying to come up with optimizations and new platforms, and it’s unclear which one you should build your shared database application on. The last one is even more severe. You might assume that if you deploy your code in the Blockchain and run it, you get some kind of guarantee, because procedures run through the consensus mechanism: multiple nodes need to execute the same code, and at the end the output of the procedure is verified, meaning it is checked whether all the outputs are equal. But that’s not true for purely read-only functions that you would implement in your smart contract. They are only executed by one peer in the Blockchain.
Meaning, if that one peer, that one node, is malicious and you are sending a read request to that one peer, you might get a wrong value from that peer. So, the guarantees we are getting for reads, for example, are not enough for many shared database use cases where you want to know: is the value I’ve been reading actually the value that is stored in the database? So, there are many guarantees that you actually want to add if you use your Blockchain as a shared database. What we think would be nice is if you could execute transactions on a shared Blockchain, at the end verify whether the execution of the transaction, meaning all your read and write operations, was correct with regard to the guarantees the database wants to give you, and, if you discover that a violation happened, go back to the last verified checkpoint.
So, the idea is to bring more of transaction theory into Blockchains, but also to have transactions that are verifiable. So, how are we going to achieve this? This is the idea of BlockchainDB. The idea is that we build a Middleware on top of Blockchains. What BlockchainDB gives us is a unified API. So, we tackle the problem of having a zoo of those systems by having a unified front-end and an exchangeable back-end, similar to what MySQL is today. For databases, you get a unified front-end and you can plug in the storage engine that you like to have, which gives you the performance characteristics that you want to have for your application. The second one is that we can apply typical database optimizations in the Middleware. So, we’ve seen that the performance itself is a major limitation, and by applying some tricks in the Middleware, by sharding across several Blockchains, we can achieve much higher throughput than what the existing Blockchains give us already.
The last one is that we want to support this notion of verifiable transactions. So, executing read/write operations via BlockchainDB, and then at the end getting a verification mechanism that tells us whether a transaction that we’ve executed has been correct with regard to the consistency level, for example, selected in the Database System. Now you could argue, “Okay, BlockchainDB, this is a component that runs outside of your Blockchain. So, it’s an untrusted component, doesn’t it make things harder?” I’ll come to that at the end and I’ll talk a bit about how we deal with verification with a new component that is actually untrusted. So, what we’ve actually been building in the last year is not a full Database System, but as a first step, we started to build a Middleware that has a simple put/get interface, because (a) this is a basic building block that we think we can use later on to build transactions on top, and (b) it already gives us a proof of concept of how far we can get with transaction rates if you have just a simple put/get workload. So, the idea is we have a Blockchain key-value store that supports at the beginning only the simple put/get operations that we have here, and we have pluggable back-ends.
Inside BlockchainDB, in order to achieve the higher performance and implement some database optimizations, we selected a bunch of techniques that are implemented in the Middleware, starting with sharding the data into multiple Blockchains, so not using just one. This is a thing that some of the Blockchains already do by now inside the Blockchain system, they support sharding, but not all of the systems support it. So, what we can do is just put that on top and support, that way, some data-parallel execution across Blockchains, because if you look a bit deeper into the execution models of Blockchains, they tend to be fully serialized, a serialized execution with no parallel execution of transactions at all, and the sharding gives us some way of parallelizing. We implemented a bunch of other optimizations. I want to go into just two of them and show you quickly what the performance impacts are if you implement those in the Middleware.
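A minimal sketch of the sharding idea, assuming keys are routed to back-ends by a hash of the key. The class name `ShardedStore` is hypothetical, and plain Python dicts stand in for the individual Blockchains; the talk does not say how BlockchainDB actually assigns keys to shards.

```python
import hashlib


class ShardedStore:
    """Routes each key to one of several back-ends (dicts stand in for Blockchains)."""

    def __init__(self, backends):
        self.backends = list(backends)

    def _shard_for(self, key: str) -> int:
        # Assumption: hash-based routing, so one key always maps to one shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.backends)

    def put(self, key: str, value) -> None:
        self.backends[self._shard_for(key)][key] = value

    def get(self, key: str):
        return self.backends[self._shard_for(key)].get(key)


store = ShardedStore([{}, {}, {}])
store.put("sensor-1", 42)
value = store.get("sensor-1")
```

Because puts for different keys can land on different back-ends, they no longer queue behind each other in a single serialized chain.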
The first one is, so the question is, if you provide a put/get interface, what consistency can be provided if you use a Blockchain? In the Blockchain key-value store, we support different consistency levels, client-side consistency levels that are more expensive or less expensive. The idea is that if you give up some of the consistency, similar to what normal databases also do, you can get higher performance even when you use a Blockchain as a back-end. The idea here is that if you use a higher consistency level, for example, read your own writes, what we do in the Middleware if a write is coming in, if a put is coming in, is that it’s first submitted to the Blockchain, but we don’t wait for the transaction to actually be committed, only until it’s validated.
After that, we just append it to a list in our Middleware, meaning that once a read is coming in for the same key, at that moment we do lazy blocking on the pending transaction and wait for the transaction to be pre-committed before we give back the value. So, this is one technique where you already don’t need to wait for the full execution of a transaction, but you still get a guaranteed consistency level which many of the applications desire. Or you can even go lower and provide only eventual consistency, and not even wait for the transaction to be committed, and execute read operations completely on the state you are getting, and just monitor whether these transactions are finally committed, so that you can still guarantee liveness. What you see here is a 50/50 workload of 50 percent reads and 50 percent writes, with Ethereum as a back-end. If you want to provide read-your-own-writes semantics, you get a throughput of only 20 put/get operations per second, and this is in line with the Blockbench numbers.
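A minimal sketch of the lazy-blocking idea for read your own writes. `FakeChain` and `Middleware` are hypothetical names, and calling `mine()` stands in for waiting until the back-end has committed the pending transaction; a real middleware would block on the Blockchain instead.

```python
class FakeChain:
    """Stand-in for a Blockchain back-end: writes commit only when mine() runs."""

    def __init__(self):
        self.committed = {}
        self.mempool = []

    def submit(self, key, value):
        self.mempool.append((key, value))

    def mine(self):
        for key, value in self.mempool:
            self.committed[key] = value
        self.mempool.clear()

    def read(self, key):
        return self.committed.get(key)


class Middleware:
    def __init__(self, chain):
        self.chain = chain
        self.pending = set()  # keys with submitted but uncommitted puts

    def put(self, key, value):
        self.chain.submit(key, value)
        self.pending.add(key)  # return without waiting for the commit

    def get(self, key, read_your_writes=True):
        if read_your_writes and key in self.pending:
            # Lazy blocking: only a read of a pending key pays the wait.
            self.chain.mine()
            self.pending.clear()
        return self.chain.read(key)


mw = Middleware(FakeChain())
mw.put("k", "v")
stale = mw.get("k", read_your_writes=False)  # eventual consistency: may miss the put
fresh = mw.get("k")                          # read your own writes: sees the put
```

The eventual-consistency path never waits, which is where the extra throughput comes from.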
If you look a bit closer, they reported only numbers, as I said, for max throughput and not for average throughput. So, eventual consistency already gives you a boost of a factor of five in terms of transactions, which is still not where many of the applications need it to be, but it’s already a way of getting higher performance out of your Blockchain system if you decrease the consistency level. A second technique that actually helps much more is that, in our Middleware, we implemented some form of batching. So, we don’t submit puts one by one to the Blockchain, because the per-transaction overhead of validating transactions is pretty high. Meaning, if you batch them together, so multiple puts in one Blockchain transaction, you can get, for example, if you batch up to 500 puts in one call, a transaction rate of 1,200 transactions per second. After that, the maximum message size is exceeded (we used four-byte integers here), so that’s the limit of batching that you can use. So, the plain database tricks are one thing, but I mentioned a third point which is important, that is getting verifiability of your operations.
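A minimal sketch of the batching step, assuming puts are simply grouped in arrival order. The function name `batch_puts` is hypothetical; the default of 500 mirrors the batch size mentioned in the talk.

```python
def batch_puts(puts, max_batch=500):
    """Group individual (key, value) puts into lists of at most max_batch entries.

    Each returned list would be submitted as one Blockchain transaction, so the
    per-transaction validation overhead is paid once per batch.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    return [puts[i:i + max_batch] for i in range(0, len(puts), max_batch)]


batches = batch_puts([(i, i * i) for i in range(1200)])
sizes = [len(b) for b in batches]
```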
This is something that we also implemented inside BlockchainDB, inside our system. The idea of verifiable consistency is that a client can verify the correctness of all operations, meaning whether the puts and gets that he executed adhere to the selected consistency level. So, what does it mean for eventual consistency? For eventual consistency it means that the client doesn’t see any fake reads. So, if you think about a scenario where, for example, the Blockchain Middleware or one of the peers is malicious, it might just return a value that is not stored in your database, and we want to be able to detect that after the fact. If you use a Blockchain system out of the box, you don’t get that guarantee. The second one is that, for example for eventual consistency, we also want to support liveness. So, we want to assure that all the puts that I executed through BlockchainDB are actually stored in the Blockchain system and not dropped by one of the components. So, the problem is, as I mentioned, that all these guarantees could be violated because, in our setup, we have multiple untrusted components.
One is our Blockchain key value store that we introduced as a Middleware. But there are also the peers themselves; if you look at them individually, they are untrusted. Meaning, if you just go to one of the peers, you might get the problem of fake reads, since reads are actually executed by only one peer. So, the problem is detecting whether you got a fake read or whether, for example, the Blockchain Middleware or the peer dropped a transaction and didn’t execute it. How we do it, and here is just a high-level overview, is that we use a deferred verification mechanism where every put that is executed through the key value store updates the write set in the Blockchain itself, and the clients additionally log their actions, their read and write sets. They don’t need to do it for each individual call. But they log the read and write sets not using the Middleware layer, but bypassing the Middleware.
They log the read and write sets into the Blockchain as well. The nice point about these calls is they don’t need to be synchronous and blocking, you can just send them asynchronously. At the end of your Epoch, when you have executed multiple puts and gets, what you want to run is a verification procedure that checks whether the write set that the clients reported via their record calls is included in the write set that the key value store has seen.
This is again a non-blocking operation, meaning we submit it as a Blockchain transaction, implemented as a stored procedure, as a smart contract, which checks if these two guarantees are satisfied. Once it’s done, we can just fetch the result, whether the Epoch was correct, whether all the operations have been correct. In the worst case, we roll back in the transaction history to the point where the last verified Epoch was stored in the database.
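A minimal sketch of the end-of-Epoch check, under simplifying assumptions: a single Epoch, operations recorded as `(key, value)` pairs, and every read returning a value written in that same Epoch. The function `verify_epoch` and its parameters are hypothetical; in the system described in the talk this check runs as a smart contract.

```python
def verify_epoch(client_writes, store_writes, client_reads):
    """Return True if the Epoch passes both checks.

    client_writes: (key, value) puts the clients logged directly on the chain
    store_writes:  (key, value) puts the key value store recorded on the chain
    client_reads:  (key, value) results the clients received for their gets
    """
    store_set = set(store_writes)
    # Liveness: no put reported by a client was dropped on the way to the chain.
    no_dropped_puts = all(w in store_set for w in client_writes)
    # No fake reads (simplified): every value read was actually written.
    no_fake_reads = all(r in store_set for r in client_reads)
    return no_dropped_puts and no_fake_reads


ok = verify_epoch([("a", 1)], [("a", 1), ("b", 2)], [("b", 2)])
dropped = verify_epoch([("a", 1)], [], [])
fake = verify_epoch([], [("a", 1)], [("a", 99)])
```

If the check fails, the application would fall back to the last Epoch for which it returned `True`.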
So, we can also recover to valid checkpoints. Good. So there’s much more, many more details that you need to think about if you want to do verification with Blockchain and there are more details I haven’t been including in the slides, but this is the general notion of how we want to ensure that puts and gets are executed correctly. So what are we up to next? So, as I said, it’s a first step that we implemented.
There are many more optimizations that we haven’t implemented. For example, caching in the Middleware but still guaranteeing verifiability. That’s something we are working on at the moment. Then for sure go in the route of having not just the put/get interface but supporting full database transactions on top of our key value store interface. So, in long-term, what we want to achieve or how we think that our Middleware could be integrated into a data management stack is also have a combination of a database that is running on-premise, where you use the Blockchain Database System to integrate something like a Shared column into your local Database system. Meaning, you create a table, just mark your column as “Shared”, and by getting the guarantees of verifiable transactions and sharing on those columns of data is automatically written to the Blockchain, and you can verify for those columns if transactions on those columns have been right.
But this is something we haven’t started thinking about in detail. With that, I’m at the end of the talk, happy to take questions. I’m sure we have time for questions at all. Yeah. >> So, you cited a SIGMOD paper saying that even if you set up a private environment around Blockchain, the throughput is very limited, right? >> Yeah. >> So, intuitively I have a hard time understanding why this is the case, and I wonder if you can share some insight. >> You mean why the transaction rate of Blockchains is so low? So, I haven’t been going into the details of the execution models of Blockchains. But for example, public Blockchains use a consensus protocol which is called “Proof of Work”. Meaning, that one was private, yeah. For them, as I said, for private Blockchains, the problem is that the execution of all transactions is fully serialized. Meaning one node takes a set of transactions, executes them, distributes the results to all the peers in the Blockchain, and they re-execute the transactions to verify the outcome of the transactions.
Then, if the peers, or the majority of them, agree, the block is appended. So it is inherently a fully replicated Database System that doesn’t allow any concurrency, and that explains the results that you are seeing. And there are many more overheads in Blockchains, not just the execution but also verifying the signatures for each and every transaction, et cetera. Yeah. Mike. >> I’m just curious if you could give us a feel. I know you didn’t go into the details of Blockchain Protocols, but is there something fundamentally different with using Blockchain as the backend for the key value store as opposed to if you were just building a Federated Database over a bunch of slow Database Systems, where you would be faced with a lot of the same challenges? >> There are two answers from my side to your question. The first one is, why don’t you use a Database System or a replicated Database System from the start for those use cases? That’s a valid question. So, the answer is, for me, people are using Blockchains nowadays and it has this legal notion that data that is stored in a Blockchain can be used as evidence in court.
So, there are already some mechanisms in place that you don’t have a database. There are reasons why people should use a Blockchain. For some use cases, it’s not possible to start with a Database System as a starting point. The second one is, regarding the performance, I think that we saw already from the talk from Microsoft in the morning, there’s a lot going on in optimizing the performance of Blockchain Systems themselves. We just benefit from it. Whatever comes next as a platform, we can plug it in as a backend and use it and provide a verification guarantees on top. This is what, let’s say the value of what we provide is. So, you get the interface of a typical Database, you can verify your transaction.
So, put/get operations as if it were a Database, but at the end, it’s a Blockchain. So, we provide a stable interface, you can program against it from your application and get the advantages of your Blockchain Systems. >> Assume we moved from “Proof of Work” to “Proof of Stake” and assume every block can pack tens of transactions. Where would you imagine the next bottleneck would be, in terms of improving the performance of Blockchain transaction processing? >> We ourselves haven’t been looking too much into where the bottlenecks of current systems are. There are two good papers that actually analyze the performance bottlenecks and where the limits are, how far we could push the performance if we tweak the parameters. There’s one from a bunch of authors from ETH, Cornell, et cetera, where they theoretically analyzed where the upper bound is with the current protocols.
I think that’s a nice read. Anyway, the second one is Blockbench, where they already ran workloads against different systems with different consensus protocols. From my perspective, the execution of transactions in Blockchains is inherently inefficient. As I said, it’s completely sequentialized. There’s no notion of concurrency, and I think Blockchains can learn a lot from what databases have been doing in concurrency. So, executing transactions as if they were serialized, as if they were serial, but actually having a concurrent execution. If you can get some of those ideas into the next generation of Blockchain Systems, it would increase the performance and reduce the bottleneck that current Blockchains have. >> All right. I think we are running slightly behind schedule. If you have further questions, we might want to do a short break. So, we’ll thank the speaker. Yes. >> Thanks. >> So, the next speaker in this session is Srinath. Srinath is a researcher at Microsoft Research at the Redmond labs, and he is an expert on building practical systems around verifying computation, around computation verification.
In this talk, he’s going to talk about a system called Spice, which is about verifiable state machines, and this work is going to appear at OSDI this year. >> Can you hear me? Think it’s working. Okay, thanks. So this talk is about a powerful primitive called Verifiable state machines. The system that implements this primitive is called Spice. This primitive has many applications in the context of trustworthy Cloud Services as well as decentralized Blockchains. I’ll start by defining what this primitive is. A Verifiable state machine is a primitive in which there are three types of entities: a set of clients, a stateful service, and a set of verifiers. These clients issue a set of requests to the service and get back a set of responses. Then the service periodically publishes a trace which contains an entry for each request-response pair.
Then each of these verifiers runs a set of local checks and then outputs accept or reject. We call such a primitive a Verifiable state machine if it satisfies these properties. First, if the service behaves, that is, the execution is equivalent to some serial execution of the concurrent requests, then the verifiers output accept. Second, if the service misbehaves with the execution of a request or with the concurrency control mechanisms, then the probability that a verifier outputs accept should be less than epsilon, where epsilon is very small. Third, the trace itself is zero knowledge, that is, it does not reveal anything about the requests or the responses or the internal state of the service beyond the validity of the correct request execution. Finally, we require each entry in this trace to be small, for example, a few hundred bytes.
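The properties just listed can be written compactly as follows. This is only a restatement of the spoken definition; the symbols are assumptions introduced here: $\mathcal{V}$ for a verifier, $\tau$ for the published trace, and $\tau_i$ for one trace entry. The zero-knowledge property is left in words, as in the talk.

```latex
\begin{align*}
\text{Completeness:}\quad & \text{service behaves} \;\Rightarrow\; \mathcal{V}(\tau) = \mathsf{accept} \\
\text{Soundness:}\quad & \text{service misbehaves} \;\Rightarrow\; \Pr[\mathcal{V}(\tau) = \mathsf{accept}] < \epsilon \\
\text{Succinctness:}\quad & |\tau_i| \text{ is small (e.g. a few hundred bytes) for every entry } \tau_i \text{ of } \tau
\end{align*}
```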
It turns out, there’s a lot of prior work that implicitly implements this Verifiable State machine abstraction. They do not use the state machine formalism, but that’s not fundamental. The theory actually dates back to the early 90’s. But this theory was too expensive to be implemented. However, there’s a lot of work that reduces the cost of this theory by some 20 orders of magnitude. Later systems also added support for stateful computations to those earlier systems. For example, they support storage interfaces such as key-value stores and even a limited form of SQL Databases. Despite this massive progress, all of these systems have two key limitations.
First, the storage operations in these systems are very expensive. For example, it would take tens of seconds or even several minutes for each storage operation. Second limitation is that they only support a sequential model of execution for expressing these services. As a result of these two limitations, prior work can only support very limited throughput even for very simple services. Our system, which is called Spice addresses these limitations. First, it features a new storage primitive that is two orders of magnitude more efficient than prior instantiations. Second, it supports a concurrent model of execution for expressing services.
As a result of these two new techniques, Spice can support thousands of transactions per second for many realistic applications. In the rest of this talk, I’m going to focus on three things. First, I’ll discuss a few applications that can be built with verifiable state machines. Second, I’ll present some background on prior work and a quick overview of Spice. Finally, I’ll present some experimental results. At a very high level, we are interested in this primitive verifiable state machines for two reasons. First, it enables us to build cloud services in which we do not have to trust the Cloud infrastructure.
Second, it also enables private and efficient Blockchains, both in the permissioned membership model and the permissionless membership model. I’ll start with the first application scenario. The first application is a Cloud-hosted ledger inspired by Sequence. Here, the service maintains balances of assets owned by different clients. It exposes three types of requests. First, the Issue operation enables an issuer, which is a special entity in the system, to issue some assets to a client. The second operation is a Transfer that allows one client to transfer an asset to a different client. Finally, the last operation, which is called Retire, allows a client to take an asset out of the system. So in the status quo, if somebody wants to verify the correct execution of this service, they have to get a complete trace of all the requests, responses, and the internal state of the service.
With verifiable state machines, an auditor can verify the correct execution of the service without access to any of the requests or the responses. It does not have to trust the infrastructure on which the service is running. The second application is in the context of decentralized Blockchains such as Ethereum. On Ethereum, a smart contract is essentially a state machine where two or more counter-parties can issue transactions to create state transitions. So in the status quo, all the transactions are actually stored on the Blockchain. So this provides no confidentiality guarantees. Anyone in the world can inspect the internal state of the contract or also look at all the state transitions a contract has gone through. Second, every request from the app must be processed by the Blockchain.
So this limits the application-level throughput that you can get by storing a smart contract on a public Blockchain. But with verifiable state machines, you don’t actually have to execute those smart contracts on the Blockchain. You can execute them outside the Blockchain and then process only a succinct zero-knowledge trace on the Blockchain. Because this succinct trace does not reveal anything about the internal state, the requests, or the responses, it provides very strong confidentiality guarantees. Because Ethereum will only process a succinct trace, you can actually support very high throughput for your applications, which is independent of the throughput supported by Ethereum. So I hope that I convinced you that verifiable state machines are actually useful. So in the next part of this talk, I’m going to provide some background on a prior system called Pantry before I provide an overview of Spice. So, Pantry itself extends two prior systems called Zaatar and Pinocchio to support a notion of state.
All three of these systems are composed of a front-end and a back-end. The front-end translates a C program into a set of algebraic constraints. Then the back-end implements an argument protocol to prove the correct execution of the C program. I’m not going to go into the details of how these components work, but I’ll just observe that both Zaatar and Pinocchio support a large subset of C. This, in turn, enables verifying the execution of stateless programs. Then Pantry supports state while working in the same stateless computation model by leveraging cryptographic hashes. I’m going to briefly tell you how this is done. So the key idea in Pantry is to name data blocks using a short cryptographic digest of those data blocks. Here’s a very small C program that illustrates the key idea in Pantry. It takes as input a digest, and then the prover, or the service, can actually supply state.
Of course, because the service is untrusted, it can supply any state to this program. But there is an assert statement that checks if the digest supplied by the verifier equals the cryptographic hash of the block supplied by the service. If the service supplies the correct state, this check passes; otherwise, this check fails. By using this idea, Pantry builds a key-value store. The core idea is to treat hashes of data as pointers to such data. Then you can build a tree, which in turn can be used to build a key-value store. But the fundamental problem in this approach is that the cost of a key-value store operation is logarithmic in the size of the state. Concretely, this means it takes several minutes of CPU time even with a million key-value pairs in your state.
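The digest check and the hashes-as-pointers tree described above can be sketched as follows. The talk shows a small C program on a slide; this is a Python stand-in, not that program. SHA-256, the function names, and the power-of-two leaf count are assumptions of the sketch, and the proof machinery (constraints and argument protocol) is omitted entirely.

```python
import hashlib


def digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def handler(expected_digest: bytes, supplied_block: bytes) -> int:
    # The untrusted service supplies the block; the assert ties it to the digest
    # that the verifier provided, so a wrong block makes the check fail.
    assert digest(supplied_block) == expected_digest, "state does not match digest"
    return len(supplied_block)  # stand-in for the real computation on the state


def merkle_root(leaves):
    # Hashes act as pointers: each parent names its two children by their digests.
    level = [digest(x) for x in leaves]
    while len(level) > 1:
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_path(leaves, index):
    # Sibling digests from the leaf up to the root (leaf count must be a power of two).
    level = [digest(x) for x in leaves]
    path = []
    while len(level) > 1:
        path.append(level[index ^ 1])
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path


def verify_leaf(root, leaf, index, path):
    # One hash per tree level: this is the logarithmic cost mentioned in the talk.
    node = digest(leaf)
    for sibling in path:
        node = digest(node + sibling) if index % 2 == 0 else digest(sibling + node)
        index //= 2
    return node == root


leaves = [f"key{i}=value{i}".encode() for i in range(8)]
root = merkle_root(leaves)
path = merkle_path(leaves, 5)
```

With 8 leaves the path has 3 siblings; with a million key-value pairs it would have about 20, and each of those hashes has to be expressed inside the proof system, which is where the cost comes from.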
Spice addresses these performance problems. The core idea in Spice is that we use a set data structure instead of a tree data structure. We then map key-value store operations to operations on a set. As a result, the cost of a storage operation is constant instead of logarithmic, on an amortized basis. However, if we instantiate this idea naively, it’s actually going to be more expensive than the tree-based approach, in terms of the constraints involved, for many of the data sizes that we are interested in. However, Spice solves this problem. It does this by using an efficient instantiation of the set-based data structure using elliptic curve cryptography. Spice also includes new techniques to execute these state transitions, or transactions, inexpensively in this model. I’m not going to go into the details of how all of this is done, but I’ll just present a few implementation details and how the system looks. We implement Spice on top of Pantry. We also wrote three applications using Spice, but Spice includes a toolchain with which you can build many more applications. The toolchain takes as input a C program which expresses a request handler, and then it outputs two executables, one for the service and one for the verifier.
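The talk does not spell out how key-value operations map to set operations, so the following is only one plausible reading: a toy version of the classic offline memory-checking idea, where the checker keeps a running hash of a read set and a write set and compares them in a periodic audit. The additive hash modulo a prime is a swapped-in stand-in for the elliptic-curve instantiation mentioned above and is not secure; the class and method names are hypothetical.

```python
import hashlib

P = 2**255 - 19  # prime modulus for the toy set hash (assumption, not from the talk)


def elem_hash(key, value, ts):
    data = repr((key, value, ts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P


class SetCheckedStore:
    """Toy key-value store checked with read-set and write-set hashes."""

    def __init__(self, initial):
        self.untrusted = {}  # held by the service: key -> (value, timestamp)
        self.read_set = 0    # checker state: running hash of everything read
        self.write_set = 0   # checker state: running hash of everything written
        self.clock = 0
        for k, v in initial.items():
            self.untrusted[k] = (v, 0)
            self.write_set = (self.write_set + elem_hash(k, v, 0)) % P

    def _access(self, key, new_value=None):
        value, ts = self.untrusted[key]  # answer from the untrusted service
        self.read_set = (self.read_set + elem_hash(key, value, ts)) % P
        self.clock = max(self.clock, ts) + 1
        out = value if new_value is None else new_value
        self.write_set = (self.write_set + elem_hash(key, out, self.clock)) % P
        self.untrusted[key] = (out, self.clock)
        return value

    def get(self, key):
        return self._access(key)

    def put(self, key, value):
        self._access(key, value)

    def audit(self):
        # Linear scan, but run rarely: its cost is amortized over many operations.
        rs = self.read_set
        for k, (v, ts) in self.untrusted.items():
            rs = (rs + elem_hash(k, v, ts)) % P
        return rs == self.write_set


store = SetCheckedStore({"a": 1, "b": 2})
store.put("a", 10)
value_of_a = store.get("a")
honest_ok = store.audit()
```

Each `get` or `put` does a constant number of hash updates regardless of how many keys exist, which is the contrast with the tree-based path check. A real scheme needs additional checks (for example on timestamps) that this sketch leaves out.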
This request handler can include arbitrary C and can also call into Spice’s storage APIs. The storage primitive itself exposes many APIs. First, it provides key-value store operations such as get and put. Second, it provides mutual-exclusion APIs such as lock and unlock, so you can lock a key, perform an update, and then unlock. It also provides a restricted form of transactions. So, you can call begin transaction on a set of keys, and it returns the current state of those keys. Then you can do arbitrary computation, and then you can call end transaction with the values that you want to write back. All of this update happens atomically.
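The call pattern described above (begin a transaction on a set of keys, compute, then end it with the values to write back) can be illustrated with a small mock. This is not Spice’s API: the real handlers are written in C, and the names `ToyStorage`, `begin_txn`, and `end_txn`, as well as the lock-based implementation, are assumptions made purely to show the shape of the interface.

```python
import threading


class ToyStorage:
    """Hypothetical mock of a get/put, lock/unlock, and transaction interface."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def lock(self, key):
        with self._guard:
            lk = self._locks.setdefault(key, threading.Lock())
        lk.acquire()

    def unlock(self, key):
        self._locks[key].release()

    def begin_txn(self, keys):
        # Lock keys in a fixed order so two transactions cannot deadlock each other.
        for k in sorted(set(keys)):
            self.lock(k)
        return {k: self.get(k) for k in keys}

    def end_txn(self, keys, values):
        # Writes happen while every key is still locked, so other lock holders
        # see either none or all of the updates.
        assert set(values) <= set(keys)
        for k, v in values.items():
            self.put(k, v)
        for k in sorted(set(keys)):
            self.unlock(k)


def transfer(store, src, dst, amount):
    keys = [src, dst]
    state = store.begin_txn(keys)
    if state[src] is None or state[src] < amount:
        store.end_txn(keys, {})  # nothing to write back, just release the keys
        return False
    store.end_txn(keys, {src: state[src] - amount, dst: (state[dst] or 0) + amount})
    return True


store = ToyStorage()
store.put("alice", 50)
first = transfer(store, "alice", "bob", 20)
second = transfer(store, "alice", "bob", 100)
```

The restriction to a key set declared up front is what makes this a limited form of transaction: the handler cannot discover new keys to touch halfway through.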
In the next part of this talk, I’m going to present some experimental results. There are two questions that I’m interested in. First, how does Spice compare with the prior state of the art? Second, what is the end-to-end performance of applications built with Spice? To answer these questions, we ran a set of experiments on an Azure cluster. So for the first question, we measure the throughput of key-value store operations under Spice as well as a set of baseline systems. We preload the key-value store with a million key-value pairs, and then we measure the throughput of the system for get and put operations separately. As you can see, due to the new techniques in Spice, Spice can achieve up to four orders of magnitude speedup compared to the baseline systems.
So to answer the second question, we implemented three applications, and this graph depicts end-to-end performance with a varying number of CPU cores. On the x-axis, you can see we have depicted various request types, such as issue, transfer, and retire, that are supported by these applications. As we can see, Spice achieves a near-linear speedup in throughput as we provide more CPU cores. I did not depict the verifier’s throughput, but it’s much more than the prover’s throughput, or the service’s throughput; it’s 15 million proof verifications per second on the same cluster. So to summarize this talk, we believe verifiable state machines are a key tool in the context of both Cloud computing and decentralized Blockchains. Spice represents substantial progress towards efficiently implementing verifiable state machines. Finally, we are excited about the many possibilities that this work points to. With that, I can take questions. >> Someone has a question. >> I’m sorry, I kind of lost the connection between Spice and the Blockchain. So, where exactly, are you using Spice’s technology to implement Blockchains, or is it a front end to Blockchains because you’re using a code generator? So, I don’t know exactly where the connection is.
>> There are two applications; I guess I can go back. Here is the scenario that I presented in the context of Blockchain. So instead of running the entire smart contract on the Blockchain, and executing the contract on every node in the Blockchain system, you could run it once and then you could just process that succinct zero-knowledge trace on the Blockchain. So, the entire Blockchain network would just verify that there was a valid state transition, and it does not have to re-execute any transactions, and the nodes also don’t need to know what the contract was or what this contract was doing.
So it provides confidentiality and also scalability. >> So, you are basically like a front end to the Blockchain? >> Yes. >> So, you don’t need to use a Blockchain in the back end; you could put in a database. >> Sure. Yeah. >> If I did it, the Blockchain is actually not relevant. >> Yes. So, here, only the verifier is running on the Blockchain, but you could run that verifier anywhere. >> Right. >> Even the service itself could be running outside the Blockchain using traditional databases. >> All right. Okay. Thank you. >> The trace includes all the hashes of the state? >> Yes. >> Can you return to the slide that gave the big picture of requests going into the Blockchain and the trace going to the verifiers? >> Is it this slide? >> Yes.
Could you say what information the verifiers have access to in order to make the decision? This picture depicts only an append-only trace. Are they aware of other information? >> Yes, they do know the specification of the service. For example, what it means to take a state transition, so they know the code that represents the state transition. They also know some encoding of the requests and the encoding of the responses. So, this trace has an entry for each request-response pair. >> So the verifiers are aware of something about the requests. >> They don’t know the actual plaintext contents of the request, but they have a succinct encoding which is a commitment to those requests. >> Thanks. >> Quick question while we’re on this slide. So, you defined soundness in terms of some epsilon. So what should I think of epsilon as being? Like, the Internet is very large, and there are many requests. >> Yes.
>> So, how often in practice would you expect soundness to be violated? >> Yeah. So, this epsilon is one over two to the power 128, so you can think of it as 128-bit security. So, it should be very small; you should absolutely not see a verifier accepting some incorrect statement as correct, except with that probability of one over two to the power 128, which is the security of many crypto primitives that we use. >> Thank you. >> So, the final presentation in the session is by Professor Dawn Song from UC Berkeley. So, she has a whole bunch of works. She is a MacArthur fellow, she’s also a Guggenheim fellow and has an MIT TR-35 award, and she has several best papers in security and learning conferences, and she also happens to be the co-founder of a startup in Blockchain called Oasis Labs, and I believe that some of the work she’ll be talking about is related to the startup also.
>> Okay. Great. Hello. Hi everyone. So, the mic is working. Okay. Great. Thanks a lot for being here. I am Dawn Song, I am a professor in computer science at UC Berkeley, and I’m also a founder and CEO of Oasis Labs. Today, I’ll talk about privacy-preserving smart contracts at scale, a new Blockchain platform that we’re building. Okay. So, right. As we know, data is the new oil, and all this data, with the value of data analytics and machine learning, can really help us get insights in many different domains including healthcare, financial services, and IoT.
A lot of data is also sensitive, and we are actually facing a lot of big problems in the data domain today. For example, data breaches are commonplace now; for example, Equifax, one of the largest credit reporting companies, had a recent data breach where attackers were able to steal sensitive information about more than 140 million users. On the other hand, a lot of valuable data is also being siloed because it is sensitive. It can’t really be utilized, and hence we are not able to extract valuable insights from all this data, including, for example, medical data and financial data and so on. Also, at the same time, users are losing control of their data, as demonstrated in the recent Cambridge Analytica incident. So, on the other hand, we are seeing that Blockchain is providing a transformative technology and aiming to solve a number of problems.
Blockchain aims to provide openness and transparency, allows us not to rely on any central party, and provides automatic enforcement of agreements. So, so far, what has Blockchain brought us? It helps with payments, with ICOs, and CryptoKitties. So, in the future we hope that Blockchain can do even greater things and can help us revolutionize many different segments in industry, including financial services, healthcare, IoT, and many other domains. So, today I want to talk about some new technology that we have been developing that actually helps bring Blockchain technology to a broader domain, to enable these other, more advanced applications than the ones that we have seen today, and also at the same time help solve some of the data security and privacy problems that I mentioned at the beginning.
So, first, let me give you one motivating example of how Blockchain can help solve the problem and also where we need new technologies in Blockchain. So, this motivating example is in the area of fraud detection. So for example, banks need to do fraud detection to figure out whether they should give someone a loan, or whether there are suspicious and malicious activities. So, typically, how it works is that each bank, using its own information, trains its own fraud detector, and because the data is sensitive, it is usually difficult for different banks to collaborate. However, as we all know, the effectiveness of these types of models really depends on the amount of data that the model is trained on, or the breadth of views that the model has actually seen. Hence, it would be really nice if we could develop a technology that allows these different banks to collaborate and use the data from each of the banks together to develop a better machine learning model for fraud detection.
Unfortunately, today, we don’t have a technology to enable the banks to do this because of privacy concerns, regulatory risks, and misaligned incentives. >> So, this is an example where Blockchain can actually help, and in particular, a new type of Blockchain technology such as the one that we’re developing can help. We can actually develop a smart contract, in this case a fraud detector smart contract, that runs on top of the Blockchain. In this case, each of the different banks can contribute data to this smart contract, and using the data from these different sources together, the fraud detector smart contract can train a fraud detection model. Hopefully, this fraud detection model that is trained using data from different banks can be much more effective and can really help each bank deal with fraud much better.
However, in order to enable an application like this, we need to solve a number of challenges and issues. So, first, in a typical Blockchain, all data and compute on the Blockchain is public. In this case, we are dealing with very sensitive data, essentially customers’ transactions. So, we need to handle the challenge of having sensitive data on the Blockchain worker nodes, essentially protecting the computation process of the smart contract execution from leaking sensitive information. Secondly, the computed fraud detection model itself can potentially leak sensitive information about the inputs, because the machine learning model here is trained from sensitive inputs to start with, and I’ll show a little bit later that our work has shown that when you train a machine learning model, if you are not careful with the privacy protection, then the trained model itself can actually leak sensitive information about the inputs.
Finally, today’s Blockchain has huge scalability issues; today’s Blockchain has poor performance and very high costs. So again, in order to enable an application like this, we really need to be able to scale up the Blockchain to enable real-world applications like this, ideally even at Cloud scale. >> Question. What about the smart contract, the fraud detection model? Isn’t that private? Isn’t that important too? >> So, that’s why I will talk about this. Yes. So, we’re actually going to develop privacy-preserving smart contracts that can help solve those. So, in order to enable an application like this and address the challenges that I just mentioned, at Oasis we’re developing a new Blockchain platform, and one of the key aspects of our Blockchain platform is Privacy-preserving Smart Contracts at Scale. So first, one of the key primitives of our platform is Privacy-preserving Smart Contracts. The primitive of Privacy-preserving Smart Contracts satisfies a number of unique properties and capabilities. It enables automatic enforcement of codified privacy requirements.
So, the privacy requirements here are coded into the smart contracts. We can enforce this without relying on any central party, and it’s designed to scale to real-world applications including heavy workloads such as machine learning. We design the technology to be easy to use and to make it much easier for developers to build privacy-preserving applications without needing to be privacy experts.
In order to enable these Privacy-preserving Smart Contracts at Scale, we developed a number of different technologies and combined them to build our Oasis Blockchain platform. The Oasis Blockchain platform essentially has, you can view it as, two main parts: there’s the platform component and there is the application component, where the smart contract is the abstraction at the application level. So, at the platform level, first, we build confidentiality-preserving smart contract execution to protect the smart contract execution process from leaking sensitive information about the inputs. At the application level, within the smart contract, we provide capabilities for privacy-preserving analytics and machine learning to make it easy for developers to do analytics and machine learning without needing to worry about privacy, and to enforce the desired privacy policies.
At the platform level, we also developed a new architecture for the Blockchain to enable scalable smart contract execution, in particular scalability for complex smart contract execution. So, now, let me talk about each component technology separately. So, first, let me talk about the confidentiality-preserving smart contract execution. So typically, again, as I mentioned, in most Blockchains today, the data and compute are public on the Blockchain, hence they can only have very limited use cases. So, in this case, the input to the smart contract is public, and the smart contract essentially takes the inputs, does a computation, and performs a state transition. Also, the state of the smart contract, in most of today’s Blockchain platforms, is also public. So, with confidentiality-preserving smart contract execution, essentially, everything’s encrypted. So, the data is encrypted and also the state is encrypted as well.
The smart contract essentially does a computation on encrypted data to perform the state transition. Also, at the same time, even though the data and states are encrypted, we still want to ensure that we can have a proof of correctness that this state transition is correct. >> So, the question is how we actually enable this practical way of doing computation over encrypted data and, in particular, this confidentiality-preserving execution. So, essentially, we do this by leveraging a combination of different techniques for secure computation. For secure computation, essentially, we have two main types of approaches. The first one is secure hardware, and the advantage of secure hardware is that it’s highly performant.
Its performance is close to native computation, and it can support general-purpose computation. The other type of approach is crypto-based techniques such as secure multi-party computation, fully homomorphic encryption, and so on. The challenge for this type of approach is that the performance overheads can be very high. Oftentimes, it’s orders of magnitude slower than native computation, and hence, it can only be used for very limited use cases. In the Oasis Blockchain platform, we actually use a combination of these different techniques, and depending on the actual use cases of the smart contracts, we use different combinations of these techniques to enable confidentiality-preserving execution. So now, let me just briefly say a few words about the secure hardware aspects. So, for the secure hardware here, one abstraction is called the trusted execution environment, and here we also call it the Secure Enclave. The idea of the Secure Enclave is that we can run a program, in this case a smart contract, inside the Secure Enclave, and then the operating system and the applications that run outside of the Secure Enclave will not be able to tamper with what’s running inside, and also will not be able to see what’s running inside, and hence, the Secure Enclave helps to provide integrity and confidentiality of the execution.
Also, at the same time, the hardware provides a hardware-based mechanism for remote attestation, such that a remote verifier can remotely verify what has been run inside the Secure Enclave and its initial state. So, we have developed a research project called Ekiden, leveraging approaches like this, and we provide a security proof using universal composability, and we have developed a number of sample applications, including training machine learning models in the healthcare domain and the smart building domain, and many other applications. So here, one of the key capabilities that we use to enable confidentiality-preserving smart contract execution is the capability of Secure Enclaves, and in fact, Secure Enclaves can serve as a cornerstone security primitive even beyond the application domain of Blockchains.
Secure Enclaves can provide strong security capabilities and can serve as a platform for building new security applications that couldn’t be built otherwise with the same practical performance. Over the years, all the hardware manufacturers have recognized the importance of building secure hardware, and they have all built different solutions; however, we still have huge challenges in building secure hardware. For example, how secure can it be, and under what threat models? What would you entrust to the secure hardware? The ultimate challenge is, can we actually create a truly trustworthy Secure Enclave as a cornerstone security primitive that can be widely deployed and enable secure systems to be built on top? If we can do this, then we can truly usher the whole community into a new secure computation era. So, what’s a path towards getting a trustworthy Secure Enclave? So, first we need to have an open-source design. An open-source design provides the transparency that’s needed for the whole community together to analyze and verify the security and correctness of the Secure Enclave, and that can enable much higher assurance. Also, an open-source design helps build a community. Also, ideally, we would like to provide a formal verification of the Secure Enclave, and we want to ensure secure supply chain management for securing the manufacturing process.
So, towards this goal, we are doing a project called the Keystone Enclave, aiming to build an open-source Secure Enclave, a TEE, on top of RISC-V, which is an open-source RISC architecture that’s available today, and we provide strong memory isolation and many other properties. Our goal is to enable an open-source Secure Enclave that can really be deployed, be manufactured by any manufacturer, and be used in the real world. RISC-V actually has been widely adopted in industry. The RISC-V Foundation has more than 100 members, and you can find out more information about Keystone at our website, keystone-enclave.org, and we have a number of goals in the project to help us finally achieve the goal of building an open-source Secure Enclave.
This is a collaboration between Berkeley and MIT, with other institutions joining, and we plan to have a first release this fall, deployed both on FPGA as well as on the demo chip, HiFive Unleashed, the RISC-V chip. So, that’s the first component technology, confidentiality-preserving smart contract execution, and now let me also quickly talk about the second component technology, for privacy-preserving analytics and machine learning. So, when we do data analytics and machine learning, there are a lot of privacy risks. So, for example here, there are two types of questions: one is, how many trips were taken in New York City last year? And the other one is, how many trips did Joe take last week? So, as you can see, one question reflects a trend and the other one actually leaks sensitive information about an individual. But to answer both types of queries, you actually need to have access to the data, and hence, this is an example showing that just having access control is insufficient to protect privacy while at the same time being able to answer queries like this.
There have also been proposals using data anonymization technologies to solve this problem, but that’s also insufficient. Data anonymization can reduce the utility of data, and at the same time, it provides insufficient privacy protection. There’s research showing that anonymized datasets, when combined with other publicly available datasets, can be used to re-identify users, hence leaking sensitive information about individuals. We’ve also done recent work showing that when you want to train a deep learning model, you actually need to be very, very careful.
So, the question here is, do neural networks actually remember training data? If yes, can attackers actually extract secrets in the training data simply by querying the learned models? So, in collaboration with researchers from Google, in our recent paper, here is an example of one of the case studies, which shows what happens when we train a language model on an email dataset; in this case, it’s the public Enron dataset. The Enron email dataset naturally contains credit card numbers and social security numbers of actual users, and we demonstrated that attackers can actually craft attacks that, by just querying the language model, are able to extract the credit card numbers and social security numbers from the original training dataset.
This is an example demonstrating that even when you are training a machine learning model, you have to be really careful to take the correct measures to protect users’ privacy. So luckily, here we actually have a good solution: instead of training a normal deep learning model, we train a differentially private deep learning model. In this case, both from our proposed measure as well as an attack evaluation, we demonstrate that by training a differentially private deep learning model, we can provide much stronger privacy protection for the users.
>> So, I’m running out of time. So, I won’t have time to actually go into the details about differential privacy. So, differential privacy at a high level is essentially a formal privacy notion that helps to measure, for an algorithm, a data analytics algorithm or a machine learning algorithm, whether one can distinguish whether a particular user’s data has been used in producing the data analytics results or training the machine learning model. So, differential privacy provides a very strong notion for protecting users’ privacy in data analytics and machine learning. There has been very limited real-world use of differential privacy; Google and Apple use differential privacy in very limited settings. There are no previous real-world deployments of differential privacy for general-purpose analytics. There are a number of challenges for deploying differential privacy in practice in the real world, including usability for non-experts, broad support for analytics queries and machine learning, and also easy integration with existing data environments.
There’s no previous system that addresses all these issues. In collaboration with Uber, we have developed new systems to address these challenges, and in our systems Chorus and Optio, we essentially develop techniques and tools to automatically rewrite data analytics and machine learning pipelines to enforce differential privacy as well as other desired privacy requirements. Some of our technology has already been deployed at Uber for protecting privacy in their internal data analytics.
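Since the talk skips the details of differential privacy, a small sketch may help fix the idea. It uses the textbook Laplace mechanism for a counting query, which is a standard technique and not the query-rewriting approach of Chorus or Optio. The data layout (one row per rider, so that one person changes the count by at most 1), the function names, and the choice of epsilon are assumptions of this sketch.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=random):
    true_count = sum(1 for row in rows if predicate(row))
    # With one row per person, adding or removing a person changes the count
    # by at most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: one row per rider, recording whether they rode in New York.
riders = [{"rider": f"r{i}", "rode_in_nyc": i % 3 == 0} for i in range(300)]
rng = random.Random(0)
noisy = private_count(riders, lambda r: r["rode_in_nyc"], epsilon=1.0, rng=rng)
```

An aggregate such as the city-wide count stays useful after noise of this size, while a count about a single person (zero or one) is swamped by it, which mirrors the trend-versus-individual distinction made earlier in the talk.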
So, putting together confidentiality-preserving smart contract execution and privacy-preserving data analytics and machine learning, this enables privacy-preserving smart contracts. Now, let me just briefly talk about the third component technology, scalable smart contract execution. Again, I don’t have a lot of time left, so let me just go over this very quickly. So, typically when people talk about Blockchain scalability, one thing that people immediately think about is Blockchain sharding. It’s like you have one lane of cars, but now you have many lanes in parallel to improve the throughput of the traffic. However, this type of approach is good for high throughput of simple transactions, but the Blockchain applications that we want to build are much more complex, and the different smart contracts can also depend on each other. So, to enable scalability, in particular scalability for complex smart contract execution, we have designed and developed a Blockchain architecture that is different from previous Blockchain architectures.
In particular, by separating out execution from consensus, by decoupling the different functions that a Blockchain is providing. So, our design is inspired by a few key observations of the limitations and challenges in today’s Blockchain approaches. So first, a Blockchain platform essentially provides three main functionalities: consensus, storage, and compute. In most of today’s Blockchain platforms, all three functionalities are coupled together, which makes it really difficult to scale, and also here consensus is the slow operation and is often the key bottleneck.
When we do sharding, coordination is also expensive and can limit the scalability as well. So, based on these observations and to address these challenges, we propose a new Blockchain architecture for what we call separating execution from consensus, where we actually decouple these different functionalities of a Blockchain platform: we separate out the consensus layer, the storage layer, and the compute layer. The compute layer executes the smart contracts, the storage layer persists the state, and the consensus layer agrees on the state transitions. So, I think there’s limited time, so I probably won’t have time to go into the details here. So, essentially, to separate execution from consensus, again, the compute layer does the computation, and in this case the compute nodes are actually stateless, so no consensus protocols are necessary and they can optimistically execute the transactions. In this case, failures have no effect on the state, and the states are persisted to the storage layer, where the blocks are persisted and replicated, and again no consensus protocols are necessary, and writes in this case are idempotent. Finally, the state transitions are committed to the consensus layer, and here we use verifiable computation to verify the correctness of the computation, and the consensus layer essentially tracks the state transitions and the transaction ordering.
Then we use different types of techniques for verifiable computing, including majority votes and a new approach that we developed, discrepancy detection, and we can also use other methods such as zero-knowledge proofs for verifiable computing. So, putting all these technologies together, we enable privacy-preserving smart contracts at scale. Here is one of the applications that we have been building, together with other collaborators from ETH and doctors from Stanford as well as app developers.
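The separation of compute, storage, and consensus described above can be pictured with a toy model. This is only a sketch under stated assumptions: the class and function names are hypothetical, the storage layer is modeled as a content-addressed dictionary so that writes are idempotent, and only the majority-vote check is shown, since the talk gives no details on discrepancy detection.

```python
import hashlib
import json
from collections import Counter


def digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class Storage:
    """Content-addressed store: writing the same state twice changes nothing."""

    def __init__(self):
        self.blobs = {}

    def write(self, state):
        key = digest(state)
        self.blobs[key] = state  # idempotent: same key always maps to same content
        return key

    def read(self, key):
        return self.blobs[key]


def execute(storage, state_key, tx):
    """Stateless compute node: load state, apply the transaction, persist the result."""
    state = dict(storage.read(state_key))
    state[tx["to"]] = state.get(tx["to"], 0) + tx["amount"]
    return storage.write(state)


class Consensus:
    """Tracks only the ordered sequence of agreed state roots."""

    def __init__(self, genesis_key):
        self.roots = [genesis_key]

    def commit(self, proposed_keys):
        # Majority vote over the results reported by the compute nodes.
        winner, votes = Counter(proposed_keys).most_common(1)[0]
        if votes * 2 <= len(proposed_keys):
            return False
        self.roots.append(winner)
        return True


storage = Storage()
consensus = Consensus(storage.write({"alice": 10}))
tx = {"to": "alice", "amount": 5}
honest = [execute(storage, consensus.roots[-1], tx) for _ in range(2)]
faulty = storage.write({"alice": 10**6})  # a misbehaving node reports a bogus state
committed = consensus.commit(honest + [faulty])
```

Because compute nodes hold no state of their own, a crashed or faulty node can simply be ignored or rerun, and the consensus layer only has to agree on short state roots rather than re-execute the contract.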
So, at Oasis Labs, we’re building a privacy-first, high-performance Cloud computing platform on Blockchain. It’s a new computing paradigm and computing platform that has a unique combination of properties: privacy protection, scalability, as well as not needing to rely on any central party. We hope that this technology can really help enable completely new types of applications to be built that couldn’t be built today, and for more information you can go to the website. Thank you. Questions? >> Hi, I’m going back to your original example.
The original example is there were two banks and you wanted to build a machine learning model that combined the datasets. Right? >> Multiple banks. >> Yeah, multiple banks. So, my question is, can’t you solve this problem by just using secure hardware? What is the Blockchain connection? It seems like you’re unioning datasets, so just encrypt the data, register keys in secure hardware, and maybe run an R program inside the hardware. >> Right. >> So, what is the Blockchain connection? Are you afraid of someone tampering with the data? Why do you need Blockchains? >> So, usually when you use Blockchain, it’s that you don’t want to rely on any central party. So, then you don’t need to say who is operating that; essentially, you have a Blockchain platform where people don’t even need to agree on who should be trusted and who should be operating this. So, it’s just a platform that works without relying on any central party, and in particular in this case, here I just gave an example of a few banks, but you can imagine that in other scenarios you can have many parties, and you are not going to have all of them reach agreement on who they want to trust and who should be coordinating and running all these things.
So, the Blockchain just helps, again, to remove the reliance on any such party. >> Another question, but the smart contract code itself that the register number object. That’ll still be public, that is, not encrypted, right? >> That’s a very good question. In our platform this will actually be a choice of the developers. The developer can specify whether they want their smart contract to be public or fully private, in which case it will be encrypted. >> So, there’s also this company Enigma from MIT, and I was reading their white paper, and it seems like they were also trying to solve something very similar, so if you could say something comparing the two approaches, that would be really helpful. >> I think for privacy-preserving smart contracts, in general, people understand the importance.
Actually, I think it’s a very good problem for many people to work on, and I think it’ll be great to see the whole community come together to solve it. Our academic paper is the first paper in the community that actually developed technology for this, and again, I think it’s a very important problem, and we want to encourage the whole community to work on it. >> I had one follow-up question. So, in one of your earlier slides you had several approaches, right? You were using secure hardware, multi-party computation, zero-knowledge proofs, and potentially others. So, my question is, let’s assume that secure hardware is really secure. I know there are issues with that, but why do you need the other approaches? Won’t you be able to solve, say, multi-party computation if you have secure hardware? >> Okay, yes, that’s a very good question.
So, essentially one of the main goals of the Oasis platform with this privacy-preserving smart contracts primitive is that we want to make it easier for application developers to use state-of-the-art security and privacy technologies without needing to be experts. In this case we essentially provide a unified secure computing framework to leverage a combination of these different technologies. And yes, if you really choose to fully trust the secure hardware, then essentially you don’t need to use the other, crypto-based techniques. However, it depends on the actual use case and also on the developer’s preference. Some developers just want to use crypto-based techniques, and if their application logic is simple enough and they don’t care too much about the performance overhead, then we are happy to offer them the choice of not relying on secure hardware. >> Any other questions? Okay, so let’s thank the speaker again. Thanks, everybody.