Technical Architecture of KVdb

Last summer, as a weekend project, I decided to build a key-value store as a service. But as weekend projects are wont to do, it quickly became something of an obsession for a few weeks, until I quietly launched it and promptly forgot about it.

A little more than a year later, I found myself with some free time, so I decided to continue feature development, even though I didn’t know who my users were… I wasn’t even asking for emails during sign-up! But then again, it had become a bit of a passion project… Anyway, this blog post is more technical in nature, so I’ll do one on product later 😇

KVdb is a key-value database as a service, similar to DynamoDB and others, but with a much simpler API and a few other useful features. I built it to simplify stats collection from various client apps I was helping build: I needed a way to count different metrics without defining a schema up front, and clients had to be able to update stats simultaneously without data corruption.

Besides, why not start a new project from scratch? ¯\_(ツ)_/¯ (I’ll tell you why this can be a terrible idea in a future post!)

KVdb offers a simple HTTP API that lets users create buckets (a collection of keys), set and get keys and values, and operate on values. It’s built in Go and includes a scripting engine for running custom Lua scripts on the server with direct access to a user’s data without added network latency. Yeah, let’s just call it serverless 😏

I’ll describe the different parts of the system, starting from the bottom up.

Storage Layer

The initial prototype of KVdb supported a single backend, BadgerDB, which is a fast key-value database written in pure Go, similar to RocksDB but with a few different performance characteristics. As the backend evolved, I decided, maybe rather foolishly for a side project, to support multiple storage backends. A voice inside my head said DO NOT DO THIS! but alas…

At the lowest level, from an implementation perspective, here’s the high-level interface representing a storage backend:

type KVBackend interface {
	// NewTransaction starts a read-only (false) or read-write (true) transaction.
	NewTransaction(update bool) BackendTransaction
	// Sync flushes pending writes to stable storage.
	Sync() error
	// Backup streams changes since the given version and returns the new version.
	Backup(w io.Writer, since uint64) (uint64, error)
	Close() error
	// GC asks the backend to collect garbage, if supported.
	GC() error
}

The NewTransaction function lets consumers of the API create a read-only or read-write transaction, depending on the value of the boolean argument. It returns a BackendTransaction interface, which is implemented by a backend-specific wrapper.

Another function of note is GC, which is used to force the backend to collect garbage, if supported, such as to clean up values that have expired. Of course, not all backends support the concept of garbage collection, but in KVdb, this function is called periodically based on system activity.
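KVdb’s actual scheduling logic isn’t shown in this post, but a simplified driver for periodic GC might look like this (the fixed interval is an assumption; the real system ties it to activity rather than a plain ticker):

go func() {
	ticker := time.NewTicker(10 * time.Minute) // fixed interval for illustration only
	defer ticker.Stop()
	for range ticker.C {
		// Backends without garbage collection can simply return nil here.
		if err := backend.GC(); err != nil {
			log.Printf("gc: %v", err)
		}
	}
}()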

Since BadgerDB was the initial backend to be implemented, I modeled most of the interface after theirs, including the transaction interface:

type BackendTransaction interface {
	Get(key []byte) (*Item, error)
	Set(key, value []byte) error
	SetEntry(e *Entry) error
	Delete(key []byte) error
	NewIterator(opts IteratorOptions) Iterator
	Discard()
	Commit() error
}
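To make the flow concrete, here’s roughly how a consumer would write a key through this interface. This is a sketch following Badger’s discard-after-commit idiom, which these interfaces were modeled on:

txn := backend.NewTransaction(true) // true = read-write
defer txn.Discard()                 // harmless after a successful Commit in Badger's model

if err := txn.Set([]byte("users:1:name"), []byte("bob")); err != nil {
	return err
}
return txn.Commit()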

Keys and values are byte sequences, but there is a convenient Entry struct that lets you specify metadata about a key-value pair:

type Entry struct {
	Key       []byte
	Value     []byte
	UserMeta  byte
	ExpiresAt uint64
}

UserMeta lets KVdb interpret the contents of Value differently, depending on its value:

const (
	// Raw bytes
	UserMetaNone byte = 0x0

	// Integer (int64) value
	UserMetaInt byte = 0x1

	// Floating-point (float64) value
	UserMetaFloat byte = 0x2

	// JSON value
	UserMetaJSON byte = 0x3
)

For example, if it’s a JSON value, KVdb’s HTTP API can set the appropriate Content-Type header, etc. When a key is set using the HTTP API (without a Content-Type header), KVdb attempts to coerce the value into one of the supported types (integer, floating-point, JSON, or none) and sets the appropriate value for UserMeta. Automatically detecting the value type makes it easier to perform other operations on the value, which I’ll discuss later.
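The post doesn’t show KVdb’s coercion code, but a minimal sketch of the detection step could look like this (detectUserMeta is a hypothetical name, not KVdb’s actual function):

import (
	"encoding/json"
	"strconv"
)

func detectUserMeta(value []byte) byte {
	s := string(value)
	// Order matters: "42" is also valid JSON, so numeric types are tried first.
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return UserMetaInt
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return UserMetaFloat
	}
	if json.Valid(value) {
		return UserMetaJSON
	}
	return UserMetaNone
}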

The Iterator interface lets us navigate key-by-key through the keyspace. To create an iterator, we can specify iteration options to narrow down what we’re looking for:

type IteratorOptions struct {
	PrefetchValues bool
	PrefetchSize   int
	Prefix         []byte
	Reverse        bool
}

type Iterator interface {
	Rewind()
	Seek(b []byte)
	Valid() bool
	ValidForPrefix(b []byte) bool
	Next()
	Item() *Item
	Close()
}
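Usage mirrors Badger’s iteration idiom. A sketch of listing all keys under a prefix (Item’s fields are an assumption, since the post doesn’t show that struct):

prefix := []byte("users:")
it := txn.NewIterator(IteratorOptions{Prefix: prefix, PrefetchValues: false})
defer it.Close()

var keys [][]byte
for it.Rewind(); it.ValidForPrefix(prefix); it.Next() {
	keys = append(keys, it.Item().Key) // Item.Key is assumed
}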

By implementing these interfaces, it’s possible to add many kinds of storage backends. For backends that are more primitive and don’t implement transactions or iterators natively, it’s a bit more work to support them in KVdb, but definitely doable with enough glue logic.

However, the interfaces described so far aren’t intended for end users, since they have no notion of separating one user’s keys from another’s.

Bucket Abstraction

A bucket is a collection of keys and values with per-user isolation, permissions, and key expiration settings, among other things. Internally, the storage backends are just flat key-value stores with no concept of buckets or namespaces, so buckets have to be implemented on top of them. I decided to generate a UUID (v4) when creating a bucket and use it as a prefix for all keys stored in it. Essentially, storing a key users:1:name in a bucket stores it under the composite key <bucket uuid> + “users:1:name”. Thus, when accessing keys in a bucket, a prefixed iterator is used with the bucket UUID to seek to the correct position in the key space.
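In code, the composite key is just a concatenation; a hypothetical helper (kvKey is not KVdb’s actual name, and the raw-byte UUID representation is an assumption):

func kvKey(bucketID, userKey []byte) []byte {
	// bucketID: the 16 raw bytes of the bucket's UUIDv4 (assumption)
	k := make([]byte, 0, len(bucketID)+len(userKey))
	k = append(k, bucketID...)
	return append(k, userKey...)
}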

Here’s the Bucket interface, which builds on the concepts already discussed:

type Bucket interface {
	Id() []byte
	Policy() BucketPolicy
	UpdatePolicy(policy BucketPolicy) error
	Scoped(scope Scope) Bucket
	Get(key []byte) (item *Item, err error)
	Set(key, value []byte) error
	SetEntry(e *Entry) error
	Delete(key []byte) error
	List(prefix []byte) (keys [][]byte, err error)
	NewIterator(prefix, startKey, endKey []byte, limit int, reverse bool) Iterator
	AccessToken(prefix []byte, perm access.Permission, validity int) (string, error)
}

Access to a bucket is controlled by the BucketPolicy, which specifies master access keys for various operations. Users can also generate ephemeral access tokens scoped to a specific subset of the key space. In other words, you can hand out a token (instead of the master access keys) that is valid for:

  • Prefix: users:1:
  • Validity: 10 minutes
  • Permission: Read, Enumerate

A unique per-bucket signing key is used to compute an HMAC that is attached to the access token, which itself encodes the permission and validity information. The access token is versioned to allow updating its structure in the future. To keep things simple, I decided not to use something more flexible like JSON Web Tokens, but I might reconsider in the future.
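The exact token layout and hash function aren’t documented here, but the general scheme might be sketched like this (HMAC-SHA256 and the field layout are assumptions; signToken is a hypothetical name):

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/binary"
	"time"
)

func signToken(signingKey []byte, prefix string, perm byte, validity time.Duration) string {
	payload := []byte{1} // token format version, so the structure can evolve
	payload = append(payload, perm)

	var exp [8]byte
	binary.BigEndian.PutUint64(exp[:], uint64(time.Now().Add(validity).Unix()))
	payload = append(payload, exp[:]...)
	payload = append(payload, prefix...)

	// Per-bucket signing key; verification recomputes the MAC and
	// compares with hmac.Equal before honoring the encoded claims.
	mac := hmac.New(sha256.New, signingKey)
	mac.Write(payload)

	return base64.RawURLEncoding.EncodeToString(append(payload, mac.Sum(nil)...))
}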

There’s currently no support for permanent, revocable access keys (i.e., non-ephemeral), but it’s on the roadmap.

When a bucket is accessed with an access token, each operation specifies the permission level it requires, and during authorization a “scoped” bucket is used to limit what the token holder can do, preventing unauthorized access and permission logic errors. This is enforced at the bucket level, which nicely keeps authorization logic contained and doesn’t leak it into either the storage layer or the access layer.
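In Go terms, the authorization step might narrow a bucket before handing it to business logic, along these lines (the Scope fields and the permission bitmask are assumptions based on the token attributes above):

scoped := bucket.Scoped(Scope{
	Prefix:     []byte("users:1:"),
	Permission: access.PermissionRead | access.PermissionEnumerate,
})

// All later operations go through the scoped view, so a disallowed
// write fails inside the bucket implementation, not in handler code.
_, err := scoped.Get([]byte("users:1:name"))            // allowed
err = scoped.Set([]byte("users:2:name"), []byte("kim")) // rejected: outside prefix, no write permission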

Access Layer

KVdb provides a simple HTTP API that’s convenient to use from Web environments, other programming languages, and even the command line. The end-developer API is quite simple; here’s the entirety of it from the server side:

// create new bucket
e.POST("/", createBucket)

// update bucket policy
e.PATCH("/:bucket", updateBucketPolicy, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// check if bucket exists
e.HEAD("/:bucket", checkBucketExists, ParseBucketKey, GetBucketPolicy)

// get bucket policy
e.GET("/:bucket", getBucketPolicy, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// list keys
e.GET("/:bucket/", browseBucketPath, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionEnumerate))

// check if key exists
e.HEAD("/:bucket/:key", checkBucketKeyExists, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionRead))

// get key
e.GET("/:bucket/:key", browseBucketPath, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionRead))

// set key
e.Match([]string{"POST", "PUT"}, "/:bucket/:key", setBucketKey, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionWrite))

// operate on a key
e.PATCH("/:bucket/:key", operateBucketKey, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionWrite))

// delete key
e.DELETE("/:bucket/:key", deleteBucketKey, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionDelete))

// delete bucket
e.DELETE("/:bucket", deleteBucket, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// generate access token
e.POST("/:bucket/tokens/", generateBucketAccessToken, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// list scripts
e.GET("/:bucket/scripts/", viewBucketScripts, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// edit script
e.GET("/:bucket/scripts/:script/edit", editBucketScript, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// upload script
e.PUT("/:bucket/scripts/:script", putBucketScript, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// delete script
e.DELETE("/:bucket/scripts/:script", deleteBucketScript, ParseBucketKey, GetBucketPolicy, RequireBucketAdmin)

// execute script
e.Match([]string{"GET", "POST"}, "/:bucket/scripts/:script", execBucketScript, ParseBucketKey, GetBucketPolicy, RequireBucketPermission(access.PermissionNone))

KVdb uses the Echo web framework, and in the above code snippet, we define the routes and the middleware that runs before business logic, such as fetching bucket policies and enforcing requested permission levels. As mentioned in the previous section, the actual permission-checking logic is contained in the bucket implementation because we don’t want to rewrite it for every single access method. This makes it easier to add support for other access methods, like WebSocket or the Redis RESP protocol, and to delegate access control to a lower layer.

Of significant note is the ability to operate on values. For instance, say you’re storing an integer and want to increment or decrement it by some value. Instead of getting the value, updating it on the client side, and setting it back, developers can let KVdb perform the operation atomically on the server with proper locking to avoid concurrency issues and data corruption. Currently, only number values support these kinds of atomic ops, but I plan to add support for updating nested values in a JSON object, appending to values, “set if not set”, and other operations requiring atomicity.
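The post doesn’t show KVdb’s increment path, but conceptually it’s a read-modify-write inside a single read-write transaction. Here’s a sketch against the storage interfaces above (incr is a hypothetical helper; Item.Value and the error handling are simplified assumptions):

func incr(backend KVBackend, key []byte, delta int64) (n int64, err error) {
	txn := backend.NewTransaction(true) // read-write: serializes conflicting updates
	defer txn.Discard()

	// For brevity, any Get error is treated as a missing key starting at 0.
	if item, gerr := txn.Get(key); gerr == nil {
		if n, err = strconv.ParseInt(string(item.Value), 10, 64); err != nil {
			return 0, err // existing value isn't an integer
		}
	}
	n += delta

	e := &Entry{Key: key, Value: []byte(strconv.FormatInt(n, 10)), UserMeta: UserMetaInt}
	if err = txn.SetEntry(e); err != nil {
		return 0, err
	}
	return n, txn.Commit()
}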

Scripting Engine

As KVdb is a remote key-value store, there will always be some network latency that makes certain operations inefficient compared to running the database on the same machine as the client. Enter server-side scripts.

I took inspiration from both Redis’ use of Lua scripts and OpenResty, a high-performance web framework built on top of LuaJIT and nginx. In hindsight, going with a JavaScript scripting engine would have been better since the developer audience is larger, but I needed something that was easy to embed into a Go program, and I had previously built a toy Objective-C/Lua bridge, so Lua seemed like the best option at the time.

Integrating a scripting engine is a decision that needs careful attention to security, sandboxing, performance, and developer ergonomics. While the performance characteristics of LuaJIT are amazing, I was neither willing nor confident enough to use cgo in the main process; one mistake could bring down or corrupt the entire database, not to mention introduce hard-to-catch bugs.

I went with a pure-Go solution, which is an order of magnitude slower, although that may not have a real-world impact since scripts generally do data filtering, simple processing, and result formatting rather than computationally intensive workloads.

For the initial implementation, performance wasn’t a big goal, since network latency dwarfs any script interpretation or call latency by orders of magnitude. With current use, I haven’t encountered performance issues even under high load, but it’s an area to revisit and measure in the future.

Letting random Internet users run random code on your systems is enough to make any security-minded developer shudder. How can you do it safely?

I had a few requirements from the beginning (a sketch of the first two follows the list):

  • Sandboxing: Only permit access to whitelisted Lua standard library functions
  • Execution time limit: Terminate scripts that take too long to finish
  • Bounded memory: Limit the amount of memory scripts can allocate
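The post doesn’t name the interpreter, but assuming a gopher-lua-style embedded VM, sandboxing and the time limit might be wired up like this (memory bounding is left out of the sketch):

import (
	"context"
	"time"

	lua "github.com/yuin/gopher-lua"
)

func runSandboxed(src string, timeout time.Duration) error {
	L := lua.NewState(lua.Options{SkipOpenLibs: true}) // start with an empty global scope
	defer L.Close()

	// Whitelist: open only the standard libraries we consider safe.
	for _, lib := range []struct {
		name string
		fn   lua.LGFunction
	}{
		{lua.BaseLibName, lua.OpenBase},
		{lua.TabLibName, lua.OpenTable},
		{lua.StringLibName, lua.OpenString},
		{lua.MathLibName, lua.OpenMath},
	} {
		if err := L.CallByParam(lua.P{
			Fn:      L.NewFunction(lib.fn),
			NRet:    0,
			Protect: true,
		}, lua.LString(lib.name)); err != nil {
			return err
		}
	}

	// Even the base library exposes dangerous entry points; remove them.
	for _, name := range []string{"dofile", "loadfile", "load", "loadstring"} {
		L.SetGlobal(name, lua.LNil)
	}

	// Execution time limit: the VM aborts once the context deadline passes.
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	L.SetContext(ctx)

	return L.DoString(src) // e.g. `while true do end` returns a deadline error
}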

If you remember, in the bucket abstraction layer, I mentioned the concept of a scoped bucket, which essentially sandboxes a bucket into a specific prefix of the key space with a designated permission level. That was implemented specifically to secure the scripting environment and prevent malicious code from being able to “escape” its own bucket and access other buckets’ keys.

Sandboxing Lua code is quite simple if you know what you’re doing and your implementation lets you define the global Lua scope and bring in only the functions you need. It’s easy to get wrong, though, so the less functionality you expose, the better. Failing that, sandboxing in pure Lua is possible by removing functions deemed too dangerous.

Scripts in KVdb are run in the context of a bucket, so each bucket can have multiple scripts. The scripting API is simple, but supports all functions provided by the HTTP API. A script is triggered by a GET or POST request to its URL and it can output almost any HTTP response. For example, the following script atomically increments a counter keyed by the current date and returns the latest value stored:

local key = os.date("visitors:%Y-%m-%d")
local err = kvdb.incr(key, 1)
if err then
  kvdb.say("error incrementing: " .. err)
  kvdb.exit(500)
end

local value = kvdb.get(key)
kvdb.say(value)

You can even access query string parameters and POST body variables:

kvdb.say("Hello " .. kvdb.escape_html(kvdb.var.name))

Scripts execute in the caller’s authorization context, so if a bucket policy forbids public reading or writing, the script will still execute, but it won’t have access to the keys. Likewise, if the script is accessed with an access token (e.g., in the URL), it inherits the permissions contained therein. However, should a developer need to customize or extend permissions, they can do so by adjusting the scope in the kvdb.scope variable:

-- only allow writing to a subset of the database
kvdb.scope.prefix = "users:1:"
kvdb.scope.permissions = {"write"}

kvdb.set("users:1:name", "bob")
kvdb.set("users:2:name", "kim") -- this will fail

Fortunately, I didn’t repeat the mistake of overengineering the scripting system to support multiple backends like I did with the storage layer, so for the moment, KVdb is tightly coupled to Lua. However, the next major scripting release might include support for JavaScript on the backend using the V8 engine, and I’ll likely scrap Lua scripts entirely since nobody seems to be using them.

Wiring up V8 is a monumental undertaking to get right, requiring plenty of Go, C, and C++ glue and careful memory management, so I’m glad I didn’t seriously consider it when starting out. Besides JavaScript, V8 lets developers run WebAssembly code (and thus any programming language that compiles to it), so it’ll open up KVdb scripting to a larger developer audience 😇

The End

From the outside, KVdb may seem like a simple weekend project, but it’s turned out to be quite a big one, and I’ve learned a number of lessons, both technical and product-related, that I’ll write about in future posts. I’ve struggled at times and wanted to give up, but I see active usage and growth in the product, so I know there’s at least a small nugget of value. I’m continuing to observe and talk to users and create content. Hopefully I can share more of the process and what I learn on this blog.

If you made it this far, thank you! I hope it was insightful and if you have any questions, feel free to drop me a line via email.