© Steve Dekorte and Rich Collins 2009
Creative Commons Attribution License 3.0
The Vertex Database
Introduction
VertexDB is a high performance graph database server that supports automatic garbage collection. It uses the HTTP protocol for requests and JSON for it's response data format and the API is inspired by the FUSE filesystem API plus a few extra methods for queries and queues.
Performance
Requests per second are typically network i/o bound and should be in the thousands on OSX, and possibly tens of thousands on Linux (on a typical PC of circa 2008).
Integrity
On startup, VertexDB makes a backup of the database (if it exists). When shut down properly, it creates a file that marks the database as being safely stored. When started, if this file is not found, the database file is replaced by it's old copy. There is also an admin backup request that can be sent to the database to store a backup at any time.
Security
The current way to secure the server is by running it on an isolated network. HTTP auth support will be added in the future.
Applications
Graph databases aren't ideal for all uses, but can be more suitable than other models for dealing with changing or informal schemas and data that is more naturally represented as a graph. It's atomic queue APIs can also make vertexdb useful as a high performance persistent queueing server.
Implementation
The current implementation is in C and built on top of:
- TokyoCabinet, a b-tree key/value disk store
- libevent, for asynchronous sockets and HTTP
- Yajl, for JSON generation
VertexDB uses a single OS thread and handles all requests serially, but uses async sockets so no request is blocked on socket i/o (though it can still block on disk i/o).
Requests should be generally designed such that responses don't spend too much time blocking on disk i/o or using too much memory in queued response buffers. This can be done by "paging" requests by limiting their results count and repeatedly using a query that selects the next set of matches.
Future
Possible future features:
- FUSE interface
- distributed redundant servers
- incremental garbage collection
- distributed synced in-memory transactions
- automatic index creation
- a NodeJS implementation which uses JS for queries?
Basics
Keys, Values and Terminology
VertexDB is composed of nodes which are folders of key/value pairs. Keys are stored in lexical ordering and can be any string not containing a forward slash character (as these are used as path separators).
Keys beginning with an underscore tell the database that the value is a string, otherwise, the value is a pointer to another node.
In this way, the database's garbage collector knows how to traverse the nodes. There is a root node that corresponds to the / path that serves as the root reference of garbage collection scan. Each node also tracks it's size (number of keys it contains) so this can be returned quickly.
Requests
The HTTP protocol is used for requests. The path part of the query specifies a path in the database's graph of nodes and the query parameters are used to specify actions on that path (read, write, search, etc).
Responses
Responses are sent in JSON format in the HTTP content. On error, a HTTP response status of 500 is sent with the HTTP content containing the error message in JSON format.
URLs
For brevity, the "http://host:port" parts of the request URLs fields below are left out and the word "path" is used to mean any forward slash delimited path.
Node Operations
mkdir
Creates a node at the specified path. If the path already exists, no operation is performed. Returns null on success.
sample request: /path?action=mkdir
sample response: null
rm
Removes a node at the specified path. Returns an error if the path does not exist. Returns null on success.
sample request: /path?action=rm&key=foo
sample response: null
size
Returns the number of slots at the node in the specified path. Returns an error if the path doesn't exist.
sample request: /path?action=size
sample response: 423
link
Links a node at a source path to a key on a node at the destination path. Returns an error if the path doesn't exist.
sample request: /sourcePath/?action=link&key=k&toPath=destinationPath
sample response: 423
Value Operations
read
Read a value at a given path. Returns the value on success.
sample request: /path?action=read
sample response: "foo"
write
Sets or appends a value at a specified path. Returns null on success.
set
sample request: /path?key=k&action=write&mode=set
with post: foo
sample response: null
append
sample request: /path?action=write&mode=append
with post: bar
sample response: null
Select Operations
before=key
after=key
count=max
whereKey=k&whereValue=v
op=pairs
sample request: /path?action=select&op=pairs
sample response: [["a","1"], ["b","2"], ["c","3"]]
op=keys
Returns a list of keys whose nodes match the query.
sample request: /path?action=select&op=keys
sample response: ["keyA", "keyB", "keyC"]
op=values
Returns a list of values matching the query.
sample request: /path?action=select&op=values
sample response: ["value1", "value2", "value3"]
op=object
Returns a list of the matching objects in JSON dict.
sample request: /path?action=select&op=object
sample response: {"a":"1"; "b":"2"; "c":"3" } ???
op=counts
sample request: /path?action=select&op=counts
sample response: ???
op=rm
Removes keys matching selection. Returns the number of removed keys on success.
sample request: /path?action=select&op=rm
sample response: ???
Queue Operations
queuePopTo
Atomically moves the first sub-node found at the source path into the destination path (using the same key it was under in the source path).
Returns the key that the node was under.
sample request: /sourcePath/?action=queuePopTo&toPath=/destinationPath
sample response: akey123
optional parameters:
whereKey=k&whereValue=v
ttl=numberOfSeconds
The ttl attribute is a time-to-live value as a number of seconds.
When the server pops the node to the new location, it computes the expiration date and creates a _qexpire and _qtime slots on the node which are used by the server when queueExpireTo requests are sent. Expirations on a node are only performed when the queueExpireTo request is sent to it.
queueExpireTo
Expires items in the queue by looking at their _expire attribute and moves expired items to the specified path.
The _qexpire and _qtime keys are removed after the node is moved.
Returns the number of items expired.
sample request: /sourcePath/?action=queueExpireTo&toPath=/destinationPath
sample response: 42
Transactions
transaction
Allows a set of actions to be done in a single request. Returns null on success.
sample request: /frompath/?action=transaction
with post: [list of request urls]
sample response: null
|