In other words, a version control system for variations on genetic design.
A 4-gene cluster, even with a library of just 16 parts, can be encoded in over 684,000 variants. Clearly, GenBank files are not appropriate here. Their solution is Knox, where the genetic design space takes up only about 200k, rather than gigabytes. This “genetic design space” is a graph format in which each edge is labelled with a *set* of parts, and paths through the graph give you the designs. Using Knox, design spaces can be concatenated via graph operations, or merged in a variety of different ways.
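To make the compactness argument concrete, here is a minimal sketch of the idea, not Knox's actual data model or API. It assumes a design space is a simple linear chain of edges (real design spaces are general graphs), and the `DesignSpace` class, its methods, and the part names are all hypothetical.

```python
from itertools import product
from math import prod


class DesignSpace:
    """Toy design space: a linear chain of edges, each labelled with a set of parts."""

    def __init__(self, edges):
        self.edges = [frozenset(edge) for edge in edges]

    def concat(self, other):
        # Concatenation: walk every edge of self, then every edge of other.
        return DesignSpace(self.edges + other.edges)

    def count(self):
        # Each path picks one part per edge, so the counts multiply.
        return prod(len(edge) for edge in self.edges)

    def designs(self):
        # Enumerate the paths lazily; only do this for small spaces.
        return product(*(sorted(edge) for edge in self.edges))


# Hypothetical part names, chosen only for the example.
gene = DesignSpace([{"pA", "pB"}, {"rbs1", "rbs2", "rbs3"}, {"gfp"}])
pair = gene.concat(gene)

print(len(gene.edges), gene.count())  # 3 edges, 6 designs
print(len(pair.edges), pair.count())  # 6 edges, 36 designs
```

What gets stored grows with the number of edges, while the number of designs multiplies, which is why one small graph can stand in for a very large pile of individual files.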
If you build up a series of these operations, you can then create Very Large Things. A single design space would encode all of the various paths. These design spaces can be stored and versioned, as is done with git. Combining design spaces in Knox also merges their version histories. You can also branch a design space, giving you two different versions to work with. Reversion is also supported.
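The git-like behaviour described above can be pictured with a toy history of snapshots. This is only a sketch under my own assumptions: the `VersionedSpace` class is hypothetical, each commit is a full snapshot of the edge labels, reverting is modelled as re-committing the previous snapshot, and merging of histories is left out. None of this is taken from Knox itself.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedSpace:
    """Toy versioned design space: each commit is a snapshot of the edge labels."""

    commits: list = field(default_factory=list)

    def commit(self, edges):
        # Store an immutable snapshot so later edits cannot change history.
        self.commits.append(tuple(frozenset(edge) for edge in edges))

    def head(self):
        return self.commits[-1]

    def branch(self):
        # A branch starts from a copy of the history so far, then diverges.
        return VersionedSpace(list(self.commits))

    def revert(self):
        # Assumption: reverting re-commits the previous snapshot.
        if len(self.commits) < 2:
            raise ValueError("nothing to revert to")
        self.commits.append(self.commits[-2])


main = VersionedSpace()
main.commit([{"pA"}, {"gfp"}])

dev = main.branch()
dev.commit([{"pA", "pB"}, {"gfp"}])  # widen the promoter choice on the branch only
dev.revert()                         # the branch head matches main again
```

After the branch, `main` still has one commit while `dev` has three, so the two versions can be worked on independently.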
There is a RESTful API to allow connection between the web application and the graph database. Finch and Eugene are two products which use Knox. Finch uses regular expressions, so you can encode variable-length designs in it. This makes it more machine-comparable and mergeable, but it can also make it harder for humans. That is where Eugene is beneficial: it is a more human-readable and writeable language, though it is less expressive than Finch and has a fixed design length.
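To see why regular expressions give you variable-length designs, here is a small illustration using Python's standard `re` module. This is not Finch syntax: the one-letter part codes and the `is_valid` helper are made up for the example.

```python
import re

# Hypothetical one-letter codes for part types (not Finch syntax):
#   p = promoter, r = ribosome binding site, c = coding sequence, t = terminator
# One cluster is a promoter, one or more (r, c) pairs, then a terminator.
# A design is one or more clusters, so its length is not fixed.
DESIGN = re.compile(r"(p(rc)+t)+")


def is_valid(design: str) -> bool:
    """Return True if the whole string of part codes fits the pattern."""
    return DESIGN.fullmatch(design) is not None


print(is_valid("prct"))      # one gene in one cluster
print(is_valid("prcrct"))    # two genes sharing one promoter
print(is_valid("prctprct"))  # two clusters back to back
print(is_valid("pt"))        # no coding sequence at all
```

The first three calls print `True` and the last prints `False`. A fixed-length description would need a separate rule for each of the three valid layouts, whereas the single pattern covers all of them.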
Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!