I’ve always wanted a concise and beautiful schema language for JSON. This desire stems from a real-world need that I’ve hit repeatedly: in-memory data hydrated from a stream of JSON of questionable quality has to be validated. Currently I’m constantly performing JSON validation in an ad-hoc manner, that is, laboriously writing boilerplate code to check that an input JSON document has the form I expect.
Manual validation is problematic for a variety of reasons, and automatic validation affords several features. My favorite is high-quality, helpful error messages on bogus input. Aaron Boodman has talked a bit about the why over on his blog.
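To make the complaint concrete, here is a minimal sketch of the kind of hand-rolled validation described above. The document shape (a `name` string and an `age` integer capped at 125) and the function name `validate_user` are hypothetical, picked only for illustration.

```python
import json


def validate_user(doc):
    """Hand-written checks for one hypothetical document shape.

    Returns a list of error strings; an empty list means the document passed.
    """
    errors = []
    if not isinstance(doc, dict):
        return ["document: expected an object"]

    name = doc.get("name")
    if not isinstance(name, str):
        errors.append("name: expected a string")

    age = doc.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("age: expected an integer")
    elif age > 125:
        errors.append("age: must be at most 125")

    return errors


if __name__ == "__main__":
    doc = json.loads('{"name": "John Doe", "age": 200}')
    for error in validate_user(doc):
        print(error)
```

Every new property means another handful of lines like these, and the error messages are only as good as whatever I bothered to type that day.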
So what do I want?
- A terse yet flexible means of describing the structure of a JSON document
- Something that’s easy on the eyes
- Something that rolls off the tongue
JSONSchema’s diet
“But wait!” — you exclaim! There’s JSONSchema! And I agree, JSONSchema is mostly a good thing, and gets us most of the way there. JSONSchema is a flexible means of describing the structure of a JSON document. But I wouldn’t call it terse. Taken from json-schema.org, compare the complexity of the JSON document:
{ "name" : "John Doe", "born" : "", "gender" : "male", "address" : {"street":"123 S Main St", "city":"Springfield", "state":"CA"} }
With the schema that describes it:
{
  "description": "A person",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "born": {
      "type": [ "integer", "string" ],
      "minimum": 1900,
      "maximum": 2010,
      "format": "date-time",
      "optional": true
    },
    "gender": {
      "type": "string",
      "options": [
        { "value": "male", "label": "Guy" },
        { "value": "female", "label": "Gal" }
      ]
    },
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" }
      }
    }
  }
}
NOTE: I did run this through json_reformat, the pretty printer that ships with yajl – so to be fair, we could combine some lines here.
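For a sense of what a tool does with a schema like the one above, here is a rough sketch of a validator covering only a few attributes: `type`, `properties`, `minimum`, `maximum` and `optional`. It is not a JSONSchema implementation. The `validate` function and the `TYPE_CHECKS` table are made up for this example, the schema is trimmed, and treating a property as required unless it is marked `optional` is my assumption.

```python
# Type names mapped to Python checks. bool is a subclass of int in Python,
# so it is excluded explicitly from the numeric types.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def validate(value, schema, path="document"):
    """Return a list of error strings; an empty list means the value passed."""
    types = schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    # A list of types acts as a union: any one of them may match.
    if types and not any(TYPE_CHECKS[name](value) for name in types):
        expected = " or ".join(types)
        return [f"{path}: expected {expected}"]

    errors = []
    # Range attributes only apply when the value is actually numeric.
    if TYPE_CHECKS["number"](value):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: {value} is below the minimum of {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: {value} is above the maximum of {schema['maximum']}")

    if isinstance(value, dict):
        for name, sub_schema in schema.get("properties", {}).items():
            child_path = f"{path}.{name}"
            if name in value:
                errors.extend(validate(value[name], sub_schema, child_path))
            elif not sub_schema.get("optional", False):
                # Assumption: properties are required unless marked optional.
                errors.append(f"{child_path}: missing required property")
    return errors


# A trimmed version of the person schema, keeping only what the sketch handles.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "born": {"type": ["integer", "string"], "minimum": 1900,
                 "maximum": 2010, "optional": True},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
            },
        },
    },
}

if __name__ == "__main__":
    bad = {"name": "John Doe", "born": 1850,
           "address": {"street": "123 S Main St", "city": "Springfield"}}
    for error in validate(bad, PERSON_SCHEMA):
        print(error)
```

The path threaded through each recursive call is what makes the error messages point at the offending property, which is the part I never get right when writing these checks by hand.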
Now don’t get me wrong. I believe that the feature of JSONSchema that it can be represented in JSON is very important. This means that there’s less bloat in the core toolchain when you choose JSON for some portion of your data representation needs, and that holds up to Douglas’s promise of a “low fat alternative”. Rad. But I don’t like how hard that schema is to read and write for a human like me.
So let’s throw a stone as long as we’re driving by: JSONSchema is too big
I think too much has been asked of JSONSchema. From the proposal:
JSON Schema is intended to provide validation, documentation, and interaction control of JSON data.
Interaction control? A cute idea, but I think this is far less important than a small, functional language for validation. Perhaps there’s actually one base specification here with some extensions to do interaction control and storage attributes? (read about the transient attribute). Finally, with documentation, I’m again uncertain. Here’s the full list of attributes that make me nervous:
- options (label/value)
- title
- description
- transient
- hidden
- disallow
- extends
- identity
Introducing Orderly (v-1)
Orderly, say hi!
string hi {"wassup"};
Orderly…
- … is an ergonomic micro-language that can round-trip to JSONSchema.
- … presently represents a subset of JSONSchema – I’ve thrown out the bits not specifically related to validation.
- … is optional. syntactic sugar. fluff. Tools should speak JSONSchema, but for areas where humans have to read or write the schema there should be an option to expose orderly in addition to JSON.
- … is probably not novel. “nothing under the sun is new”.
- … is a “little baby zygote of an idea”
So let’s meet Orderly. This JSONSchema:
{"type":"object", "properties": {"name": {"type":"string"}, "age" : {"type":"integer", "maximum":125}} }
becomes this orderly:
object { string name; integer age[,125]; };
Nice, eh? Let’s zip through some examples here:
A string property named name:
string name;
A string property between 1 and 64 chars in length (I assume Unicode code points here):
string name[1,64];
A number named foo between 100 and 1000:
number foo[100,1000];
An optional boolean named hasLotsOfMoney:
[boolean hasLotsOfMoney];
An optional number with a value between 1 and 200 with a default value of 18:
[number age[1,200] = 18];
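The one-line declarations above are regular enough that a regex can translate them into JSONSchema fragments. What follows is only a sketch under stated assumptions: the grammar is reverse-engineered from these few examples and is not an official one. The mapping of a string range to `minLength`/`maxLength` and of a numeric range to `minimum`/`maximum` is my guess at the intent. `parse_declaration` is a hypothetical name.

```python
import json
import re

# Hypothetical grammar for a single declaration: type, name, optional
# [low,high] range, optional "= default". Not an official Orderly grammar.
DECLARATION = re.compile(
    r"(?P<type>string|number|integer|boolean)\s+"
    r"(?P<name>\w+)\s*"
    r"(?:\[(?P<low>[^,\]]*),(?P<high>[^,\]]*)\])?\s*"
    r"(?:=\s*(?P<default>.+))?"
)


def parse_declaration(text):
    """Turn one declaration into a (name, JSONSchema fragment) pair."""
    body = text.strip().rstrip(";").strip()
    # A declaration wrapped in square brackets is optional.
    optional = body.startswith("[") and body.endswith("]")
    if optional:
        body = body[1:-1].strip()

    match = DECLARATION.fullmatch(body)
    if match is None:
        raise ValueError("cannot parse declaration: " + text)

    schema = {"type": match["type"]}
    # Assumption: a range on a string constrains its length,
    # a range on anything else constrains its value.
    if match["type"] == "string":
        low_key, high_key = "minLength", "maxLength"
    else:
        low_key, high_key = "minimum", "maximum"
    # Either side of the range may be left empty, as in age[,125].
    for key, raw in ((low_key, match["low"]), (high_key, match["high"])):
        if raw is not None and raw.strip():
            schema[key] = json.loads(raw)
    if match["default"] is not None:
        schema["default"] = json.loads(match["default"])
    if optional:
        schema["optional"] = True
    return match["name"], schema


if __name__ == "__main__":
    for line in ("string name[1,64];", "[number age[1,200] = 18];"):
        name, fragment = parse_declaration(line)
        print(name, json.dumps(fragment))
```

Nested objects, unions and enumerated values need a real parser rather than a regex, which is why the sketch stops at single declarations.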
And for our final example, let’s transmogrify that huge schema up top:
object {
  string name;
  union {
    integer[1900,2010];
    string; // OMG, I killed format!
  } born;
  string gender { "male", "female" }; // OMG, I killed interaction control!
  object {
    string street;
    string city;
    string state;
  } address;
} person;
So we’re nowhere near a BNF here; this is simply the part where we walk into the store and start trying things on. Oh, and don’t worry. This isn’t real.