lloyd.io is the personal website of Lloyd Hilaiel, a software engineer who works for Team Ozlo and lives in Denver.

All the stuff you'll find here is available under a CC BY-SA 3.0 license (use it and change it, just don't lie about who wrote it). Icons on this site are commercially available from steedicons.com. Fonts used are available in Google's Web Font directory, and I'm using Ubuntu and Lekton. Finally, Jekyll is used for site rendering.

Finally, Atul, Pascal, and Stephen inspired the site's design. And in case you're interested, this site's code is available on github.

Persona Architectural Changes
2012-06-15 00:00:00 -0700

The Persona Login service lets web developers implement seamless login via email address with a trivial amount of code. It is written in NodeJS and is supported by Mozilla. The service is deployed as several distinct NodeJS processes. Recently we've added a new process to the service, and this short post will describe what's changed and why.

This post is targeted at interested community members, and the people who build, test, deploy, and maintain Persona.

Previous Software Layout

The Persona deployment previously consisted of the following processes with said responsibilities:

In addition to these NodeJS processes, we have quite a bit of infrastructure for high availability in our layout as well. Roughly, here's what Persona looked like:

Persona Arch Before

Adding router

As you can see in the description above, the browserid process had a large set of diverse responsibilities. Repeatedly in load testing we have seen that process CPU bound and is the current bottleneck of the system. Further, we have begun work implementing countermeasures to keep the service available during malicious network attacks. Both for performance purposes and to create a place to run countermeasures, we've introduced a new process that subsumes some responsibilities of browserid, she's called router.

The new process and responsibility breakdown is as follows:

As you can see, the responsibility shift is extremely minor in this first push to introduce router. We have not yet introduced the countermeasures mentioned above, nor have we moved all of orthogonal responsibilities of the browserid process. These changes are planned for future trains (with 2 week development cycles, the only way to roll is incrementally).

The new software deployment looks roughly like this:

Persona Arch After

Deployment Considerations

As you might expect, a very small part of the Persona deployment is actually about NodeJS: of equal importance are the mechanisms that start the programs, display statistics and health, and monitor them to allow us to pro-actively discover problems before they impact users. The next several sections run through the new deployment architecture from these different perspectives.

Monitoring

The router has no interaction with our database servers, it's primary function at this point is to forward requests to the appropriate process. Monitoring should initially target non-200 HTTP responses. While 4xx responses are part of normal operation, 5xx should be monitored and alerted.

In addition to monitoring HTTP responses, the router emits statsd events over UDP. These are all name-spaced under browserid.router., and the most interesting stats emitted are API invocation counts, under browserid.router.wsapi.*.

Health Checks

Load balancers need an externally accessible health check URL to verify the health of a component. The browserid process exports a ping api that can be hit via router. Because these two processes run on the same machine, they should be considered a block. The ping api verifies the ability of router to forward requests, and the ability of browserid to process API requests, AND the ability of browserid to access the database.

Logging

What you'll see in the router's log-file is mostly reports of forwarded requests. That's all it does. Any error level messages are of great interest.

Expected Performance Impact

The router removes HTTP forwarding from the most heavily loaded process in the system. Now that browserid is no longer responsible request forwarding, it's expected that we'll see slightly higher possible ADU in load testing, with greater total processor usage on our machines.

Upcoming Work

The static process

An immediate desire is to introduce another service for static file serving. This service would be responsible for serving all of our static content. This includes both:

A CDN

All static content should be moved off to a Mozilla controlled HTTPS CDN for lower latency and faster load times. the static process discussed above would then be nothing more than a CDN origin server. We will soon deploy a change that will move all static content off to a different hostname to prepare for the introduction of a CDN.

Anti abuse heuristics

Now that router is in the works, we can move forward and begin to implement anti-abuse heuristics for different types of live network attacks (initial focus will be on (D)DoS.