lloyd.io - Securing Web Extensibility

Securing Web Extensibility

2010-01-31 00:00:00 -0800

(originally posted on the Yahoo Developer Network)

In recent years, we've seen increased energy put into web extensibility platforms. These platforms let distributed developers collaborate to produce new kinds of interactive features on websites and in the web browser itself. Because these platforms frequently enable data-sharing between multiple distinct organizations, and often sit between two completely different security domains (desktop vs. web), the security and privacy issues that arise are complex and interesting. This post explores some of that complexity: both the current state of platforms that extend the web and their associated security challenges.

Plugins, Extensions, and Mashups: A Primer

The extension platforms of the web today exist at every level of the technology stack. First there are mashup platforms, which include Facebook Apps, Google Gadgets, and YAP Apps, built on Yahoo!'s Application Platform. All these platforms are abstractions that allow developers to embed content within a host site. Often this content has a level of dynamism and interactivity that exceeds what is expected of "content" — so they're called "Applications" or "Gadgets."

A deeper type of extensibility occurs in web replacement technologies. Examples include JavaFX, Adobe Flash, Microsoft Silverlight, and Google Chrome Frame¹. These technologies leverage exisiting hooks in browsers to replace native rendering and scripting technologies from the browser vendors with new environments that claim to offer a variety of benefits.

In addition to web replacement technologies, there's the related area of web augmentation technologies. These use hooks exposed by browser vendors to bolt on native code, however, unlike replacement technologies, these tools attempt to expose new scriptable facilities to browser-based JavaScript. (Google) Gears, (Yahoo!) BrowserPlus, Mozilla Geode², and Loki are some prominent recent examples of such technologies. At the fringe of this category, there are several projects that abstract different replacement and augmentation technologies to provide feature focused abstractions. Prominent examples include include SVG Web for cross browser SVG support and the YUI Uploader for in-browser content upload with in-page feedback.

The existence of both replacement technologies and augmentation technologies for the web is made possible by the browser vendors themselves, who generally expose two different ways of extending the browser.

Browser plugins were originally intended to allow 3rd parties to add support to the browser so as to handle new types of content. They have since evolved into a rich means of embedding scriptable code into browsers, code that may (optionally) render in the browser's content area. (Microsoft calls these plugins Content Extensions, while other browser vendors have converged on the NPAPI architecture). A second, much more fragmented means of extending modern browsers is to write a browser extension (referred to as a "Browser Helper Object" by our friends in Redmond). Extensions typically have browser lifetime, as opposed to page lifetime. They also have programmatic access to manipulate the user interface of the browser.

In terms of Browser Extension environments, the most recent developments include Mozilla JetPack and Google Chrome Extensions. Both JetPack and Chrome employ web technologies to broaden their target developer audience. In Chrome, an extension is "essentially [a] web page", while with JetPack, "anyone who knows HTML, CSS, and JavaScript" can create an extension. Because of the large numbers of distributed developers authoring software which teeters between the world of unrestricted code running on your desktop and the sandboxed code running in your browser, these projects raise unique security issues. Our own work with BrowserPlus addresses a closely related set of challenges and will be used throughout the article as a point of comparison.

Sandboxing vs. Code Review

In designing an extension system where developers contribute code, there's a key upfront decision that has a very deep effect on the platform — To what extent should plugins be trusted? Should plugin authors be forced to attain the approval of some body of reviewers, or do we instead rely on a software sandbox to mitigate the potential harm that could be done by untrusted code from unknown authors (and hence relax the review requirement)? The former approach can minimize upfront investment in platform development, while the latter can reduce delays in publishing 3rd party code.

In addition to increasing complexity, the sandbox-based approach constrains the creativity of developers who use the platform. The universe of what is possible is pre-defined by the platform authors, which is contrary to the idea of an extension system. Thanks to Atul Varma's recent overview of JetPack development, I now can identify this tension as Jonathan Zittrain's "Generative Dilemma".

At first glance, the platform developer might feel stuck with a binary decision between generativity vs. self service (the ability of plugin authors to publish immediately without oversight or review). A little more thought, however, reveals that in reality there's a spectrum with several interesting intermediate choices. The first hybrid to consider: a sandbox with a set of additional capabilities or permissions that may be requested by code running therein.

Building a Capable Sandbox

In many ways, browser extensions can be considered conceptual peers of the browser: They share direct access to the end user and to the resources of the machine within which they run. Additionally, given the wide target audience of modern extension systems, developers with widely varied levels of experience and grasp of security issues are empowered to author and quickly publish extensions. Providing extensions with system access to permit generativity creates a tension with the requirement that extensions run in a locked-down environment to preserve reviewless publishing and prevent accidental or malicious end-user compromise. We seem to be converging on a capability-based security model, where sandboxed code may explicitly or implicitly³ request enhanced "capabilities" or "permissions." Several of the projects mentioned here have mechanisms which allow an extension to express its required permissions:

Chrome extensions require explicit expression of capabilities (or "permissions") in a manifest.
In JetPack, it may eventually be possible to automatically determine the capabilities required by a given JetPack, based on the dependency graph of "superpowers" it uses (see below)
In BrowserPlus, as in Chrome, permissions are explicitly stated in a manifest.

As different projects attempt to build capabilities-based systems, an interesting opportunity for collaboration arises. Users are going to learn to some extent what "writing to a temporary location on your disk" means. They will also learn more about "using location," and maybe also about "capturing video using a webcam." As new platforms emerge that ask the user more and more questions, it would be beneficial if these platforms shared similar capability lists, perhaps even a similar language or visual vocabulary.

From Capabilities to Meaningful Decisions

While we all seem to agree that an enumerable list of capabilities is a prerequisite to intelligent interface design, it's not clear that we've yet determined how to consolidate this information in a user interface that allows meaningful decisions. This is not for a lack of thought: Over a year ago, Dion Almaer suggested a "nuanced question" combined with good post-decision feedback. Since then, he's summarized ideas from Mozilla folks regarding a "layer [of] social trust on top of the technical security." More recently, there's been some initial discussion of a stop-light style representation of risk, which if correctly applied could distill a complex decision into a series of simple questions, (for example, Do you trust yahoo.com?) which iteratively becomes more ominous as the stakes (and the implied risk) get higher.

Better Responses Through Fewer Questions

One approach employed by BrowserPlus attempts to minimize explicit prompting and leverages mechanisms of implicit consent instead ⁴ by carefully crafting the API between trusted and untrusted code. One such example is file selection, where the act of navigating an operating system-supplied file picker dialog indicates implicitly that the site should be allowed to read the contents of selected file(s). Another example where this technique might be applied involves webcam access. Rather than asking the user if a site may use the webcam, pop up a window displaying the view from the webcam, allowing the user to capture a picture inside that frame (outside the control of the page), and finally after capturing "share" the capture with the page.

Architecting APIs and UI in this way can somewhat alleviate the need for explicit up-front questions and allow in-context kinesthetic decision making to occur. This approach is not without fault however, as it can also confine the freedom of UI designers. Eliminating needless questions assumes that only by asking fewer questions of higher quality, will we truly empower the user to make meaningful decisions.

Permission Prompts in Practice

In addition to ongoing thought experiments in meaningful user prompting, there are plenty of instances in practice where user questions have been built to varying degrees of success.

Facebook combines "social trust" (the rating) with a generic bit of language about how an app can poke at your stuff:

facebook prompts

My Yahoo! takes a language-laden approach to the problem:

yahoo prompts

BrowserPlus uses a lot of screen real estate to convey a fairly literal translation of capabilities into human language:

browserplus prompts

Google Gears adds a visual representation of a capability and a prominent display of who is asking:

google prompts

Language Choice and the Generativity Continuum

The choice of implementation language adds another dimension to the problem of security in web extension systems. As might be expected, native code (C, C++, or Objective C) will always afford maximal access to the system, yielding the ultimate in generativity. Certain features cannot be accessed from a higher level language — some good examples include access to physical position information from a motion sensor⁵ or access to a wifi card to extract nearby wifi base stations with associated signal strength. The benefit to working at such a low level is, in turn, this freedom. Ultimately, attempts to introduce sandboxing into an extension system that supports native code run the risk of introducing extra developer facing complexity without giving much in return⁶. In general, the choice of implementation language can bound where an extension system falls on the generativity continuum:

Language choice

Languages like Lua or JavaScript land on the other side of the extreme and apply well to a system where automatic publishing is important. In the case of JavaScript, there is no common defined standard library⁷, so this makes it possible to introduce new protected file system APIs (for instance) without surprising a developer. This same reasoning is what places Ruby and Python in the middle of the continuum. While it would be possible to embed Ruby and shoehorn in a capability-enforced access system (or perhaps use a modified version of the "Safe" mechanism that already exists in Ruby), it would yield a system sufficiently different from expectations that it might feel clumsy or confusing to developers.

While none of these claims are without exception, it is true that the two newest browser extension platforms today are built upon JavaScript and are sandboxed and capabilities-based. Older plugin and extension systems tend to leverage native code and either use code review or don't even attempt to address the issue of security in any granular way.

Case Studies in Extension Security

So far we've enumerated several pertinent decisions that extension platform authors must make:

Are submissions sandboxed or reviewed?
What language are extensions written in?
Is the system capability based? If so, how is this implemented?

As it turns out, the two newest browser extension platforms avoid straight answers to these questions. Given the tradeoffs with each decision, both systems are hybrids that attempt to capture the benefits of reviewless publishing, yet still preserve the generativity of native code. They leverage JavaScript to provide an authoring environment that's accessible to a majority of the world's developers, but allow the sprinkling of native code in controlled ways to preserve generativity. It's deeply interesting to look at how these platforms straddle some of the trickiest questions.

Google Chrome's Extensions

Aaron Boodman, working on extensions for Google Chrome, recently spoke about where Chrome can be found on the generativity continuum. Chrome uses a hybrid approach where "webby" extensions may optionally include native code bundled as scriptable NPAPI plugins. Extensions which are based solely on the Chrome extension environment's JavaScript APIs need no review and are "enabled automatically." Extensions that include unsandboxed native code are subject to additional review. In Aaron's words:

Aaron's comments are at 21:30

We're really proud of the fact that in the Google Chrome extension gallery we enable extensions automatically, there's no review period. But we make an exception for NPAPI because when you get to native code a lot of the security mechanisms built into the extension system can no longer apply. Once you have native code running it can modify the registry or make permanent modifications to your system... So we have additional review for NPAPI extensions.

This strategy seems reasonable. One would hope that a lion's share of extensions are developed without reliance on native code. The set of extensions that bundle native code can then be periodically examined. Those which leverage native code to do things that may be interesting to a large number of extensions developers and can be considered for uplifting into the core platform.

Mozilla's JetPack

One interesting thing that the Chrome model lacks is a means for sharing privileged functionality between extensions without actually merging it into the core extension platform. JetPack, on the other hand, is taking a much more ambitious initial approach to building a platform that can more rapidly evolve. Atul recently summarized the design goals of JetPack's capability-based security model. They aim to "allow anyone to create capabilities that securely expose privileged functionality." The JetPack world might end up looking something like this:

Flyin' around

The key difference from Chrome extensions is the existence of a middle tier consisting of community contributed bundles. These provide JavaScript APIs to access system resources not otherwise exposed by the core extension system. By breaking out potential platform features into a separate layer you empower the community to not only imply the features that they'd like to see in the plugin platform, but to go and build them. This is an exciting idea. It will be interesting to see how this middle tier of superpowers is embraced by the community of extension developers, as well as to learn the details of how the review process is structured.

Today's Extensions as Tomorrow's Web

Web extensions today are a dynamic and fascinating area at the intersection of several disciplines. This post attempts to give a taste of some of the considerations and tradeoffs involved in the architecture of such systems. Hopefully I've sparked your interest to learn more or perhaps even join the fray. A renewed interest in more open software platforms is no surprise, and the potential to harness the creativity of developers of the world is inspiring.

One area, however, that dampens the joy of web extension platforms covered here is that they're limited in scope — vendor locked and browser specific. While we'll hopefully see increased (and appropriate) cross-vendor collaboration, it's reasonable to conclude that because "browser extensions" are about extending web browsers — and because different web browsers can have vastly different architectures — browser extension APIs will be slow to converge.

Imagine, however, an extension system which struck a balance between simplicity, generativity, and security, and was pertinent to the whole darn web. This is the promise of Web Augmentation technologies. Distributed Extensibility has been a topic of discussion in terms of HTML5 for a while now, but the conversation thus far has failed to cover a distributed way to explore new standard JavaScript APIs (the stuff that so many emergent HTML5 standards are made of). A significant obstacle to this conversation is the problem of helping users understand and securly navigate a dynamically evolving web.

The current work being done to secure extension platforms logically extends to web augmentation platforms. The degree to which we can collectively refine these permissioning models may become the key limiting factor in how far we can migrate our current desktop experience to the web.

Lloyd Hilaiel
BrowserPlus Hacker

1. Chrome Frame is certainly a fringe case where the rendering engine of one browser is replaced with that of another. Discuss.

2. Geolocation was ultimately built into Firefox 3.5, however Geode still serves as a great example of web augmentation — in this case using web extensions as a way to experiment with new potential core features for the web.

3. Whether the request of capabilities is implicit or explicit is an interesting related question. Having a platform that can discover the required capabilities of an extension without the author having to explicitly enumerate them would yield something that's easier to develop for, at the cost of complexity for the platform implementor.

4. The notion of implicit consent isn't new — there's precedent, for instance, in Flash where certain API functions can only be performed from an event handler generated by a user action (mouse or keyboard). The simple idea is that user actions can be a meaningful part of the security system.

5. What is possible without native code is obviously a moving target. Browser-based access to the motion sensor was introduced using BrowserPlus a couple years ago, and subsequently has made it into Firefox 3.6. What's next?

6. The potential efficiency of native code is an exception, and Google Native Client is an example of a sandbox system around native code. The project focuses on enabling the efficiency of native code with the saftey of the web, and additionally wraps and exposes some system resources, such as audio. Another glaring exception might be Apple's sandboxed environment for authoring iPhone Apps, a case where Objective C affords a slightly higher level of abstraction for developers yet still allows native code efficiency — important on a mobile device.

7. The CommonJS movement is an attempt to bring a standard library and means of including (or "requiring") code to JavaScript. While this could standardize APIs for JavaScript, developers will still generally be forced to understand the difference between core JavaScript and libraries provided by the execution environment.