Misconfigurations

I am in California for a week, visiting Berkeley and Los Angeles, and also catching up with friends and family in the area.

In talking to my friends in Silicon Valley, I was repeatedly struck by the dissonance between “academic” verification research I do, and the “real world” software engineering grunge that becomes as important, if not more important, than the pure functionality of the code.

For example, there is the issue of system configuration. When you deploy a software stack, say on the cloud, you not only have “your” code, but an entire ecosystem of servers, load balancers, database backends, etc. that must all work together. The magic glue that makes this happen is configuration options written in lengthy files, in somewhat ad hoc languages. Getting the configurations right at deployment, and maintaining their correctness through system and component upgrades is an art. And a nightmare.

It’s a nightmare because your entire system can crash because you have misconfigured your system, even though individual components (your application, the server) are individually perfect. In fact, misconfigurations are a significant reason for system downtimes.

The problem is old and important, but I am not sure it has ever been a hot area in academic research. I recall there was a nice paper by Nick Feamster and Hari Balakrishnan on BGP misconfigurations some years back, and more recently, I read a paper by Mona Attariyan and Jason Flinn on dynamic dependency tracking for debugging misconfigurations. Are there other projects I am missing?

A somewhat naive view is to consider the content of the configuration file as symbolic inputs and to apply symbolic execution techniques, generating a symbolic description of all configurations causing a failure (and by negation, the space of “safe” configurations). I doubt though if symbolic execution can be scaled to the system level. An alternative is to compute the forward dynamic slice of an error trace, projecting to the set of configuration variables. Both techniques are “low level”. It is not clear to me how semantic constraints can be captured in this way.

Finally, why isn’t there a lot more academic interest in the problem? Is it that deployment issues affecting large scale deployments don’t “show up” in most projects in an academic setting?

Advertisements

One Response to “Misconfigurations”

  1. RJ Says:

    rupak,

    there has been some work on (mis)configurations,

    * I believe Jeff Foster and co. have a recent ICSE paper on using symex in the style described above,

    * I saw a nice talk by shriram krishnamurthi (related to his Margrave project) for finding network misconfigurations — there is an earlier paper by the the alloy group, in both cases they use SAT solvers

    * I can toot my own horn about:
    http://wiki.eclipse.org/Equinox_P2_Resolution
    (eclipse now ships with a SAT solver!)

    I think the challenge is that (a) program analyses are too low to actually find what conflicting options are — think of the billions of flags scattered in blast, and their highly “global” interactions, (b) there are specification languages for these — but then the main remaining trick is to convert those languages into an analyzable logic — like Shriram’s and the Alloy work…

    -RJ.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s


%d bloggers like this: