I like Mike Cohn’s testing pyramid as a guideline for test allocation, and I’ve mentioned it to a couple of teams around here (for more, read Patrick Wilson-Welsh’s blog and/or see his Agile 2008 presentation). Lately, on one team, we’ve been very earnest about writing Selenium tests for UATs, even doing some ATDD. But we’re seeing what many have seen: Selenium tests are often (necessarily) long and slow, and occasionally brittle. I asked the team what our “testing pyramid” would look like, and, notionally, it’s something like this:
[Sketch: three-layer pyramid — GUI/system tests at the top, functional tests in the middle, unit tests at the base]
[Sketch: four-layer pyramid — GUI/system at the top, then integration, then functional, with unit tests at the base]
I went ahead and grabbed the actual test numbers from the build(s) — simply the total number of assertions in each of the test levels (unit, functional, integration and UI) — and generated a chart in Excel:
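The tally itself is trivial: Rails' test runner prints a summary line per suite (e.g. "120 tests, 342 assertions, 0 failures, 0 errors"), so you can just scrape the assertion count out of each one. A minimal sketch — the suite names and numbers here are illustrative, not our real data:

```ruby
# Pull the assertion count out of a Rails test-runner summary line.
def assertion_count(summary)
  summary[/(\d+) assertions/, 1].to_i
end

# One summary line per level of the pyramid (example values, not real builds).
summaries = {
  "unit"        => "120 tests, 342 assertions, 0 failures, 0 errors",
  "functional"  => "45 tests, 150 assertions, 0 failures, 0 errors",
  "integration" => "20 tests, 61 assertions, 0 failures, 0 errors",
  "ui"          => "8 tests, 19 assertions, 0 failures, 0 errors",
}

summaries.each do |level, line|
  puts "#{level}: #{assertion_count(line)} assertions"
end
```

Dump the resulting numbers into Excel (or anything else that draws stacked bars) and you have your team's actual pyramid.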
I posted it in our war room; we’ll see what kind of conversation it sparks. It looks like we need to keep shifting the balance from Selenium (UI) tests toward webrat (integration) tests, and to grow our base of unit tests. Using the actual data also corrected my anecdotal assumption: we have far fewer unit tests than I thought (see my sketch in my previous post).
UPDATE: I ran the numbers for another team. Here’s their chart:
This team has some UI tests written in Watir, but they don’t run them (so the tests are effectively useless). All of their integration tests are written in Webrat; apparently these can also be run as Selenium tests, but the team isn’t doing that (yet). This team has the fundamentals down well — more unit tests than functional, more functional than integration. We’ll see how they build out the upper levels of their pyramid in the coming weeks.
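For what it's worth, the "run Webrat tests as Selenium tests" trick is (as I understand it) just a configuration switch — Webrat can drive a real browser through Selenium instead of simulating requests. A sketch of what that might look like; exact option names vary by Webrat version, so treat this as an assumption to verify against the version you're running:

```ruby
# e.g. in features/support/env.rb or test_helper.rb
require "webrat"

Webrat.configure do |config|
  # :rails simulates requests in-process; :selenium drives a real browser.
  # Flip this (or key it off an environment variable) to reuse the same
  # tests at two levels of the pyramid.
  config.mode = ENV["WEBRAT_MODE"] == "selenium" ? :selenium : :rails
end
```

If that works as advertised, it's a cheap way for this team to light up the top of their pyramid without writing a separate UI suite.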