Facebook at GTAC on using AI for Testing

As a follow-up to my post on Google’s use of AI in Testing at their GTAC 2014 conference, here is a review of the Facebook Testing session:

GTAC 2014: Never Send a Human to do a Machine’s Job: How Facebook uses bots to manage tests (Roy Williams)

In this talk, Roy Williams tells us about the Facebook code base growing until it became hard for developers to predict the system-wide effects of their changes. Checking in code caused seemingly unrelated tests to fail. As more and more tests failed, developers began ignoring failed tests when checking in and test integrity was compromised. With a release schedule of twice a day to the Facebook website, it was important to have trustworthy tests to validate changes.

To remedy this situation, they set up a test management system which manages the lifecycle of automated tests. It’s composed of several agents which monitor tests and assign them quality statuses. For instance, when new tests are created, they are not released immediately to run against everyone’s check-ins, but are first run against a few check-ins to judge the integrity of the test. If the test fails, it goes back to the author to improve.
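To make the vetting step for new tests concrete, here is a minimal sketch of how I picture it. This is my own illustration, not Facebook's code: the function name `vet_new_test`, the `run_test` callable, and the two outcome strings are all hypothetical, and the talk did not say how many check-ins a new test is sampled against.

```python
from typing import Callable, Iterable


def vet_new_test(run_test: Callable[[str], bool], checkins: Iterable[str]) -> str:
    """Run a new test against a small sample of check-ins.

    run_test is a hypothetical callable that takes a check-in id and
    returns True if the test passed against that check-in.
    """
    for checkin in checkins:
        if not run_test(checkin):
            # Any failure on the sample sends the test back to its author.
            return "return_to_author"
    # Passed on every sampled check-in: safe to run against everyone's check-ins.
    return "promote"
```

For example, `vet_new_test(lambda c: c != "rev2", ["rev1", "rev2", "rev3"])` sends the test back to its author, because it fails on one of the sampled check-ins.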

Facebook test lifecycle

If a passing test starts to fail, an agent, FailBot, marks the test as failing and assigns a task to the owner of the test to fix it. If a test fails and passes sporadically, another agent, GreenWarden, marks it as a test of unknown quality, and the owner needs to fix it. If a test keeps failing, it gets moved to the disabled state, and the owner gets 5 days to fix it. If it starts passing again, its status gets promoted; otherwise it gets deleted after a month. This prevents failing tests from getting out of hand, overwhelming developers, and eventually being ignored when code is checked in.
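The lifecycle above reads like a small state machine, so here is a rough sketch of it under my own assumptions. The status names follow the talk, but `next_status`, the `DISABLE_AFTER_FAILURES` threshold, and the idea of judging a test from a window of recent results are hypothetical. I also assume a recovering disabled test is promoted straight back to passing, and I leave out the 5-day task given to the owner.

```python
from enum import Enum


class Status(Enum):
    PASSING = "passing"
    FAILING = "failing"
    UNKNOWN = "unknown quality"
    DISABLED = "disabled"
    DELETED = "deleted"


# Assumed threshold: the talk only says a test that "keeps failing" is disabled.
DISABLE_AFTER_FAILURES = 5
# "Deleted after a month", approximated here as 30 days.
DELETE_AFTER_DAYS = 30


def next_status(status, recent_results, days_disabled=0):
    """Pick the next status from recent results (list of bools, newest last)."""
    if status is Status.DELETED or not recent_results:
        return status
    all_pass = all(recent_results)
    all_fail = not any(recent_results)
    if status is Status.DISABLED:
        if all_pass:
            return Status.PASSING  # assumed: promoted straight back to passing
        if days_disabled >= DELETE_AFTER_DAYS:
            return Status.DELETED
        return Status.DISABLED
    if all_pass:
        return Status.PASSING
    if all_fail:
        # The FailBot role: mark as failing, then disable if it keeps failing.
        if status is Status.FAILING and len(recent_results) >= DISABLE_AFTER_FAILURES:
            return Status.DISABLED
        return Status.FAILING
    # The GreenWarden role: mixed passes and failures mean unknown quality.
    return Status.UNKNOWN
```

For instance, `next_status(Status.PASSING, [True, False, True])` returns `Status.UNKNOWN`, which is the flaky-test case, while a disabled test that has kept failing for 30 days is deleted.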

Facebook test bots and wardens
Slides can be found here by the way.

This system improves the development process by maintaining the integrity of the test suite and ensuring people can afford to take test failures seriously. It’s a great example of how to shift an intelligent process from humans to machines, and it also highlights an advantage of using machines, which is the ability to scale.

Writing this post also made me ponder why I had classified this system as an application of artificial intelligence. I believe the key lies in transferring activities requiring some degree of judgement to machines. We have already allocated test execution to computers with test automation, but in this case, it is test management which has been delegated. I will dig into this topic more in a future post I am working on, about qualifiers for AI applied to testing. 

Overall, this talk was a pretty fascinating insight into Facebook’s development world, with some great concepts that can be applied to any development environment.
