One of my first tasks after starting at Mozilla was to enhance an existing prototype framework for testing the Open Web Apps infrastructure. This test infrastructure was built with a tool called Sikuli, which uses image recognition to find particular areas of a user interface and act on them, for example by clicking a button that looks like a given image. It was originally designed to work on Mac OS X, but it needed to evolve to support other operating systems such as Windows 7. I began making changes toward that goal, such as allowing the code base to use imagery specific to Windows 7 and fixing quirks that caused platform-specific issues on Windows. Once I had an initial working solution, the plan was for the Mozilla community to enhance it at a test day, a day on which the community and full-time employees work together directly on a particular set of current quality assurance problems. However, the test day showed that the infrastructure was not robust: it could not run under different Windows 7 themes, and it behaved inconsistently at different screen resolutions.
Reflecting on this experience, I now question whether image recognition is an effective mechanism to rely on for tests that must run across a variety of machines with different operating systems and settings. My reasoning is that the developer of the testing framework has to handle every possible customization of each operating system if they expect the framework to run on any machine, which is a requirement of our testing infrastructure. For example, what happens if the desktop icons on my machine are large, but on someone else’s machine they are small? The infrastructure would then need to resize its reference imagery based on each machine’s settings, which is significant overhead to implement. With Sikuli specifically, our team noticed that it mainly decides whether an image matches based on an accuracy percentage, but we did not come across a way to handle the machine-specific issues we needed to deal with. As a result, Sikuli as used in our infrastructure does not offer the reliability needed to tell accurately and consistently, across many test runs, when functionality is and isn’t working.
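To make the icon-size problem concrete, here is a minimal sketch of threshold-based image matching in plain Python. It is not Sikuli's actual matching algorithm; the pixel grids, the scoring function, the 0.9 threshold, and all names are assumptions chosen purely for illustration. It shows a reference image that matches perfectly at its original size and falls below the threshold once the same icon is drawn at twice the size.

```python
def similarity(patch, template):
    """Return a score in [0, 1]; 1.0 means every pixel is identical."""
    total = 0
    count = 0
    for patch_row, template_row in zip(patch, template):
        for p, t in zip(patch_row, template_row):
            total += abs(p - t)
            count += 1
    return 1.0 - total / (255.0 * count)


def find_best_match(screen, template):
    """Slide the template over the screen; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (0.0, None)
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            patch = [row[c:c + tw] for row in screen[r:r + th]]
            score = similarity(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best


def scale_up(image, factor):
    """Nearest-neighbour enlargement, standing in for a 'large icons' setting."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def place(icon, height, width, top, left):
    """Draw the icon onto an otherwise black grayscale screen."""
    screen = [[0] * width for _ in range(height)]
    for r, row in enumerate(icon):
        for c, px in enumerate(row):
            screen[top + r][left + c] = px
    return screen


# Hypothetical 3x3 grayscale icon used as the reference screenshot.
ICON = [
    [255, 0, 255],
    [0, 255, 0],
    [255, 0, 255],
]
THRESHOLD = 0.9  # assumed cut-off for accepting a match

small_screen = place(ICON, 8, 8, 2, 3)               # icon at its original size
large_screen = place(scale_up(ICON, 2), 8, 8, 1, 1)  # same icon at double size

small_score, small_pos = find_best_match(small_screen, ICON)
large_score, large_pos = find_best_match(large_screen, ICON)

print("original size matches:", small_score >= THRESHOLD, small_pos)
print("double size matches:  ", large_score >= THRESHOLD)
```

Lowering the threshold far enough would eventually accept the enlarged icon, but it would also accept unrelated regions of the screen, which is the reliability trade-off described above.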
Note that I do think Sikuli itself makes developing user interface automation quite simple. Our test cases typically required little more than taking screenshots of different portions of the user interface and telling Sikuli to find them, check whether they exist, and click on them. The code was as simple as loading an image and passing it to the Sikuli function that performs the required action (e.g. click). The tool therefore benefits from simplicity: the barrier to entry for learning it and building working scripts specific to your own machine is quite low. This simplicity is especially important to a team and a community building the code base together, as it reduces the time someone needs to invest before they can make an effective contribution.
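As a rough illustration of how little code such a step needs, the sketch below wraps the "check that it exists, then click it" pattern in one helper. It assumes an object exposing `exists` and `click` methods in the style of Sikuli's screen and region objects; `click_if_present`, `FakeRegion`, and the image file names are hypothetical, and the fake region exists only so the example runs without Sikuli installed.

```python
def click_if_present(region, image_path, timeout=3):
    """Click the on-screen match for image_path; return whether it was found."""
    if region.exists(image_path, timeout):
        region.click(image_path)
        return True
    return False


class FakeRegion:
    """Stand-in for a Sikuli screen/region so the sketch runs without Sikuli."""

    def __init__(self, visible_images):
        self.visible_images = set(visible_images)
        self.clicked = []

    def exists(self, image_path, timeout=3):
        # The real tool searches the screen; here we just look up the name.
        return image_path in self.visible_images

    def click(self, image_path):
        self.clicked.append(image_path)
        return 1


# Hypothetical screenshots of two buttons; only one is "on screen".
region = FakeRegion(["install_button.png"])
found_install = click_if_present(region, "install_button.png")
found_cancel = click_if_present(region, "cancel_button.png")

print(found_install, found_cancel, region.clicked)
```

Inside a Sikuli script, the fake region would be replaced by the tool's own screen object, and the image paths would point at screenshots captured on the machine running the test.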
Knowing now that our testing infrastructure requires both a low barrier to entry and reliability across various platforms, our team is rethinking its approach to building it. Some questions that need to be answered as a result are:
- Are there other tools that could better fit our needs?
- Do we need to consider building out specialized tooling to support simplicity and reliability?
- Are there other considerations we now know about that we also need to pay attention to when designing the test architecture?
I welcome any thoughts on designing a test infrastructure for simplicity and reliability. What do you think gives a software development tool a low barrier to entry? What allows a developer to confirm that a software development tool behaves reliably in the context of their project?