Mobile QA can be a complex topic, as each product team is testing for something different. That said, learning standards and processes from other app and product creators can help us improve our own processes. That’s why I chatted with Chris Hayes from Square’s Android team (and Pierre-Yves Ricau who chimed in a bit) to learn about their approach to mobile QA.
Square has an interesting mobile QA story. Not only do they create both hardware and software, they also have millions of sellers whose businesses rely on their products working properly on a daily basis. Talk about pressure to produce a high-functioning product. Below, they explain how their QA process works and share the solutions they’ve found to the challenges they’ve faced over time.
How does your QA process work at Square?
We look at app quality and QA with a lot of layers. The first layer is our CI (continuous integration) infrastructure. We make sure every module we have is built and tested in CI.
Most of the apps we create go through extensive user interface (UI) tests, and all go through at least basic unit tests. For our flagship app – Square Point of Sale – we have a little over 3,000 UI tests we run on every pull request against both a mobile device and a tablet. So really, it’s more like 6,000 tests.
By module, what do you mean?
Our code is broken up into different segments, kind of like building blocks for all of our apps. They’re all in one repository and built from common pieces, or with as much commonality as possible. We want to make sure those building blocks are all tested so everything is working reliably for our customers.
That’s why we build every single variant of every app on every pull request. That helps us ensure there wasn’t a small change made in a different module that accidentally broke a feature.
Do you test every change that’s made?
Yes. Every pull request goes through our CI system. Our CI pipeline has upwards of 100 different pieces, which run mostly in parallel. That includes running all of our UI tests at once.
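To make the idea of many pipeline pieces running in parallel concrete, here is a minimal Python sketch. It is not Square's CI system: the job names, the `run_pipeline` and `is_green` helpers, and the thread-pool approach are all assumptions made for illustration. Each job is a function that returns True on success, and the pull request only counts as passing when every job does.

```python
from concurrent.futures import ThreadPoolExecutor


def _safe(job):
    """Run one job; treat a raised exception the same as a failed check."""
    try:
        return bool(job())
    except Exception:
        return False


def run_pipeline(jobs, max_workers=8):
    """Run every job concurrently and return {job_name: passed}.

    `jobs` maps a hypothetical job name to a zero-argument callable.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(_safe, job) for name, job in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


def is_green(results):
    """A pull request passes only if every single job passed."""
    return bool(results) and all(results.values())


# Hypothetical jobs standing in for the real pipeline pieces.
example_jobs = {
    "build_all_variants": lambda: True,
    "unit_tests": lambda: True,
    "ui_tests_phone": lambda: True,
    "ui_tests_tablet": lambda: False,  # one red job blocks the whole result
}
example_results = run_pipeline(example_jobs)
```

In this sketch a single failing job is enough to make `is_green(example_results)` false, which is the point of running everything on every pull request.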
How is your CI system different from most apps or companies?
Other companies sometimes run UI tests on pull requests, but they don’t look at the results. They typically move forward with development whether or not the tests pass, because their tests are so flaky that they don’t trust them. We’re pretty strict about it, by contrast. Our team can only merge their changes or pull requests if everything is green.
On top of that, our master branch reruns the combined changesets. After someone merges their pull request, there could be differences between what we assessed on the pull request and the merged result on master. So we rerun that same set of tests on master as well.
To illustrate, Dev A makes a change and Dev B makes a change. That’s two changes that are tested independently but might be merged roughly at the same time. So we run the test suite on master to make sure the combined changes don’t break anything.
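The Dev A and Dev B scenario can be sketched in a few lines of Python. This is a toy model, not Square's tooling: the "codebase" is just a set of defined helpers plus a set of call sites, and the helper names (`format_price`, `format_amount`) and the `Change` type are hypothetical. Dev A renames a helper and updates the existing caller, while Dev B adds a new caller that still uses the old name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A change expressed as additions and removals, like a diff."""
    add_defined: frozenset = frozenset()
    remove_defined: frozenset = frozenset()
    add_calls: frozenset = frozenset()
    remove_calls: frozenset = frozenset()


def apply(code, *changes):
    """Return a new codebase with every change applied to `code`."""
    defined, calls = set(code["defined"]), set(code["calls"])
    for change in changes:
        defined = (defined - change.remove_defined) | change.add_defined
        calls = (calls - change.remove_calls) | change.add_calls
    return {"defined": defined, "calls": calls}


def suite_passes(code):
    """The 'test suite': every call site must point at a defined helper."""
    return all(helper in code["defined"] for _site, helper in code["calls"])


BASE = {"defined": {"format_price"}, "calls": {("checkout", "format_price")}}

# Dev A renames the helper and fixes the one caller that exists on master.
CHANGE_A = Change(
    add_defined=frozenset({"format_amount"}),
    remove_defined=frozenset({"format_price"}),
    add_calls=frozenset({("checkout", "format_amount")}),
    remove_calls=frozenset({("checkout", "format_price")}),
)

# Dev B adds a new screen that calls the helper under its old name.
CHANGE_B = Change(add_calls=frozenset({("refund", "format_price")}))

a_alone = suite_passes(apply(BASE, CHANGE_A))
b_alone = suite_passes(apply(BASE, CHANGE_B))
merged = suite_passes(apply(BASE, CHANGE_A, CHANGE_B))
```

Each pull request passes on its own, but the merged result does not, because the new refund screen calls a helper that no longer exists. Rerunning the suite on master is what catches this kind of break.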
How long does this whole process typically take?
Our turnaround time is about 1.5 hours, but we’re trying to shorten that. That’s my [Chris’s] team’s responsibility right now. Our goal is to get it under an hour. When someone opens a pull request, they’ll have a signal – whether it’s green or red – within an hour.
Our testing cycle is probably much longer than most in the industry. App developers tend to aim for minutes, not hours. But they do that by not having UI tests. They test each unit (like a sign-in flow) for bugs, but not their actual UI.
We attribute that to the scale at which we’re doing things. We test extensive permutations of payment flows and settings – basically everything a merchant would do – in our UI tests.
Can you elaborate on the scale of your operations?
Our big difference is where we are in the product life cycle. Our product has a lot of customers and is generating a lot of revenue. Quality is the most important thing that we need. We need to make sure we never break anything. That’s more important than being able to change things faster. We want to keep it reasonably fast, but the most important thing is it should never break for our customers.
The app breaking for our customers literally means their business shuts down. A huge point of pride for our team is that we have millions of sellers relying on our work to run their business on a daily basis. We have a strict responsibility to them that we don’t break the app. And if we do break it, yeah, we lose some money, but worse, in my [Chris’s] opinion, is that our merchants are unable to run their business and make their living. It’s extremely important to us to make sure we’re releasing a solid product every time.
What does your QA testing process look like?
We run two-week release trains. Meaning, every two weeks, we branch off of master and that becomes our stable branch that goes to release. We then do a full cycle of manual testing. We have 7,000-10,000 manual test cases and employ a team of about 14 testers to go through that whole suite. The goal is for them to go through the highest priority test cases in that suite, then we release to our beta team.
We have tiered levels of test cases. Our lowest tier is the set of cases that run on master every two weeks. Then we have a higher tier – the more important cases that are rerun as soon as we cut the branch. Then we have our most critical flows (like our payment flow, for example), known as smoke tests, that run after every release build. As soon as a release build passes our smoke tests, we push it out to our Beta channel.
Our merchants are able to have both the Beta version and the released app installed at the same time. That way, if something breaks with the Beta, they can quickly switch over to the released version and continue running their business as usual. This is something we do a little differently from most, and we had to create a workaround in the Google Play Store to make it possible.
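One way to picture the tiers is as a lookup from "what just happened" to "which cases get run". The Python sketch below is only an illustration of that idea: the `Trigger` and `Tier` names, the sample case names, and the `ready_for_beta` check are assumptions, and it does not try to model whether higher tiers also rerun lower-tier cases, since the interview doesn't say.

```python
from enum import Enum


class Trigger(Enum):
    BIWEEKLY_MASTER = "every two weeks on master"
    BRANCH_CUT = "release branch cut"
    RELEASE_BUILD = "release build produced"


class Tier(Enum):
    LOW = 1
    HIGH = 2
    SMOKE = 3


# Which tier each event kicks off, following the description above.
TIER_FOR_TRIGGER = {
    Trigger.BIWEEKLY_MASTER: Tier.LOW,
    Trigger.BRANCH_CUT: Tier.HIGH,
    Trigger.RELEASE_BUILD: Tier.SMOKE,
}


def cases_for(trigger, cases):
    """Return the names of the test cases tied to this trigger's tier."""
    tier = TIER_FOR_TRIGGER[trigger]
    return [name for name, case_tier in cases if case_tier is tier]


def ready_for_beta(smoke_results):
    """A build goes to Beta only once every smoke test has passed."""
    return bool(smoke_results) and all(smoke_results.values())


# Hypothetical test cases, not taken from the real suite.
CASES = [
    ("change receipt footer text", Tier.LOW),
    ("apply a discount to an item", Tier.HIGH),
    ("take a card payment", Tier.SMOKE),
    ("issue a refund", Tier.SMOKE),
]

smoke_cases = cases_for(Trigger.RELEASE_BUILD, CASES)
```

Here `smoke_cases` contains only the two payment-critical cases, and `ready_for_beta` stays false until both of them report a pass.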
How do you keep your UI tests stable?
They’re all run against a very solid mock infrastructure. Our tests aren’t making real network requests to a staging environment. Instead, all the responses are mocked up locally. That gives us control in several different ways to keep everything stable. For example, if our server team is deploying something, we don’t want all of our testing infrastructure to go down.
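As a rough illustration of what "mocked up locally" means, here is a small Python sketch. It does not reflect Square's actual mock infrastructure: the `FakeServer` class, the `/charge` path, and the response shape are all made up for the example. The code under test talks to a server object, and in tests that object simply hands back canned responses instead of touching the network.

```python
class FakeServer:
    """Stand-in for a real backend: returns canned responses, no network."""

    def __init__(self, canned_responses):
        self._canned = dict(canned_responses)
        self.requests = []  # recorded so a test can inspect what was sent

    def post(self, path, body):
        self.requests.append((path, body))
        return self._canned[path]


def charge(server, amount_cents):
    """Code under test: asks the server to charge and reports approval."""
    response = server.post("/charge", {"amount_cents": amount_cents})
    return response["status"] == "approved"


# The test decides exactly what the "server" says, so a deploy or outage
# on a real staging environment cannot make the test fail.
approving_server = FakeServer({"/charge": {"status": "approved"}})
declining_server = FakeServer({"/charge": {"status": "declined"}})

approved = charge(approving_server, 500)
declined = charge(declining_server, 500)
```

Because the responses are fixed by the test itself, the same run produces the same result every time, which is the stability the answer above is describing.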