
Managing Complexity in Mobile Testing

App Development Complexity

App Development Today Is More Complicated Than Ever

If you talk to the leader of any mobile development team, they’ll agree that app development has become more demanding over time. Users now expect businesses in every industry to offer an app they can use from any mobile device.

For highly regulated industries like banking, this shift requires extreme selectivity about which services can be included in a mobile app. As Eoin O’Connor, Vice President of Mobile Engineering at a Fortune 500 financial services company, points out, “We have to be careful about what goes into mobile and what doesn't. Finances are obviously complex, very personal, very emotional. We want to make sure that our customers feel very supported along the way.” But this isn’t the only reason development has grown more complex over time.

Device and API Fragmentation 

While supporting many different operating system versions is great for consumers, it presents enormous challenges for developers, as Michael Buxton, iOS Lead and Engineering Manager at Surfline, can attest. “We had a sticky header in a feature, and there was an issue with some design. In iOS 15 and 14, fixing this problem was a one-liner. In iOS 13, it was going to be three days’ worth of work.” This is a story familiar to experienced developers everywhere.

Device and API fragmentation can greatly increase complexity for development teams. When many OS versions are supported, developers must write and maintain code against a broad range of APIs. Things that are simple with one version of an API can be much more complicated (or even impossible) with another. But you can’t immediately abandon older versions of the software. You don’t want to leave customers out in the cold, yet targeting newer platform versions lets you deliver better software more quickly.
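In code, fragmentation usually shows up as a branch on the OS version, with each supported version potentially taking a different path. The sketch below illustrates the pattern with Buxton’s sticky-header example. It is written in Python for brevity, and the enum, function name, and version threshold are all hypothetical. In a real iOS app the branch would be a platform availability check such as Swift’s `if #available(iOS 14, *)`; here the OS version is passed in as an integer so the decision can run and be tested anywhere.

```python
from enum import Enum


class HeaderStrategy(Enum):
    NATIVE_STICKY = "native_sticky"  # the platform API makes this a one-liner
    STATIC = "static"                # fallback: the header simply scrolls away


# Hypothetical threshold: the oldest OS major version whose APIs make the
# sticky header cheap to build. Anything older gets the fallback.
STICKY_HEADER_MIN_OS = 14


def sticky_header_strategy(os_major_version: int) -> HeaderStrategy:
    """Pick an implementation for the running OS version.

    In a real app this branch would be a platform availability check;
    here the version is a plain integer so the logic is testable alone.
    """
    if os_major_version >= STICKY_HEADER_MIN_OS:
        return HeaderStrategy.NATIVE_STICKY
    return HeaderStrategy.STATIC


for version in (13, 14, 15):
    print(f"iOS {version}: {sticky_header_strategy(version).value}")
```

Every branch like this is another path that has to be built, tested, and maintained for as long as the older OS version is supported, which is why dropping a feature on one version can be the pragmatic choice.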

These challenges force development teams to make constant trade-offs. For Buxton’s team, that meant dropping the sticky header on iOS 13. Although this may seem small at first glance, decisions like this have knock-on effects on how a team works on its app, and those effects add up over the long term.

Language Versions and Migrating Code Bases

In addition to the complexity introduced by different devices and APIs, changes in the underlying programming languages make code bases more difficult to keep track of. This is true for both iOS and Android: iOS development migrated from Objective-C to Swift, and the preferred language on Google’s platform migrated from Java to Kotlin.

Migrations between languages and language versions challenge even dedicated development teams. Buxton points out that their code bases “were originally written in Objective-C,” and they needed to undertake multiple projects to adopt Swift after Apple made it the preferred language. Even within a language, new versions create compatibility issues. “When Swift version 2.22 came out, we had to do a major refactor of our code base. And we have four apps, all of which share some code. We have a limited amount of time to tackle technical debt. It’s difficult to find the right balance between that and shipping new features.”

The same story is true for companies switching from Java to Kotlin. O’Connor notes that the switch to Kotlin caused a big blip in some code quality metrics. “As we look at tools like Jetpack Compose, that’s written in Kotlin. But if your application is 50% Compose and 50% standard Kotlin, it’s hard to get the right test coverage metrics. It’s tough to know how much of our code is well-tested.”

That won’t always be the case. Programming languages often evolve ahead of the tools that support them. Many developers who’ve switched from Java to Kotlin find that the newer language improves their development experience, even when they have to abandon high-quality tools they relied on.

For instance, a team that adopts Kotlin after working in Java might find that they need to adapt their existing suite of JBehave or JUnit tests, or port them wholesale to new paradigms and frameworks. Even in environments designed to be interoperable, switching to new languages or language versions can come with major overhead costs. This is a constant tension when developers adopt cutting-edge technology. While you may struggle with complexity in some areas, you gain increased development speed, better features, and an improved experience for both your developers and your customers.

App Testing Complexity

Testing Apps Effectively Is a Necessary Challenge

Every experienced technical leader aims to test applications effectively, balancing speed with thoroughness. Few things feel worse than rolling back a shiny new feature a few hours after you shipped it because something critical broke. Or worse, issuing a public executive apology for what you shipped. Every software team strives to ship quality builds to their customers, and the testing processes they put in place to ensure this can determine the team’s success or failure.

Eoin O’Connor described how his team approaches testing for its monthly releases: “But we really try to shift left on that mentality. We really want to try and get as much of the testing done early in a cycle, because it helps us release better apps.”

High-Wire Testing Tension

One of the biggest sources of tension between product and tech teams comes from the amount of time testing adds to the development process. Product teams want to get new code into the hands of users as fast as possible. Dev teams are wary of increased delivery speed, because they know well what happens if things don’t work correctly. Balancing these two forces is like walking on a high wire: tip too far to either side, and you have big problems!

The reality is that shipping a buggy build impacts the entire business. While testing may take more time than some would like, it's a small price to pay in the long run. As O’Connor elaborated, “I think a lot of what we do requires explanation and translation. As we expand our testing skills, the observation I'm making is that it's taking us a lot longer to get product features delivered. But this work means that each team doesn’t have to reinvent the wheel for themselves.”

The key, from O’Connor’s perspective, is explaining early and proactively why his teams are doing this work. Effective processes like end-to-end testing have been critical on the web for years and are just as critical for mobile apps. But all parts of your business need to be aligned so that everyone understands what investing more time in effective testing buys you. For decades, developers have treated testing as something to slip in by padding their estimates for each task. Testing has long been something teams kept quiet about, because non-technical colleagues didn’t see the benefit and thus didn’t think it was necessary.

That doesn’t have to be the case, and it shouldn’t be, going forward. Developers understand why testing is important. Their challenge isn’t finding time to prioritize testing, but rather communicating with the rest of the business why testing is a worthwhile investment.

Whose Test Is It Anyway?

Testing libraries that are shared across applications has become very complex. Many mobile development companies don’t work on just one application, and understandably, they seek to minimize their repeated code between applications. Michael Buxton, iOS Lead and Engineering Manager at Surfline, notes that their authentication library supports four different applications at the same time. 

This kind of dependency tree can create nightmares for testing. Maintainers of public libraries spend lots of time ensuring that their updates don’t break anything for downstream users, but for internal libraries, it’s difficult to invest the same effort. As Buxton explained, a new version of their authentication library might work well for the application whose new feature prompted the change. What the team might not know is that a feature in development for another app breaks when it uses the newest version of the shared library. Down the line, a change introduced by a developer on one team breaks the behavior for a developer on a different team.
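One generic way to catch this kind of break before a release is a consumer-contract check: each app that depends on the shared library states the behavior it relies on, and every candidate library version is run against all of those checks. The sketch below is not Surfline’s process; the two library versions, the four app names, and their checks are hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical shared authentication library, before and after a change.
def login_v1(user: str) -> dict:
    return {"user": user, "token": "abc123"}


def login_v2(user: str) -> dict:
    # One team moved the token under "session" to support its new feature.
    return {"user": user, "session": {"token": "abc123"}}


LoginFn = Callable[[str], dict]

# Each consuming app registers the behavior it depends on.
# App names and expectations are illustrative only.
CONTRACTS: Dict[str, Callable[[LoginFn], bool]] = {
    "app_a": lambda login: "user" in login("surfer"),
    "app_b": lambda login: "token" in login("surfer"),  # still reads the old field
    "app_c": lambda login: "user" in login("surfer"),
    "app_d": lambda login: "user" in login("surfer"),
}


def broken_consumers(login: LoginFn) -> List[str]:
    """Run every app's contract against a candidate library version."""
    return [app for app, check in CONTRACTS.items() if not check(login)]


print("v1 breaks:", broken_consumers(login_v1))  # v1 breaks: []
print("v2 breaks:", broken_consumers(login_v2))  # v2 breaks: ['app_b']
```

Here the second version passes three of the four apps’ checks but breaks the one that still reads the old field, which is the cross-team failure described above, surfaced before anything ships.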

Buxton handles this with dedicated release engineers, who work across software teams to “make 100% sure about something when we push it live. The Release Engineer says they’re certain things will work with each new release.”

This isn’t the only way to handle testing shared libraries, but it works for Surfline. No matter how you approach it, sharing dependencies between teams increases the complexity for your software team. Tackling this challenge is a key part of scaling a software team as your company grows.

App Team Complexity

Your App Teams Have To Grow

No matter how good your developers are, you eventually have to hire more. There are limits to how much even the best developers can do. But that doesn’t mean that your team of five needs to grow into a team of 500.

It does mean, however, that eventually your development teams will themselves grow more nuanced. The systems and processes that work for a team of three don’t work as well for a team of six. As you grow to a team of nine, or fifteen, or thirty, the complexity of your team also continues to grow.

Before you can understand how to solve for that added complexity, you need to know how it manifests.

Branching Out On Your Own

We have the best of intentions when we start new teams. “Our processes will remain simple and streamlined”. “Our tests will be comprehensive”. “Build times will blaze”. 

Then things change. You introduce one or two little shortcuts. You glue together a library using an ad hoc script instead of an API. Whatever the story: things get a little messy. Then, when a second team comes along, your build pipeline or testing library doesn’t work quite right for them. Instead of adapting existing work, they take the plunge and set up their own system.

Now you have two systems, and the story repeats. The third team strikes out on their own, then the fourth and the fifth. Before you know it, you’re juggling a variety of systems and pipelines to support your teams. You always make those decisions for good reasons, but as a result, your teams struggle to work together more than they should.

For Eoin O’Connor, this is a constant battle. “For companies like ours, we’ve always worked on the web, and controlled our own destiny. That gave us a lot of flexibility. On the mobile side, things work differently. We’re a bit at the mercy of companies like Apple and Google. Many people who have worked in mobile are familiar with those constraints. I spend a lot of my time working with teams and developers to help them understand how things work differently on the mobile side of the house. Every team wants to focus on shipping new features, they want to get things out the door. But we need to balance that with what’s best for every team as a whole, including our mobile teams.”

Managing Technical Debt Between Teams and Applications

Complexity grows from other causes, aside from teams doing their own thing. Sometimes, it grows out of teams sharing code, and how shared code affects technical decisions. For Michael Buxton of Surfline, running four different applications makes balancing the load of technical debt difficult.

“When you have four different apps, you need to balance technical debt and new features for each app. So we’re constantly asking, what can we do to make this better for everyone?” It’s a constant balancing act for each team and the libraries shared between them. It’s even more complicated when incorporating code improvements that come from mobile OS providers. Buxton notes that for Surfline’s applications, that sometimes means targeting different libraries for each app. “Swift has improved their JSON decoding, compared to older iOS development targets. The same is true for Android; their JSON decoding is better than it used to be. So for us, we have to balance this: two of our apps are on new libraries, while two are on legacy code. That’s how we balance it out.”

Team Complexity Is An Unavoidable Challenge

Thinking about these complexities can seem daunting. But remember: it’s a good problem to have. You’re growing your team because people find your software useful. As you add more people, you’re going to add complexity. You can manage and minimize this complexity, but you can’t avoid it completely. 

Keeping the Software Train Moving

One way to think about developing software is to liken it to a train. When you design a feature, you plan the route the train will run. Then, each step of the software development life cycle (SDLC) is like a stop on the train’s route. 

In this analogy, managing technical complexity means getting the train smoothly into, and out of, each stop on the route. Every step of the SDLC is managed carefully to ensure efficiency. If complexity creeps in at any stop, the whole process becomes complex, even if every other step is simple. Thus, your goal when managing a software team should be to minimize complexity at every stop on the track.

Minimizing Complexity Through Predictable Releases

If every step of the SDLC is a train stop, then releasing a feature is Grand Central Terminal. It’s the biggest and arguably most important part of your software development process. Releasing a feature is also where the most complexity creeps in. As O’Connor notes, “In any given release, we could have more than 30 or 40 different teams contributing.” That’s the kind of release that can go off the rails in a big hurry.

To manage that kind of flow, Eoin O’Connor’s approach is to create the most predictable, straightforward release process possible. This creates a win for the development teams and for the customers. “We want to make sure that we’re not overburdening our customers with constant upgrades. So, we settled on a once-per-month release cadence.”

By shipping on a reliable monthly cadence, O’Connor’s company keeps the schedule predictable for its teams and its customers. Is a team’s work ready for the release? If so, it goes in. If not, it waits for the next release. This minimizes surprises for the testing and development teams.
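The inclusion rule behind a fixed cadence is simple enough to write down. A minimal sketch, assuming each team’s work carries a single flag saying whether it is ready by the release cutoff; the team and feature names are invented for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TeamWork:
    team: str
    feature: str
    ready: bool  # tested and signed off before the release cutoff


def plan_release(work: List[TeamWork]) -> Tuple[List[str], List[str]]:
    """Split work into what ships this release and what waits for the next.

    The release date never moves; only the contents change.
    """
    shipping = [w.feature for w in work if w.ready]
    deferred = [w.feature for w in work if not w.ready]
    return shipping, deferred


work = [
    TeamWork("payments", "instant transfers", ready=True),
    TeamWork("cards", "card freeze", ready=False),
    TeamWork("onboarding", "new signup flow", ready=True),
]
shipping, deferred = plan_release(work)
print("Ships this release:", shipping)
print("Waits for next release:", deferred)
```

The design choice is that the date is fixed and only the contents vary, so one team’s late feature never delays everyone else’s work.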

O’Connor recognizes that this cadence creates challenges for product teams. “Sometimes, it’s a challenge to balance the way that we think about new features and releases. Features have to work, and they have to be secure, but we want to make sure that they hit their release dates, too. Sometimes, finding that balance requires some hard conversations.” Predictability lets O’Connor shift those conversations earlier in the process, so that they don’t happen right before a release, when stress is high.

Unpack That Suitcase On The Train

We’re stretching our train metaphor here, but there’s another powerful tool to minimize complexity during the development process: feature flags. Feature flags let you turn behaviors within your application on or off without deploying any new code.

Feature flags are such a powerful tool for minimizing complexity because releasing new features often comes with all sorts of side effects. You might need to prepare marketing materials, or transfer data between an old and a new database table. That kind of work takes time. 

If you need to launch a marketing campaign or migrate data at the same time that you ship the feature code itself, you drastically increase complexity and ratchet up stress for the team. By shipping the new feature with its flag turned off, you streamline the entire process and give yourself plenty of time to make sure you do things right.
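At its simplest, a feature flag is a named boolean that the app checks at runtime, with values that can change after the build ships. A minimal sketch, assuming an in-memory store standing in for a remote configuration service; the class, method, and flag names are hypothetical:

```python
from typing import Dict, Optional


class FeatureFlags:
    """Minimal in-memory flag store.

    In production the values would come from a remote configuration
    service, so flipping a flag needs no new build or app store release.
    """

    def __init__(self, remote_values: Optional[Dict[str, bool]] = None):
        self._values = dict(remote_values or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so newly shipped code stays dormant.
        return self._values.get(name, False)

    def update(self, name: str, enabled: bool) -> None:
        # Stands in for a config change pushed from a dashboard.
        self._values[name] = enabled


def home_screen(flags: FeatureFlags) -> str:
    if flags.is_enabled("new_forecast_view"):
        return "new forecast view"
    return "classic forecast view"


flags = FeatureFlags()
print(home_screen(flags))  # classic forecast view (code shipped, flag off)
flags.update("new_forecast_view", True)
print(home_screen(flags))  # new forecast view (turned on later, no redeploy)
```

Defaulting unknown flags to off is what makes it safe to ship unfinished features: the code is in the build, but no user reaches it until someone flips the flag.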

Michael Buxton considers feature flags an invaluable part of Surfline’s software development process. “We’ve moved to heavy feature flag development the last couple years. We’ll ship a feature with the flag turned off, then enable it later on. This has done wonders for our releases, and also for testing new features. It allows us to test a feature comprehensively before we turn it on for all our users.”

Feature flags don’t have to be an all-or-nothing proposition, either. Surfline turns on new features for premium users first, to gather feedback and find anything that isn’t working quite right, before releasing them to the general public. “We have a dedicated group of professional surfers, but also friends and family, and we send them a new feature, and ask for their feedback. They’re excited by something new, and we hear great things from our most dedicated users.”
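A staged rollout like Surfline’s can be expressed as a rule attached to the flag. The sketch below assumes users carry a segment label (the segment names are illustrative) and adds one common extension that the interview does not describe: a stable percentage bucket, based on a hash of the user ID, for opening the feature to everyone else gradually. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    user_id: str
    segment: str  # illustrative segments: "beta", "premium", "free"


@dataclass(frozen=True)
class RolloutRule:
    enabled_segments: FrozenSet[str]  # segments that always get the feature
    percent_of_others: int            # 0-100: stable slice of everyone else


def bucket(user_id: str, flag: str) -> int:
    """Stable 0-99 bucket, so a user gets the same answer every session."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, rule: RolloutRule, user: User) -> bool:
    if user.segment in rule.enabled_segments:
        return True
    return bucket(user.user_id, flag) < rule.percent_of_others


stage_one = RolloutRule(frozenset({"beta"}), percent_of_others=0)
stage_two = RolloutRule(frozenset({"beta", "premium"}), percent_of_others=0)
general = RolloutRule(frozenset(), percent_of_others=100)

users = (User("u1", "beta"), User("u2", "premium"), User("u3", "free"))
for name, rule in (("stage one", stage_one), ("stage two", stage_two), ("general", general)):
    print(name, [is_enabled("new_forecast_view", rule, u) for u in users])
# stage one [True, False, False]
# stage two [True, True, False]
# general [True, True, True]
```

Hashing the flag name together with the user ID keeps each user’s answer stable between sessions while giving different flags independent buckets.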

Managing Complexity Happens at Every Stop

Each step of the SDLC can add to application complexity. It’s important to drill into the details, but also balance a healthy understanding of the big picture when you think about minimizing technical complexity for your teams.

Complicated Apps Mean Complicated Tests

It’s no secret that when your applications grow more complicated, so do your testing flows. Each new feature you empower users with is something that you have to test, every time you prepare a release. After all, you don’t want to release a new version, only to find that it broke a key legacy feature for your users. 

But managing testing complexity can quickly grow into a difficult problem. For some teams, figuring out how much to test, and how thoroughly, is their most critical engineering problem. We’ve spoken with some industry leaders about how they handle the complexity of testing in their mobile applications, and we’re excited to share their thoughts here.

There’s No Replacement For Manual Testing

This is not an exciting answer. Nobody gets hyped up to go tap their way through a bunch of application flows, or enter bad data into a signup form by hand. But manual testing of new and existing features is an irreplaceable step in the testing process. 

For Buxton, dedicated manual testers comprise a key part of the flow. “We have many automated tests, but we also have dedicated manual testers who will go through and identify regressions. We have them work to automate those manual tests, too, and keeping those automated tests up to date is one of our biggest challenges.”

O’Connor’s experience is the same. “We’ve learned that there’s always some manner of manual testing needed. There are test flows that we just can’t automate. That might be because of unique external factors, or device constraints that we can’t replicate in an automated test. We try to find all of the places where we can automate successfully, and then do a great job manually testing the rest.”

Don’t Test Everything All The Time

One of the difficult traps of automated testing is that as your test suite grows, so does the time it takes to run. What starts as a quick check on a chunk of code balloons into thousands of tests that can take most or all of an hour to complete. This kind of testing flow becomes a real challenge for developers, especially whenever they need to share work with someone outside their own development environment. Needing an hour of lead time to spin up a new development build just to show a stakeholder some in-progress work can be a frustrating blocker on someone’s day.

Buxton’s teams at Surfline manage that challenge by limiting which tests they run depending on what they’re building. “When we send a build to QA, we only run unit tests. We don’t run our more complicated tests, because that helps us minimize build times. Then, when we create a pull request, we’ll do snapshot tests against multiple devices and OS versions.”
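Buxton’s rule maps naturally onto a small function that a CI script could call to decide what to run. A sketch, assuming just the two stages he describes; the suite names and the device and OS matrix are placeholders, not Surfline’s actual configuration:

```python
from enum import Enum
from typing import List, Tuple


class Stage(Enum):
    QA_BUILD = "qa_build"          # a build handed to QA
    PULL_REQUEST = "pull_request"  # code proposed for merge


# Placeholder device/OS matrix for snapshot tests.
SNAPSHOT_MATRIX: List[Tuple[str, str]] = [
    ("iPhone SE", "15"),
    ("iPhone 13", "16"),
    ("iPad Air", "16"),
]


def suites_for(stage: Stage) -> List[str]:
    """Return the test suites to run at a given pipeline stage."""
    suites = ["unit"]  # fast enough to run on every build
    if stage is Stage.PULL_REQUEST:
        # Slower snapshot tests run only when code is proposed for merge.
        suites += [f"snapshot:{device}/iOS {os_version}"
                   for device, os_version in SNAPSHOT_MATRIX]
    return suites


print(suites_for(Stage.QA_BUILD))
print(suites_for(Stage.PULL_REQUEST))
```

Keeping the cheap suite in every stage and adding the expensive ones only at merge time is the trade-off Buxton describes: fast feedback on everyday builds, broader coverage before code lands.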

By limiting the tests that run on each build, Surfline creates a quicker flow and gets better feedback for their developers more quickly.

Tests Are Key, But You Have To Manage Them

Quality tests should speed up your development team. Teams with poor or no tests often find themselves in a cycle of shipping a new feature, then spending time fixing all the regressions the new feature introduced. Ironically, if you’re not careful, a poorly managed test suite can have the same effect. Too many tests can slow developers down, and over-reliance on automated testing can lead teams to miss key issues in features that only a human might find.

Handling Team Complexity Means Constant Communication

As Frederick Brooks points out in The Mythical Man-Month, the number of communication channels within a team grows quadratically with its size. Adding one person to a team of n people adds n new channels, so on a large team a single new member can add dozens or even hundreds.
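Brooks’s count is the number of distinct pairs of people who may need to coordinate. For a team of n people, the number of channels C(n) and the number added by one new member are:

```latex
C(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad C(n+1) - C(n) = n
```

So a team of 5 has 10 channels, a team of 10 has 45, and a team of 50 has 1,225, and the 51st member adds 50 more.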

This means that in order to effectively manage work between larger and more complicated teams, effective communication is a must. You can’t get around the need for good communication if you want a successful development team. 

Support All Three Legs of The Stool

O’Connor thinks about his teams like three legs of a stool, with each leg supporting a different group in the development process. “For us, there are three important groups in development. Product, engineering, and design. If we’re trying to address a problem with one of those three legs after code is already committed to a repository, it’s probably too late. So we try to identify any problems that might crop up with a feature very early on. We’ve adopted a regular cadence of meetings between each of the three groups to keep constant communication about any problems that might arise.”

“We don’t think of it just as wanting to bring a new feature. When we think about a new feature, we focus it on our customers. How will this work help them? We keep that focus on the customers. Then, once we’ve aligned on how we want this feature to work, then the design team works on outlining how it will work, and we feed that back into the regular meeting, making sure it works for the customers and will fit technically. Then, only once we’ve got great alignment with product and design, do we start to write code.”

By focusing on keeping lines of communication open throughout the entire development process, O’Connor’s teams can manage releases even when forty or more teams contribute to any individual monthly release.

Feature Flags Minimize Cross-Team Headaches

For Buxton, one key way that Surfline has managed cross-team complexity is by using feature flags to help smooth the wrinkles in their release process. “We release once per sprint, and feature flags have been a huge help for us. We use a ‘release train’ model, where the releases go out whether or not everything inside them is 100% ready. Using feature flags means that if we find a regression at the last minute, we don’t have to delay everyone’s release. We can just disable the feature. If we find an issue after the fact, we don’t have to roll back, we can just disable the feature. Then, when a feature is ready, our biggest concerns are things like preparing the new marketing copy for the app store.”

Buxton’s approach to managing cross-team releases means that instead of constantly needing to check that everyone’s code is ready to go, each team can just prepare their own runway for their features. This cuts down on communication between teams, and avoids complicated team dynamics where one team impacts another team’s deadlines through last-minute bugs or fixes.

Limiting Team Complexity Is The Key to Scaling a Team

We started this journey talking about app development complexity, and thinking of it as a necessary evil. And that’s what it is. Each person you add to a team makes that team more complicated in large and small ways. Each new feature you add to an application has the same effect.

We hope that, as you’ve walked through these causes of and approaches to development complexity, O’Connor’s and Buxton’s thoughts have helped you shape your own approach to complicated development lifecycles and see how you might tackle some of these problems.

Thanks for coming along with us, and if you have any questions, don’t hesitate to reach out!
