Handbook 2

At Waldo we believe the goal of a proper testing strategy is higher quality builds. This is achieved through tackling testing as an engineering problem and building tests that are deterministic, isolated, and repeatable. Testing should help software engineers and speed up their work—not slow them down.

A proper testing strategy enables:

  1. Speed and quality
  2. Faster release cycles
  3. Higher quality builds
  4. Efficient test results
  5. Fewer bugs in production
  6. Better coordination between QA and engineering teams
01

Introduction

Pillars of testing

Pillar 1: Tests must be deterministic

Organizations will pay attention to data they can trust. Software test results are only trustworthy if they are predictable. Therefore, a given test input must always deliver the same output.

Pillar 2: Tests must be isolated

Users have a reasonable expectation that actions performed within their account do not affect the state or data of another user’s account. Therefore, multiple tests must be able to run at the same time without impacting any other test results.

Pillar 3: Tests must be repeatable

A test must be executable over and over again. The flow and output of the test are predictable, and there’s no concern about running it several times and getting a different result.

State management is the path to determinism, isolation, and repeatability

Software must be predictable, scalable, and trustworthy to users and the people building it. These attributes rest upon each of the pillars of testing.

  • Determinism: Performing an action on the state of a user should always yield the same resulting state.
  • Isolation: No matter what else is being tested, the test must still provide the expected resulting state.
  • Repeatability: No matter how many times you execute a test and no matter what else is being tested, the test must still provide the expected resulting state every single time it’s executed.

The state of a single user is both the input and the output for a test. Predictable, scalable, and trustworthy software is not possible without effective management of that user’s state. Even in apps without a user login, state management allows a test to switch between devices and pick up where a user left off.
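
To make this concrete, here is a minimal Python sketch of a test whose input and output are both a user state. The `UserState` fields and the `rename_user` action are hypothetical stand-ins for whatever your app stores and does. The point is only that the same starting state and the same action always yield the same resulting state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserState:
    """Hypothetical snapshot of one user's state (the fields are assumptions)."""
    display_name: str
    unread_messages: int


def rename_user(state: UserState, new_name: str) -> UserState:
    """Hypothetical app action: returns the resulting state, leaves the input untouched."""
    return replace(state, display_name=new_name)


def check_rename_is_deterministic(runs: int = 20) -> bool:
    """Apply the same action to the same starting state and compare every result."""
    start = UserState(display_name="Ada", unread_messages=0)
    expected = UserState(display_name="Grace", unread_messages=0)
    return all(rename_user(start, "Grace") == expected for _ in range(runs))
```

Because the starting state is created inside the check and never shared, the same check can run alongside any other test without either one affecting the other.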

Provide the right architecture for automated tests

It is not reasonable to test an app with manual tests alone. Automated tests are a modern necessity, but we can only automate what is automatable. Deterministic, isolated, and repeatable tests rely upon automatable capabilities within an application.

The pillars of testing can only be proven through automated tests, not through non-scalable manual testing workloads, which leaves too much room for error.

Test attributes

Testing must be long-lived

Good methodologies stand the test of time. Therefore, the principles that guide software testing should last at least a decade. If not, they should at least live longer—much longer, in fact—than the application code itself.

If you are modifying your tests more than you are modifying your app, you are doing it wrong. Tests will evolve as the app evolves. Test principles will not.

Testing must be reliable

There is no point in designing, implementing, running, or maintaining tests that are not reliable. Unreliable tests are useless. Test results must be dependable, otherwise people won’t trust them.

Reliability means that, all things being equal and if the app doesn’t change, the test doesn’t report something is broken. Reliability cannot be left to hunches or inferences.

When people cannot trust tests:

  • They don’t know what’s working
  • They don’t know what’s broken
  • They don’t know what to fix
  • They don’t know where the problem starts or where it ends, if it even exists
  • They can blame anything or anyone for the test results
  • They shift app testing to end users

If a reliable test says, “The signup flow is broken,” it must mean that the signup flow is broken. It cannot be “because of some artifact somewhere.” The signup flow cannot be “broken despite finding no problems.”

Testing must be efficient

Engineers live for efficiency and want to spend as little time as possible on testing. To an engineer, efficiency and testing often “feel” orthogonal because:

  • Creating and maintaining tests is time consuming
  • Running the whole suite every time slows down development
  • Test creation and maintenance is faster as you build, not after

Testing needs to be fast and scalable with fewer test suites that run more often. Engineers should not write and test code only to realize later that the test suite is broken. If the automation can’t be 100% trusted, then a lot of time and effort has to be spent on code that doesn’t fulfill promises. That’s when people start dismissing test results (e.g. “It’s the server, not the test or the app”).

For testing to be good, you need to do two things:

  1. Validate that the work passes all the tests
  2. Look back to the test results when you have doubts

If you think signup is having weird behavior, you want to look at the test and be confident. If the test needs an update, it should be straightforward and the results should be trustworthy.

Automated testing is a solved problem, but engineers regularly encounter test suites that take an unreasonable number of hours to execute. Even if the tests are manual, they shouldn’t slow the engineer.

Efficiency means: the test suite is functional, trustworthy, and speeds up the work of the engineer. Everything works, is easily maintained, and doesn’t require unnecessary effort from the wrong people.

Testing must be transparent

Test reliability and trustworthiness are dependent upon the transparency of the tests themselves. This is especially important for mobile apps because most product stakeholders can’t easily see flows the same way features or new screens can be viewed on desktop apps. Therefore, transparency is all about internal communication.

Full transparency means that any stakeholder in the company knows what’s being tested. They don’t need to know technical details about frameworks and code, but they should be able to understand the tests and why they’re important.

For example, an engineering manager may want to know about a new mobile app release. It’s likely they will want to check on what’s been tested in the new version before it goes to the app store. They may also want to be confident that all the important use cases are covered in functionality and testing.

If that engineering manager wants more information, can they see the test results specific to changes in the new version? They need to get something meaningful from the tests that gives an impression of what’s going on. In other words, they don’t want to see log lines and reports in stdout format because that’s only useful for developers. Stakeholders don’t want to dig through anything cryptic, uninteresting, or confusing. They want the results and the understanding.

Ideally, they want to see a replay of what happened. If there’s a new onboarding flow in the application, they need to see what it looks like on the device without having to install it themselves, know that it’s been tested, and see the outputs of the tests.

Errors and problems must be transparent

Stakeholders should be able to quickly know what is broken in a release. Whoever writes app-breaking code should be the first to know about it. There should be an easy way for that person to see the artifacts and know exactly what went wrong.

When new code changes come together for a new build, the automation must continue to work and tests should pass. If something breaks, someone needs to dig into those results to see what’s up. They’ll look for what’s broken, try to understand why, and look through the history of the dev cycle. It’s their job to know why a test failed.

Every developer should have visibility into every PR that went into the change. It should be apparent who broke the release because they should know before changes are merged into the main branch. The first person to know should be the person who introduced the problem. They should never find out from a QA engineer who spent an hour testing the merged release.

Tests and applications must be consistently decoupled

All too often, developers write tests that are based on the code or on what the test automation engine does, not on what users do in the app. Developers implement features so they can be tested easily, but that optimizes for the wrong problem. A good test should model what real users see, not what a non-public “test” version of the app does.

Effective application testing is decoupled from the application and the code. It does what users do, not what computers do. Therefore, tests shouldn’t perform things that users don’t do, like artificially wait a number of seconds for a specific element to render.
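
One way to avoid fixed pauses is to wait for the same thing a user waits for, such as an element appearing, and to give up after a timeout. The sketch below is an assumption about how such a helper might look. `wait_until` and its parameters are illustrative names, not part of any particular automation framework.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 10.0,
               poll_s: float = 0.1) -> bool:
    """Poll `condition` until it is true or the timeout expires.

    Returns True as soon as the condition holds, so the test moves on the
    moment the app is ready instead of sleeping for a fixed duration.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    # One last check at the deadline before reporting failure.
    return condition()
```

A test would pass in whatever check its automation tool offers for “the element is visible.” The timeout is a safety limit, not a pause the test always pays for.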

If the user has no idea how the app was developed, then neither should the test.

Testing strategy

Testing is a shared responsibility

The best time to catch a bug and create or update a test is during the development cycle, not after. Developers should know immediately after writing code that their change created a bug. QA should not have to triage for hours and then assign a bug back to the developer.

By sharing the testing responsibilities, engineers can drastically improve the quality of the builds that are sent to QA, who can then test edge cases and various user states.

Testing has a social component

The testing and release processes inevitably bring people together. Testing at every stage and in every environment makes everyone’s work relevant and interdependent. Even before testing begins, development teams need to decide what is getting built and how it will be tested.

Automated testing can only test automatable functionality. None of this happens by accident. Several stakeholders need to decide how and what features will be automatable. The only people who can do this successfully are those with access to the code. Furthermore, only the people with access to the code have the ability to manage and repair issues before a release goes to production.

This reality creates a shared accountability between everyone involved. It’s not about blaming, it’s about accountability. Engineers and QA team members have several opportunities throughout the development cycle to address anything that comes up. Effective teams with reliable tests inevitably deliver higher quality software.

Automation is engineering

Most automation is trying to map human behavior into code, which is only appropriate in some cases. Different types of tests require different types of automation. Determining what types of automation are appropriate for which types of tests is engineering’s responsibility. It requires deep consideration, designing, implementation, and maintenance.

Engineering has constraints and requires repeatable flows. It is much more than turning human behavior into a script.

Don’t look down on automation as something that doesn’t fit the rules of good engineering.

You can’t automate everything

The completionist desire for 100% automation is not realistic, nor is it possible. There is no way to replicate all of the behaviors one could expect from real users of an app.

Companies need to think about what should be manually tested and what should happen with automation. Using resources in the right places is critical. QA is an important function that’s not being used appropriately.

This is about the principle of efficiency: Trying to automate everything is not efficient. Balancing automation and manual testing to allocate the right resources to the right problems is.

Automate the app, not the code

A lot of the traditional scripting and automation starts by finding a user interface element using an ID. But IDs often change, so relying on them creates a new concern. The script doesn’t know what to recognize anymore, even if a human can. Then the test breaks.

If you build tests that use the app the way a human does, there’s no need to tell the automation about an ID.

Automating the code is fragile. Automating the app is robust.

Creating and maintaining tests should be easy

Making an app automatable is an engineering problem. You don’t want to be negligent; you want to be intentional. Intentional engineering stands the test of time.

Tests are only useful when they’re long-lived and independent. If you get it right at the beginning, you’ll do well for the next few years without changing anything and you can build good automation on top of it.

Testing best practices

Validate the quality of a test by using it

There’s only one way to know tests work: by testing them. Testing tests requires evaluation, repetition, and verification.

Evaluation. Apps will change and tests will need to work. Does the test always give the same result despite the app changes? There’s only one way to know and it requires running the test through several months’ changes.

Repetition. A test should run 20 times and get the same response 20 times. No matter how the app changes, the test should provide the same results. This is the opposite of “flakiness.” If you run a test 20 times and get different responses, then it’s flaky.

Verification. Testing a test requires running it over and over until there’s certainty that the test is capturing functionality. The results of the test should be meaningful.
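
The repetition check can be sketched in a few lines of Python. The helper names below are assumptions made for illustration. A test is represented as any callable that returns its outcome, and it counts as flaky if identical runs report more than one outcome.

```python
from typing import Callable, Hashable, Set


def distinct_outcomes(test: Callable[[], Hashable], runs: int = 20) -> Set[Hashable]:
    """Run the test `runs` times and collect every distinct outcome it reports."""
    return {test() for _ in range(runs)}


def is_flaky(test: Callable[[], Hashable], runs: int = 20) -> bool:
    """A test is flaky if identical runs report more than one outcome."""
    return len(distinct_outcomes(test, runs)) > 1
```

Running a check like this against a build that has not changed separates an unreliable test from a genuinely broken app.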

Don’t be exhaustive, just test the essence

Just as you can’t automate everything, you can’t test everything. 

If you have a form that changes a value in a user profile, just test one field and make sure the value sticks. If you update a setting, make sure the setting sticks. Testing every field and setting is usually a waste of time now and an even bigger maintenance cost in the future.

Test the essence of what needs testing. Don’t test everything that can be tested.

Testing has layers of importance

Not all tests or use cases have the same importance. Test the most important functionality as frequently as possible and other functionality less frequently. Importance is determined by the importance to the user.

Cover your bases by running different tests at different times, particularly in your nightly, merged, and branch releases.
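
One possible way to express these layers is a small mapping from build type to the tiers of tests that run for it. Everything in this sketch is an assumption: the test names, the tier labels, and the policy for which tiers run on branch, merged, and nightly builds.

```python
from typing import Dict, List, Set

# Hypothetical test names and tiers; importance is judged from the user's point of view.
TEST_TIERS: Dict[str, str] = {
    "signup": "critical",
    "checkout": "critical",
    "edit_profile": "important",
    "change_theme": "secondary",
}

# Which tiers run for each kind of build (an assumed policy, adjust to your team).
TIERS_BY_TRIGGER: Dict[str, Set[str]] = {
    "branch": {"critical"},
    "merged": {"critical", "important"},
    "nightly": {"critical", "important", "secondary"},
}


def select_tests(trigger: str) -> List[str]:
    """Return the tests to run for a build trigger, in declaration order."""
    allowed = TIERS_BY_TRIGGER[trigger]
    return [name for name, tier in TEST_TIERS.items() if tier in allowed]
```

With this policy, the most important flows run on every branch build, and the full set runs only nightly.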

Have a set of canonical user flows

Apps are like trail systems. There’s the main trail that everyone uses with occasional forks and loops. Then there are the smaller trails off the beaten path.

Break down the experience of your app into a set of canonical user flows that can be tested across every version of the app. Track these flows in documentation accessible to anyone who changes or tests the app. Make sure every flow has a test for every version in which it’s relevant.

Use the right state for each test

New accounts have no baggage. Use this to your advantage when testing core functionality.

However, there are situations that call for an account with a specific, pre-existing state in order to test a particular user scenario.

Be intentional about which type of user account you use to guarantee the right tests and results. Make sure you have the ability to create the right state for the scenario you need to test.

Know what you are trying to reproduce

End users do not construct, send, or receive data payloads. End users do not pause for 5,000 milliseconds before interacting with the app. End users do not interact with mock endpoints.

One of the most common testing problems is taking an app out of reality. Service mocking is a good example: “This is what you should see as a result of this request.” Mocking creates an artificially constrained ecosystem that doesn’t reflect reality. It shows you what worked for a design from 87 days ago. Assumptions change when software changes.

Testing should not reproduce data payloads. It should reproduce what users do and see in an app.

02

The Mobile Testing Pyramid

Fundamentals

Mobile expectations are rising

Everyone has a mobile device and mobile app experiences are improving rapidly. This means that users have higher expectations than ever. Low quality apps get deleted. Big breaks in applications mean big breaks in usage. Just look at mobile app reviews for proof. Plus, product bugs have a 24-hour minimum impact thanks to App Store and Google Play review cycles.

Despite higher expectations, users still enjoy novelty. Companies that move quickly can win new markets, but user forgiveness doesn’t last long. Consumers will still demand more impressive mobile experiences over time.

Companies are becoming more adaptive by adopting frameworks for native apps (e.g. ReactNative, Swift, Flutter). This presents a massive opportunity for mobile testing, particularly for automation, shift-left testing, and increased security. It’s also an opportunity for the industry to create and adopt standards and best practices, like accessibility.

But perhaps the biggest gap presented by mobile apps is end to end testing. Each framework presents new constraints on testing. Test engineers are no longer writing tests, but providing frameworks and tooling so other developers can write tests.

User expectations are clearly fueling major changes in the mobile app development and testing life cycles. But developers aren’t the only ones responding. Apple, Google, and app framework providers are moving quickly as well.

Mobile operating systems are opinionated

Mobile app developers face major challenges with the user experiences prescribed by iOS and Android. Google and Apple’s desires do not always align with what developers or users want. Engineers are often forced to work around these expectations, but aren’t always successful. They would rather build an app once and release it for multiple OSes, even if there are UI differences or issues on different platforms.

But users don’t care how an app is built. They only consume apps in one way. They have a device that has an App Store. If they launch the app and they encounter a blank white screen, they do not care whether the app was built in Swift or ReactNative.

How an app is built shouldn’t matter, but it actually matters a lot. Especially for developers and testers.

Testing is different for every device and framework. Each framework forces its own methodology, and some (especially iOS) intentionally get in the way of straightforward building and testing. For example, ReactNative provides a UI that doesn’t conform to Apple’s iOS specifications, which means you can’t easily automate, which means you can’t easily test. For another example, apps with WebViews or embedded web content aren’t supported by iOS, which severely limits what you can build, especially cross-platform.

Mobile testing is difficult to scale because these frameworks do not provide out-of-the-box ways of automating apps.

The Mobile Testing Pyramid

Fundamentals of the Testing Pyramid

When exploring mobile testing, the testing pyramid is the de facto stopping point, but it’s really where everything should start. Each layer must be considered independently and there needs to be a conceptual understanding of what will happen in each layer. This is the only way to build long-lived testing.

Tests in every layer must work for months and for tens of thousands of executions. Testers should design tests to honor each layer of the pyramid and make a concerted effort to lower the number of false positives and negatives. This is the only way to force your tests to detect problems, because they’re useless if they’re sensitive to things that don’t matter.

By keeping each layer separated, you set the right expectations for what’s being tested. It prevents confusion about end-to-end testing versus manual testing. They are different, even if people mistakenly conflate them. You want automation in one layer and manual edge case testing in another.

For example, end-to-end testing should cover and automate the most important features and not do any defensive testing, like whether the user has a Yahoo email address. End-to-end testing covers the default use cases for customers, like whether they can find and purchase an item.

The best test strategies use every layer of the pyramid. Doing so creates confidence in the code that’s being shipped.

Automation in every layer

Automation is not just mapping human behavior to code. That’s not how testing works, which is why we have the pyramid.

Manual, non-automated testing is at the top of the pyramid. It’s best for edge case testing that wouldn’t be caught in the other layers. It’s also good for spot checking that everything is okay.

The rest of the pyramid, however, should be pure, idempotent automation. It should not map manual approaches to automation. Automation should be considered from an engineering perspective, looking at constraints and requirements for repeatable flows. It is not about taking manual testing behavior and turning that into a script.

Different tests for different reasons

We can’t test everything, we can’t automate everything, and we can’t treat all tests the same way. These are the reasons for the layers of the testing pyramid. Different tests are needed for different reasons.

Unit tests allow you to test exception cases, like sending a -1, a 0, a “0”, a large number, or something with a comma. They are not for testing an API endpoint.
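
As an illustration, a unit check for a small input-parsing function might walk through exactly these exception cases. `parse_quantity` is a hypothetical function invented for this sketch, and the rule that negative values are rejected is an assumption.

```python
def parse_quantity(raw) -> int:
    """Hypothetical unit under test: turn user input into a non-negative integer."""
    if isinstance(raw, str):
        # Tolerate surrounding whitespace and thousands separators such as "1,000".
        raw = raw.strip().replace(",", "")
    value = int(raw)
    if value < 0:
        raise ValueError("quantity cannot be negative")
    return value


def check_exception_cases() -> bool:
    """Exercise the cases named above: -1, 0, "0", a large number, and a comma."""
    accepted = [(0, 0), ("0", 0), (10**12, 10**12), ("1,000", 1000)]
    for raw, expected in accepted:
        if parse_quantity(raw) != expected:
            return False
    try:
        parse_quantity(-1)
    except ValueError:
        return True  # the negative value was rejected, as assumed
    return False
```

Nothing here touches a network or a back end, which is what keeps checks at this layer fast enough to run constantly.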

Integration testing is for more complex test cases, like making sure you can’t use the same email address for two different users. This is where you can test (real) payloads and (real) API endpoints. This is also the layer to invest in if you want test coverage for issues like race conditions. Integration testing isn’t about covering all use cases. It’s about getting a general sense that systems are working together.
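
A minimal sketch of the duplicate-email case follows. It uses Python’s built-in `sqlite3` module with an in-memory database purely so the example runs on its own. The `users` table and the `create_user` helper are assumptions. In a real suite the same check would go through your actual API endpoints and back end.

```python
import sqlite3


def create_user(conn: sqlite3.Connection, email: str) -> None:
    """Hypothetical back-end operation: store a new user row."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()


def check_duplicate_email_is_rejected() -> bool:
    """Two users must not be able to register with the same email address."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: uniqueness is enforced by the data store itself.
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
        )
        create_user(conn, "same@example.test")
        try:
            create_user(conn, "same@example.test")
        except sqlite3.IntegrityError:
            return True  # the second registration was refused
        return False
    finally:
        conn.close()
```

The check crosses a boundary between two systems, application code and a data store. That boundary is what distinguishes it from a unit test.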

End-to-end testing is a complement to integration and unit testing. It verifies that canonical user flows actually work. Examples include whether a user is able to go through checkout, whether they receive an email about a successful order, and whether they get relevant confirmation messages.

End-to-end testing fits everything together and makes sure the right behaviors occur for a user. It gives you confidence that you won’t completely break the most popular user actions. We say “most popular” because it’s not about comprehensive testing. It means you are testing the system as a whole, putting the app under the conditions that are faced by real users, and talking to the real back end instead of sending it a meaningless payload.

Users are the enemy of good testing

There’s an old joke that “running a business would be great if it weren’t for the customers.” Well, the same can be said about applications: they’d be great if it weren’t for users.

Users, like people, are never the same. Developers cannot test N test cases for N users. Because we cannot be comprehensive and exhaustive with our testing, we must work with assumptions and constraints.

Users live in the back end

A “user” is a representation of an account, or a uniquely and exclusively coupled set of data, within a back end system. Users are usually a way to keep track of someone in your back end across the app experience. A user can exist whether the app is installed on a device or not. Also, a user can exist on multiple devices. Users keep track of data in a way that’s permanent and distinct.

Users are not deterministic, nor are they repeatable. Real users don’t honor the pillars of testing. This makes testing difficult.

Authentication challenges

Many apps require authentication, which requires a tester to consider how to go about testing for users and what kinds of users are needed for testing.

The answer is not complicated, unless the tester reuses the same users for multiple tests. It’s very easy to deconstruct and disrespect the pillars of testing when reusing accounts for multiple tests.

Generating a new user for every test gives testers a fresh account with a blank slate and nothing in common with another user.
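
Generating a fresh user can be as simple as building credentials that are unique on every call. The sketch below assumes a hypothetical `FreshUser` record and a reserved test domain. How the account is then registered with your back end is left out, because that depends on your app.

```python
import secrets
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FreshUser:
    """Hypothetical credentials for a brand-new test account."""
    email: str
    password: str


def new_user(domain: str = "example.test") -> FreshUser:
    """Create credentials that no other test run has used before."""
    return FreshUser(
        email=f"user-{uuid.uuid4().hex}@{domain}",
        password=secrets.token_urlsafe(16),
    )
```

Because every call yields a different email address, two tests running at the same time never end up sharing an account.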

Additionally, your users may not all live in the same country, which means that creating a user from France is different from creating a user from Canada. There are regulatory considerations for users in each country. There’s no way to get successful automation for international users unless you tackle other problems first.

Mobile testing challenges

The challenges of desktop and mobile testing have some overlap, but the differences are non-trivial. 

Sometimes mobile apps are intertwined with the environment

Mobile devices are designed for portability, which means they have proximity sensors, position sensors, light sensors, awareness of the time of day, and they move all the time. Mobile apps can be affected or influenced by environmental characteristics. It is very difficult to test this well.

For example, if you are automating an action using an accelerometer, how do you consistently test that? What is the given input that produces the given output?

Mobile devices also have more inputs. Touches, gestures, multi-touch, physical orientation, motion, reactions to the environment, and more. There’s more to test than what’s on the screen. This is one of many challenges for automating mobile tests.

Mobile is more complicated to automate

Adding to the environmental and contextual realities are the mobile operating systems themselves. It’s not simple to say “do this, then do that” to a mobile device. These difficulties make any small automation defects worse for mobile testing because it is harder to differentiate what’s happening in the app from what’s happening on the device.

For example, it’s easy to open a web page in an incognito window. On mobile, however, more information is captured on the device. The app might recognize the device ID and there may be links or dependencies (like cloud storage) to that unique ID. The browser may pull even more data about the device, which makes determinism much harder to accomplish.

The concepts are the same between desktop and mobile, but defects will show more blatantly on mobile.

03

Automated testing

Building a proper testing strategy

A proper test strategy will be a marriage of good testing principles, best practices, and organizational culture. A testing strategy does not lead inevitably to good testing. Proper test strategies must be properly executed. One of the best ways to ensure proper execution is to keep the test strategy simple, realistic, and achievable.

A major hurdle to good test strategy and execution is complexity. Don’t test everything, don’t automate everything, don’t cover 100% of your code, and don’t expect perfection.

Why you need a proper testing strategy

Deterministic, isolated, and repeatable tests require design and intentional work. You must think ahead and solve for what makes your app deterministic and repeatable inside the testing environment. Otherwise, there’s no point in pursuing automation.

The goal of a proper testing strategy is not to build a suite of tests that align with the testing pyramid. The goal of a proper testing strategy is higher quality builds. High-quality software reduces waste of expensive and time-strapped engineering resources. It improves collaboration across teams, which accelerates deployment cycles and squashes bugs along the way. Apps that are well built and well tested almost never experience critical crashes.

There are second order consequences of a well-executed proper testing strategy, including:

  • Better management of unplanned work
  • Improved mean time to resolution for incidents and outages
  • Quick and easy identification of which code caused test failures 
  • Higher scores in app review metrics
  • Increased customer satisfaction

Keep your tests simple

The number one problem with automated testing, especially on mobile, is things get complicated, polluted, and noisy fast. The symptoms of this are flaky tests, long test suite execution times, test instability, and unreliable results.

There are only two reasons for this:

  1. Too much complexity
  2. Doing things incorrectly

Tests should continue to function as an app evolves and changes

Apps change. Some apps change more frequently than others, but all apps change over time. Unfortunately, QA people try to script what’s happening in real life without realizing that in real life there are constant adjustments.

Testing an app requires using it and performing actions. Using and acting changes how the app behaves and evolves. When testing begins the next day, testers start where they left off and have to change what they’re doing. For example, apps with inboxes can have messages pile up, which requires scrolling, which means the UI changed to allow for scrolling.

Humans understand this intuitively. Automation does not, and building that awareness into automation is not easy.

Tests are the gatekeepers of the quality of the app

Testing allows you to know in advance if your users are going to have an impaired app experience. There must be a proxy for end users performing actions in the app. You can test to know whether the app will behave in the normal, expected ways.

End-to-end testing should get as close as possible to what the end users will do and extract data from real usage of the app. Look for functionalities that can be tested easily to ensure the user experience doesn’t break, and don’t spend too much time finding edge cases.

For inboxes, strategically determine whether there should be tests for “send first message” and “send second message.” Consider the simple behavior of sending messages, which is one test. If testing goes well and there’s transparency about the test, stakeholders can decide to approve the release. Until then, the tests are the gatekeepers. Transparency will make clear what’s failing.

Test automation should fit the rules of good engineering

Don’t look down on automation as something that doesn’t fit the rules of good engineering. When you are thinking conceptually about automation, you should apply the same thinking as you do for everything else in the app, like your infrastructure, architecture, system integration, data modeling, etc.

When you code, you want to extract modules and shared logic, then write it in one place so that doing an upload, no matter where, always runs the same code. Break down the experience of the app in a set of canonical user flows and build solid automation around them.

How to automate an app

You can only automate what is automatable. Designing for automatability is an engineering task that requires collaborative decision making between developers, designers, and testers. It cannot be done in a silo. Automatability needs to be a long-lived attribute of an app’s design.

Design your app to be automatable

Making your app automatable is the first step for E2E testing. App features and capabilities must be specifically designed for repeatability. Changes in the UI need to be verifiably functional, testable, and predictable. The best way to do this is by designing methods for controlling the application and the user state.

Developers can build their apps with additional configuration layers (e.g. a “super admin panel”) accessible via a “test mode.” When using new accounts for every test execution, it’s trivial to make an account an admin, reset an account’s data, or import data to test a specific scenario. The super admin panel can be laid on top of the base application (not as part of the retail app’s functionality) and can explore different flows, features, and variations without external dependencies.
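
A configuration layer of this kind might expose a handful of operations for controlling account state. The sketch below is only an illustration under stated assumptions. `Account`, `SuperAdminPanel`, and the method names are all hypothetical, and a real panel would act on your back end rather than on an in-memory object.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Account:
    """Hypothetical account record; the fields are assumptions for this sketch."""
    email: str
    is_admin: bool = False
    data: Dict[str, Any] = field(default_factory=dict)


class SuperAdminPanel:
    """Hypothetical control layer that only exists when the app runs in test mode."""

    def __init__(self, test_mode: bool) -> None:
        if not test_mode:
            raise PermissionError("the control layer is only available in test mode")

    def make_admin(self, account: Account) -> None:
        """Promote the account so admin-only flows can be tested."""
        account.is_admin = True

    def reset(self, account: Account) -> None:
        """Return the account to a blank slate."""
        account.is_admin = False
        account.data = {}

    def import_data(self, account: Account, data: Dict[str, Any]) -> None:
        """Replace the account's data with a known scenario."""
        account.data = dict(data)
```

A test that needs a full inbox would import that scenario before it starts, so it never depends on whatever earlier runs left behind.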

Building configuration layers allows the state of an app to be deterministic and controllable. They also enable an app to be automatable. But it’s not just the app that needs automatability. The back end also needs to support automation.

Because back ends are custom-built for apps, no two back end systems are the same. Therefore, it takes careful design and intention to ensure the back end honors requirements for successful testing via the app’s front end.

Automatability must be long-lived

For automation to succeed in the long term, automatability must be a foundational part of the application’s design. This is especially true as the app evolves through feature upgrades, enhancements, and maintenance. When the app changes, it should continue to honor automated testing success.

When changing the app experience, make sure you are able to test it with specific users or with a toggle. Don’t bury settings deep in the application code or behind feature flags. Both remove the predictability of test results and will make your automation unmanageable.

Automatability designed for simplicity will have a lower chance of failing. When testing gets complicated, you are validating a moving target that’s evolving and doesn’t break in predictable ways. Anything that’s too brittle and not related to your test is going to cause lots of problems. Make sure the app delivers consistent results for the same actions and your automation will be long-lived.

Automatability must honor the pillars of automated testing

Successful automated testing is only possible if it honors the pillars of determinism, isolation, and repeatability. Therefore, you must ensure your app can run in a way that gets the same output given the same inputs. It cannot be muddied by unpredictability via feature flags or A/B testing. Even the communication with the back end must be deterministic.

Some testers put too much emphasis on the automation engine. Instead, they should focus on what is automatable in the app. The more effort gets focused on automatability, the better the automation gets, and the more opportunities for automation become apparent.

For edge case testing, like “breaking news” or A/B testing, ensure there are ways of recreating the states in which those conditions occur. Without effective state management over these conditional scenarios, tests will fail on determinism and repeatability.

Don’t force automation to be full of conditions or dependent upon dynamic app content. Tests should not be written as `if (breakingNews()) { }`. If breaking news isn’t part of the normal app flow, then change the test. If an app has a different UI in the morning than the evening, should the tests run at different hours of the day? No. Instead, specify the time of day as an input to the test.
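A minimal sketch of that approach, assuming a hypothetical `home_screen` function that renders the screen purely from its inputs. The hour and the breaking-news headline are passed in by the test instead of being discovered at run time.

```python
from typing import List, Optional


def home_screen(hour: int, breaking_news: Optional[str] = None) -> List[str]:
    """Hypothetical app screen rendered only from explicit inputs."""
    greeting = "Good morning" if hour < 12 else "Good evening"
    lines = [greeting]
    if breaking_news is not None:
        lines.insert(0, f"BREAKING: {breaking_news}")
    return lines


# Each test pins its inputs instead of branching on whatever the app shows.
morning = home_screen(hour=8)
evening = home_screen(hour=20)
with_news = home_screen(hour=8, breaking_news="Markets closed")
```

The morning test passes at midnight, and the breaking-news test passes on a slow news day, because neither depends on the real clock or on live content.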

Test that your app is automatable

You can’t validate your tests without testing automatability. Here’s how to test for automatability:

  • Make a test mode version of the app and know exactly what you’ll get for any flow
  • Ensure you always get the same results
  • Give yourself a way to test app behaviors while you are in a mode that requires it
  • Build the automation
  • Your automation should be able to do anything you can do manually
  • Automation repeats actions over and over
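The “always get the same results” check in the list above can be sketched as a small repeatability harness. The flows here are hypothetical: `run_flow` stands in for a test-mode flow, and `flaky_flow` is a deliberately non-deterministic counterexample.

```python
import itertools


def run_flow(seed_items):
    """Hypothetical test-mode flow: returns the screens a user would see."""
    cart = list(seed_items)
    return ["cart: " + ", ".join(cart), f"total items: {len(cart)}"]


def is_repeatable(flow, inputs, runs=100):
    """Run the same flow with the same inputs and confirm every result matches."""
    first = flow(inputs)
    return all(flow(inputs) == first for _ in range(runs - 1))


# A flow that leaks hidden state (a visit counter) between executions.
_visits = itertools.count()


def flaky_flow(_inputs):
    return [f"visit #{next(_visits)}"]


repeatable = is_repeatable(run_flow, ["apple", "pear"])
flaky = is_repeatable(flaky_flow, None)
```

A flow that fails this check is not ready for automation, no matter which engine you use.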

Even if you don’t get around to building automation, making the app automatable will make testing at least ten times easier, even if it’s just for manual tests. And remember, manual testing can be done by anyone, not just testing engineers.

When it comes to automation, “how” versus “what” matters

How you automate should not rely on any implementation details of the app. What you automate should not rely on the automation engine you use to test the app. Yet, many testers do both.

Instead, consider the automation engine the “how” and the test itself as the “what.” Both the how and the what need to be deterministic and reliable—regardless of changes in the app.

A good automation engine doesn’t rely on how your app has been implemented. Automation that relies on what the user sees is a good way to build an engine. A good automatable app makes sure that it’s giving the same output based on the same inputs.

Verify what the user sees

The first question to ask when writing a test is, “What am I verifying that my users see?” This is essentially two questions:

  1. What does my user see?
  2. What do I want to verify that they should see?

Understand how to verify what your user sees in a way that doesn’t break whenever the app changes. 

Go back to the behavior of the app without mimicking the implementation of the app. The implementation of the app is a poor proxy for how the app behaves, so don’t assert on implementation details. Focus only on how the app behaves.

For example, consider updating a profile picture. If you want to verify that a user can upload a profile picture, the story should be: “I can do a bunch of checks that get me to the point where my profile picture can be replaced by another image.”

It should not be: “When I want to change my profile picture, I must encounter elements with specific attributes.” That’s not relevant at all.
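To make the contrast concrete, here is a small sketch under assumed names. The `Element` type, the IDs, and the visible text are all hypothetical. The same profile screen is modeled before and after a refactor that renames every ID but changes nothing the user sees.

```python
from dataclasses import dataclass


@dataclass
class Element:
    element_id: str    # implementation detail; may change in any refactor
    visible_text: str  # what the user actually sees


def visible_texts(screen):
    return [element.visible_text for element in screen]


# The same screen before and after a refactor that renamed the IDs.
before = [Element("btn_upload_v1", "Change photo"), Element("img_42", "Photo: beach.png")]
after = [Element("uploadButton", "Change photo"), Element("avatar", "Photo: beach.png")]

# Behavior-based assertion: survives the refactor.
behavior_ok = all("Photo: beach.png" in visible_texts(s) for s in (before, after))

# ID-based assertion: passes before the refactor and breaks after it.
id_ok_before = any(e.element_id == "btn_upload_v1" for e in before)
id_ok_after = any(e.element_id == "btn_upload_v1" for e in after)
```

The behavior-based check keeps passing because the user’s experience did not change. The ID-based check fails even though nothing is wrong with the app.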

It may seem easier to write an automated test based on the application code, but it’s not helpful. Don’t fall into the trap of “I write tests in code, therefore I should test code.” Software within your software doesn’t provide value.

Treat automation code like application source code

There are plenty of compromises a developer would never make when building a back end service, so don’t allow yourself to make those compromises when writing automation. Your code won’t work well if your automation isn’t written well. Testing is as important as the coding of the app. If you don’t do it right, it will take a lot of extra time.

Whatever you write needs to work for a long time. Just because users don’t run tests doesn’t mean you should treat your test code as sub-production.

Instead, write 5 tests and run them for months. Get good, trustworthy, consistent results and make sure your automation continues to work as your app changes. Use this time to develop a discipline of automation and testing.

The only way to see if a test is successful is if it outlives changes to the app. Our minimum criteria for success are 6 months and 10,000 executions. Anything else is noise.
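As a rough sketch, the two thresholds can be checked mechanically from a test’s run history. The function names are hypothetical, and counting “months” as whole calendar months is an assumption made for illustration.

```python
from datetime import date

MIN_MONTHS = 6           # from the text: six months of life
MIN_EXECUTIONS = 10_000  # from the text: ten thousand executions


def full_months_between(start: date, end: date) -> int:
    """Count whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def is_long_lived(first_run: date, last_run: date, executions: int) -> bool:
    return (
        full_months_between(first_run, last_run) >= MIN_MONTHS
        and executions >= MIN_EXECUTIONS
    )


# Plenty of executions, but only two months old: not yet proven.
young_suite = is_long_lived(date(2024, 1, 10), date(2024, 3, 10), 50_000)
# Six months old with more than 10,000 executions: meets both thresholds.
proven_suite = is_long_lived(date(2024, 1, 10), date(2024, 7, 10), 12_500)
```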

If your app isn’t automatable, testing won’t work

Reliable automation requires discipline and rigor. You can’t slack off and you need to do your homework. No one can help you develop this discipline, so it’s entirely your responsibility. No automation engine can automate your app—it needs to be automatable by design.

Let’s consider dashboard apps as an example. If the app is showing real-time values, like the value of Bitcoin in US dollars, you’ll have weird use cases. This is where there is beauty in testing because it gets into real engineering problems. You’ll need to consider trade-offs in your testing and approach it like any other engineering problem: with rigor and acknowledging it’s not simple.

When companies don’t approach automated testing with discipline and rigor, they put together a halfhearted solution that will fail in just a few weeks. It looks like this:

  • Someone gets excited about automation and thinks they’ll get somewhere
  • They write 50 cases and pat themselves on the back
  • The tests work for a week, people are excited
  • 15 tests fail two weeks later, when the next release is completed
  • The tester starts skipping tests and excusing the exceptions (“There’s a problem with the back end”)
  • They tell themselves, “I still have 35 tests working fine.”
  • Two weeks later, another 15 tests fail
  • A regression is introduced into production
  • The process starts all over again

Automatable apps enable long-lived tests. Without an automatable app, automated tests may last for a version or two, but they will eventually fail. The app must support long-lived automated testing, which starts in the design process.

How to automate tests

Only when the app is automatable can long-lived automated tests be written.

What makes a successful automated test?

Successful automated tests are deterministic, repeatable, isolated, and able to run concurrently. There’s no such thing as a good test that doesn’t meet those criteria.
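A minimal sketch of isolation under concurrency, assuming a hypothetical in-memory `Backend`. Each test creates its own account, so fifty tests can share one back end at the same time without affecting each other’s results.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class Backend:
    """Hypothetical shared back end; state is keyed by account."""

    def __init__(self):
        self._lock = threading.Lock()
        self._carts = {}

    def create_account(self):
        account_id = uuid.uuid4().hex
        with self._lock:
            self._carts[account_id] = []
        return account_id

    def add_to_cart(self, account_id, item):
        with self._lock:
            self._carts[account_id].append(item)

    def cart(self, account_id):
        with self._lock:
            return list(self._carts[account_id])


def cart_test(backend, item):
    # A new account per execution: no state is shared with other tests.
    account_id = backend.create_account()
    backend.add_to_cart(account_id, item)
    return backend.cart(account_id) == [item]


backend = Backend()
items = [f"item-{i}" for i in range(50)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda item: cart_test(backend, item), items))
all_passed = all(results)
```

If the tests shared one account instead, each cart would contain items from other tests and the results would depend on timing.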

Good automated tests have additional attributes, like speed and quality, which are not easily attained. Writing and getting good automation can be difficult. Many people try going around the principles of good testing because they don’t have the power to change the app itself. They’ll say, “The app is the way it is, so I’ll try to automate something that’s not automatable.” And they get flaky tests.

QA teams will sometimes brute force automated testing because they can’t touch the code of the application. To them, the app is a black box they can’t change. They’ll say, “I’m going to write a test smart enough to know whether it’s A or B.” Without knowledge or responsibility over the code itself, it’s difficult to know what is repeatable, deterministic, and isolated. Even if it’s a simple engineering task to make something automatable, it needs to be a shared decision or the testing will fail.

Know what you want to automate

Before automating a single test, it’s essential to know what you want to automate. For end-to-end testing, the goal is to automate what users do in the app. Begin by mapping expectations and goals from the end-to-end tests. Cover the core business journeys inside the app and know when to defend or prove functionality.

It’s also important to know if what you are testing is automatable. This is a common pitfall for many testers because they don’t identify all the places that cause bad scenarios, unreliability, or noise. Don’t build tests where you don’t know what to expect (e.g. A/B testing). If you don’t know which experience you are getting, you cannot automate it.
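One way to know which experience you are getting is to let the test pin it. The sketch below assumes a hypothetical `assign_variant` function with a `forced` parameter. Normal use picks a variant at random, while a test passes the variant it expects.

```python
import random
from typing import Optional

BUTTON_LABELS = {"A": "Buy now", "B": "Add to cart"}


def assign_variant(rng: random.Random, forced: Optional[str] = None) -> str:
    """Pick a variant at random unless the test pins one explicitly."""
    if forced is not None:
        return forced
    return rng.choice(sorted(BUTTON_LABELS))


def checkout_label(variant: str) -> str:
    return BUTTON_LABELS[variant]


rng = random.Random()
# With the variant pinned, 100 executions all show the same button.
pinned_labels = {checkout_label(assign_variant(rng, forced="B")) for _ in range(100)}
```

Without the `forced` input, the same test would see “Buy now” on some runs and “Add to cart” on others, and it could not be automated reliably.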

Automate the user experience, not the code

Automating the user experience starts with the app in its current state, but it means automating what users do, not how the code works. Most traditional scripting and automation relies on an element’s ID, so testers write tests that use IDs. But users can change or influence specific IDs, and then the script doesn’t know what to recognize. This is a bad testing practice because small changes to the code (not the functionality) can adversely affect the automation.

Tests that are code-centric, rather than experience-centric, use artificial automation capabilities that don’t reflect real-world usage. Automated tests often include wait times and pauses when the user doesn’t have to wait, and these tests fail when the wait period is too short. Or, they’ll include payloads despite the fact that users don’t generate payloads in the app. Such practices have nothing to do with user experience and are designed around an automation engine.

Loosely coupled tests and development

One of the most important aspects of automated testing is the decoupling of the test from the application. No matter what you are testing, you need to follow what the app is doing on the device—not what the code looks like. In other words, tests should not have wait times, pauses, checks for loaded elements, names of buttons, or payloads.

Instead, tests should know whether there’s a new screen in the user signup flow, or a new TOS checkbox to click. If the change gets reverted, the test should revert to the previous behavior. You want a world where the implementation of automation and the app are completely separate. In technical terms, you want one-way communication between implementation of the app and implementation of the automation.

In an ideal world, your automation is aware of where you are in the changes and development of the app. It should know that for version N, there’s a version N of the automation to execute. And if you are automating for version N+1, it should know that and take care of it.
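A minimal sketch of version-aware automation, using a hypothetical registry of signup flows keyed by app version. The version numbers and step names are illustrative only.

```python
# Hypothetical registry: the user-visible signup steps for each app version.
SIGNUP_FLOWS = {
    7: ["enter email", "enter password", "submit"],
    # Version 8 adds a terms-of-service checkbox the user must click.
    8: ["enter email", "enter password", "accept terms of service", "submit"],
}


def steps_for(app_version: int) -> list:
    """Return the flow that matches the app version under test."""
    if app_version not in SIGNUP_FLOWS:
        raise KeyError(f"no automation recorded for app version {app_version}")
    return SIGNUP_FLOWS[app_version]


current = steps_for(8)
# If the checkbox is reverted, running against version 7 restores the old flow.
reverted = steps_for(7)
```

Only changes the user can see create a new entry. A refactor that leaves the experience untouched needs no change to the registry.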

If there’s no need to tell the users, there’s no need to tell the automation because it should do what a user does. If the user doesn’t see a difference, the automation shouldn’t either. For changes that users do see, keep the automation aware of different versions.

Loosely coupled means that ideally you have the same versions for your automation and the app. If your app is at version 7, the automation should be at version 7. But most people build an app and their test automation side by side, which makes them too tightly coupled. If someone changes the name of a button, you have to update the script.

You can’t be completely decoupled, either. If you’ve completely isolated your automation and your app, you’ll have different problems. The automation needs enough awareness of the app and its context to make the right decisions. Just as a user won’t use your app with zero knowledge, your automation shouldn’t either.

Lastly, don’t design tests so they’re specific to technical circumstances, otherwise you’ll keep getting failed tests because an implementation artifact has changed. Worse, you’ll break your automation with something users don’t know or care about.

Test the system as a whole, not individual parts

If your users aren’t limited to individual parts of the app, your automation shouldn’t be either. Always test under the same conditions as your users. Here are various ways to test the system instead of parts:

  • If there’s a back end, talk to the back end through the app, not a payload
  • Test user-driven interactions, not API endpoint interactions
  • End to end doesn’t care about what goes in and out, it cares about the side effects
  • An app sends data, reads it, and shows information on the front end, so test that
  • You don’t care if a signup happens with one, two, or fifty requests, so don’t worry about it

Remember, you are testing the system as a whole by looking at its byproduct. It’s not an implementation detail of the UI. What you care about is being sent to a welcome page with the user’s name on it or that a “thank you” message is displayed after checkout.
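The sketch below illustrates the point with a hypothetical `FakeApp` whose signup sends a configurable number of back end requests. The test looks only at the screen the user lands on, so it passes whether signup takes one, two, or fifty requests.

```python
class FakeApp:
    """Hypothetical app whose signup may take any number of back end requests."""

    def __init__(self, requests_per_signup: int) -> None:
        self._requests_per_signup = requests_per_signup
        self.requests_sent = 0
        self.screen = "signup"

    def sign_up(self, name: str) -> None:
        for _ in range(self._requests_per_signup):
            self.requests_sent += 1  # implementation detail the test ignores
        self.screen = f"Welcome, {name}!"


def signup_test(app: FakeApp) -> bool:
    # Assert on the byproduct the user sees, not on the traffic behind it.
    app.sign_up("Ada")
    return app.screen == "Welcome, Ada!"


results = [signup_test(FakeApp(n)) for n in (1, 2, 50)]
```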