Q11 Snippets

Quality vs Quantity: Why AI-Generated Test Cases Are a Mostly Terrible Idea

Every week there is another story about AI taking jobs and simplifying the workflow for thousands of businesses, and a quick browse of any forum for testers will seem to indicate they are currently on the chopping block.

It makes sense – to many outsiders, and even some insiders, testers just get in the way. They spend weeks and months writing and executing tests, always complaining about product quality, a lack of time and the speed of bug fixes, while delaying or blocking releases themselves. And once a release is out, most customers seem happy and only a few complain about the issues the testers raised – although there are those times when the testers slip up yet again and miss something. Why don’t they spend more time testing?

It seems quite obvious that using AI to generate test cases is an easy solution for whatever the actual problem is. It will be fast, slick and everyone will be very happy to see 200 tests created in minutes, using as input that one Jira ticket written by the Sales Manager, with some additional notes from the Product Manager, plus the back-and-forth in the ticket comments section between the Architect and the Senior Engineer discussing the best solution and possible impacted functionality.

With this crystal clear input, plus a list of test types and categories from the ISTQB (functional, non-functional, regression, smoke, exploratory, boundary, equivalence partitioning, state transition, error guessing, security, performance, usability, compatibility… you get the idea) a prompt like this can be fed into Google Gemini or ChatGPT:

“Generate test scenarios for Jira ticket ABC-231 across these categories. Include positive/negative, boundary values, error conditions, state transitions, security, performance, usability.”

And that’s it! You’ll have a mountain of scenarios at your fingertips. And if you have unlimited time, unlimited people, unlimited budget, and no competitors, you can do this for every feature and it will probably be perfectly fine. Probably.

But for the rest of the world…

…life is not so simple. They have constraints. Most teams operate in environments filled with comments like:

  • “You will get the release candidate Wednesday and testing has to be finished by Thursday 17:00 so we can deploy on Friday!”
  • “We don’t need a dedicated tester. Test your own work!”
  • “We can’t afford to test everything.”
  • “Our competitor shipped a similar feature yesterday. Work faster!”

In the rush to force AI into the testing arena, many seem to have forgotten what testing is all about. Testing is simply a form of risk management, with an emphasis on management. The desire should be to test the right things, which rarely means everything.

It’s all about risk

An effective test strategy that helps improve product quality will ensure that the right tests are executed on the right functionality at the right time. Two well-written tests covering a critical workflow will do more for product quality and customer satisfaction than twenty tests for an edge case that is technically possible but practically irrelevant.
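One minimal way to make “test the right things” concrete is the classic risk score: likelihood of failure multiplied by impact of failure, with the highest scores tested first. The sketch below is purely illustrative – the 1–5 scales and the example test cases are our own assumptions, not part of the Q11 Method or any standard.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    likelihood: int  # how likely this area is to fail: 1 (rare) to 5 (frequent)
    impact: int      # cost of a failure in production: 1 (cosmetic) to 5 (critical)

def risk_score(tc: TestCase) -> int:
    # Classic risk = likelihood x impact; higher scores get tested first.
    return tc.likelihood * tc.impact

candidates = [
    TestCase("checkout: pay with saved card", likelihood=4, impact=5),
    TestCase("profile: 256-char emoji username", likelihood=1, impact=1),
    TestCase("login: expired session redirect", likelihood=3, impact=4),
]

prioritised = sorted(candidates, key=risk_score, reverse=True)
for tc in prioritised:
    print(f"{risk_score(tc):>2}  {tc.name}")
```

Run this and the critical checkout workflow (score 20) lands at the top, while the technically-possible-but-practically-irrelevant emoji username (score 1) drops to the bottom – exactly the two-good-tests-beat-twenty argument in miniature.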

Too many teams and engineering departments have rushed to implement AI tools without having a clear picture of the problems they are attempting to solve. AI tools can create hundreds or thousands of tests in minutes, but those tests will still need to be reviewed, implemented, executed and the results analysed. This still takes time, and just because you are testing more, it does not mean you are testing more effectively.

Before signing up to yet another AI product, ask yourself and your engineers to detail the issue they are hoping to solve. What is the exact problem with your current approach to test case identification and creation, that leads you to believe that AI is the ideal solution?

Why AI-generated tests are often irrelevant

What many do not realise is that without an understanding of your context, AI-crafted tests can actually create more work, not less. Consider that the following could be partially or completely missing from all of your tests:

  • What is critical to the business
  • Which users matter most
  • What failure looks like in production
  • What has changed recently
  • What is likely to break
  • What would cause the most user pain (or refunds)

Without that context, an AI model can generate thousands of tests, and while most will be technically valid, strategically they are almost useless. You end up with:

  • Lots of permutations
  • Little clarity
  • And a false sense of “coverage”

An engineer will need to spend time and effort reviewing hundreds, possibly thousands, of tests to insert this context – time that may have been better spent simply writing the tests from scratch.

The missing ingredient: asking the right questions

This is where the Q11 Method comes into its own. Risk cannot be understood and balanced without context, and context comes from asking the right questions. Answering the Q11 questions forces clarity around:

  • Correct vs incorrect usage
  • Permissions on who can/can’t do it
  • Pre-conditions, states and timing dependencies
  • End to end system behaviour and dependencies
  • Minimum and maximum system behaviour and limitations

With these answers you will be able to easily identify what will have the largest impact if it fails, what is most valuable to users and, most importantly, what should be tested first.

The ideal workflow: Q11 → then AI

We are biased, but we believe the key to getting the best out of your chosen AI tool is to start with the Q11 Method. With our own Q11 AI this is baked right into the user flow: you provide your answers to the Q11 while ordering your test plan package. Q11 AI takes those inputs and applies a custom algorithm, based on our 20+ years of testing experience, to produce a highly targeted selection of test scenarios.

Q11 AI is also a great solution for Founders and Product Managers in the pre-development stage, trying to identify or detail potential functionality for a new product.

Regardless of which AI tool you use, the most important thing is to use AI solutions as power tools, not autopilots. Here is a simple workflow, with example prompts, to get the most out of the Q11 Method and AI:

  1. Run the Q11 Method on the feature to detail the context + risk cues
  2. Feed the Q11 answers into ChatGPT/Gemini and ask:
    • “Generate the top 10 test scenarios that cover the highest risk areas based on this context.”
    • “Group them into: must-run (Tier 1), should-run (Tier 2), nice-to-have (Tier 3).”
    • “Highlight the 3 scenarios most likely to identify user-visible failures.”
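The two steps above can be sketched as a small script that turns Q11-style answers into a single, context-rich prompt ready to paste into ChatGPT or Gemini. The question keys and answers here are hypothetical placeholders, not the real Q11 questions or output:

```python
# Hypothetical Q11-style answers for an imaginary CSV-upload feature.
q11_answers = {
    "Correct vs incorrect usage": "Users upload CSV files up to 10 MB; anything else is rejected.",
    "Permissions": "Only admins can delete uploads.",
    "Pre-conditions and timing": "Uploads are processed asynchronously within 5 minutes.",
}

def build_prompt(answers: dict[str, str]) -> str:
    # Fold the context into the prompt, then append the three asks from the article.
    context = "\n".join(f"- {q}: {a}" for q, a in answers.items())
    return (
        "Context from the Q11 Method:\n"
        f"{context}\n\n"
        "1. Generate the top 10 test scenarios that cover the highest risk areas "
        "based on this context.\n"
        "2. Group them into: must-run (Tier 1), should-run (Tier 2), "
        "nice-to-have (Tier 3).\n"
        "3. Highlight the 3 scenarios most likely to identify user-visible failures."
    )

print(build_prompt(q11_answers))
```

The point of the design is the ordering: context first, asks second, so the model ranks scenarios against your risks rather than inventing generic ones.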

It will not be perfect, but now you are starting to get quality from the AI, not just quantity. And if you want to go one step further and get fully targeted scenarios, a detailed plan and more, then use Q11 AI to take things to the next level.

The Punchline

As Captain Picard would say: “Make it so.”

If your goal is just a big bucket of tests, AI can definitely make it. But the goal of testing should be product quality, and that is still a hands-on job.
