How to Measure AI Impact in Engineering Teams
This is how you should approach measuring AI impact, and these are the metrics to track!
Intro
“AI can improve software development productivity by 100x.” → This is one of the hype statements we hear a lot in our industry.
But nobody really knows what 100x actually means, or how to measure it.
They just quote artificial numbers based on “feeling,” not backed by any data. This is a big reason why so many engineering leaders perceive AI negatively.
I’ve spoken to many engineering leaders recently, and in most cases they report productivity gains in the range of 0.3x to 1x.
But how can we actually measure the productivity gains from AI tools?
Lucky for us, we have Laura Tacho with us today as a guest author. She’ll be sharing how we can approach measuring the impact of AI tools and which metrics to track.
P.S. This is our second collab with Laura, you can check the first one here: Setting Effective Targets for Developer Productivity Metrics in the Age of Gen AI.
Introducing Laura Tacho
Laura Tacho is the CTO at DX, the developer intelligence platform designed by leading researchers. She is one of the people that I think highly of when it comes to measuring developer productivity.
Laura and DX CEO Abi Noda will be hosting a live discussion on July 17 to dive deeper into the AI Measurement Framework covered in today’s article.
They’ll walk through the key metrics for measuring the impact of AI code assistants and agents, and share data from DX on how organizations are adopting these tools and what impact they’re actually seeing.
AI tools are rapidly entering engineering workflows
From Copilot to Cursor, ChatGPT, and Claude, developers are increasingly turning to these tools for code generation, debugging, and implementation support. Adoption is no longer the question → the real challenge now is understanding impact.
Leaders are under pressure to justify investments, guide adoption, and stay competitive. But the metrics available today are often shallow or misused.
We’ve all seen the headlines: “AI is writing 50% of our code.” On the surface, those numbers are impressive. But under the hood, they often count suggestions accepted, not whether the code was modified, deleted, or ever made it to production.
For leaders trying to benchmark themselves against those claims, the reality is that the numbers are largely meaningless.
At the same time, teams on the ground are eager for direction. They want to know where to invest time, which workflows to prioritize, and how to level up their usage.
Without reliable data, teams are left flying blind and unsure whether AI tools are helping, hurting, or simply adding noise.
This disconnect is creating a measurement gap.
AI tooling is evolving fast, but our frameworks for understanding and managing their impact haven’t kept pace. That’s why organisations need a structured, research-backed approach for measuring AI adoption, not just for reporting purposes, but to guide better decisions.
The goal isn’t just to track usage but also to understand how AI is transforming work, where it’s creating leverage, and where it might be introducing risk. That starts with using the right metrics.
What metrics should we use to measure AI’s impact?
When it comes to measuring the impact of AI tools, many organisations still rely solely on surface-level metrics, such as tool usage, acceptance rate, or lines of code generated.
While these are easy to track and can be insightful, they capture only a narrow slice of the full impact these tools are having.
To get a more complete understanding, we recommend using the research-based metrics that make up the DX AI Measurement Framework.
This framework is designed to capture AI’s impact across three key dimensions: utilization, impact, and cost.
Utilization: Are developers using the tools? Which tools, and how often? Measuring utilization is foundational, but it should go beyond license counts. Important metrics include (a short computation sketch follows this list):
Weekly and daily active users
Percentage of PRs that are AI-assisted
Percentage of merged code that is AI-generated
Tasks assigned to autonomous AI agents
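To make these utilization metrics concrete, here’s a minimal sketch of how a few of them could be computed from your own PR and AI-tool telemetry. The record shapes and field names (ai_assisted, ai_generated_loc, and so on) are assumptions for illustration, not the schema of any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical records pulled from source control and AI-tool telemetry.
@dataclass
class PullRequest:
    author: str
    merged_on: date
    ai_assisted: bool       # e.g. inferred from telemetry or self-reported by the author
    ai_generated_loc: int   # lines in the merged diff attributed to AI
    total_loc: int

@dataclass
class UsageEvent:
    user: str
    day: date

def weekly_active_users(events: list[UsageEvent], week_start: date) -> int:
    """Distinct users with at least one AI-tool interaction during the week."""
    week_end = week_start + timedelta(days=7)
    return len({e.user for e in events if week_start <= e.day < week_end})

def pct_ai_assisted_prs(prs: list[PullRequest]) -> float:
    """Share of merged PRs that had any AI assistance."""
    return 100 * sum(p.ai_assisted for p in prs) / len(prs) if prs else 0.0

def pct_ai_generated_code(prs: list[PullRequest]) -> float:
    """Share of merged lines attributed to AI across all PRs in the window."""
    total = sum(p.total_loc for p in prs)
    return 100 * sum(p.ai_generated_loc for p in prs) / total if total else 0.0
```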
Impact: Are the tools delivering real value? Are they improving workflows or just adding overhead? We recommend tracking:
Direct impact: AI-driven time savings, developer satisfaction
Indirect impact: Improvements in metrics like PR throughput, Developer Experience Index, or Perceived Rate of Delivery, captured through regression or longitudinal analysis (a minimal regression sketch follows below)
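For the indirect impact measures, a simple regression over longitudinal data is one reasonable starting point: relate a delivery metric to AI usage while controlling for obvious confounders. The sketch below assumes a per-team, per-quarter dataset with made-up column names; it is not the exact analysis DX runs.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical longitudinal dataset: one row per team per quarter,
# with columns pr_throughput, ai_usage_days_per_dev, team_size, quarter.
df = pd.read_csv("team_quarterly_metrics.csv")

# Relate PR throughput to AI usage, controlling for team size and time period.
model = smf.ols(
    "pr_throughput ~ ai_usage_days_per_dev + team_size + C(quarter)",
    data=df,
).fit()

# The coefficient on ai_usage_days_per_dev is the signal of interest.
print(model.summary())
```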
Cost: Is the organisation getting a positive return on its investment? What high-value use cases exist that we should be replicating? Key metrics include (a rough worked example follows this list):
AI spend (total, and per developer)
Net time gain per developer
Agent hourly rate (AI spend / HEH)
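As a rough worked example of the cost dimension, the arithmetic below shows how AI spend per developer, net time gain, and an agent hourly rate could be derived. All of the figures, and the reading of HEH as human-equivalent hours, are assumptions made for the sake of illustration.

```python
# Hypothetical monthly figures for one organisation; replace with your own data.
ai_spend_total = 25_000.0     # total AI tooling spend for the month, in dollars
agent_spend = 8_000.0         # portion of that spend attributable to autonomous agents
developers = 200

hours_saved_per_dev = 6.0     # self-reported or sampled time savings per developer
hours_spent_steering = 2.0    # time spent prompting and reviewing AI output
agent_heh = 350.0             # human-equivalent hours of work delivered by agents

ai_spend_per_dev = ai_spend_total / developers                      # $125.00
net_time_gain_per_dev = hours_saved_per_dev - hours_spent_steering  # 4.0 hours/month
agent_hourly_rate = agent_spend / agent_heh                         # ~$22.86 per HEH

print(f"AI spend per developer: ${ai_spend_per_dev:,.2f}")
print(f"Net time gain per developer: {net_time_gain_per_dev:.1f} hours/month")
print(f"Agent hourly rate: ${agent_hourly_rate:,.2f} per human-equivalent hour")
```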
These three dimensions align closely with the typical lifecycle of AI adoption. Early on, utilization helps identify uptake and promising use cases. As usage matures, impact becomes the central focus. Eventually, organisations must start managing cost to drive ROI and enforce governance and standardization.
By anchoring AI measurement in these dimensions, leaders can cut through the noise and get a clearer picture of where AI is driving real results.
How should we measure the impact of autonomous agents?
One of the most thought-provoking challenges in the AI era is how to measure the impact of autonomous agents.
Should autonomous agents be measured like developers? Our view is no, not exactly.
While it’s tempting to treat agents as standalone contributors, this misses an important nuance: most agents operate under the direction of a human engineer.
That engineer selects the use case, tunes the inputs, and validates the outputs. In that sense, agents function more like accelerators for the developers and teams overseeing their work.
We recommend measuring agents as part of the team that oversees them.
For example, when assessing a team's PR throughput, it’s important to include both human-authored PRs and those authored by agents operating under that team’s direction. It’s just like any other automation or tool that we would attribute to a team’s total performance.
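As a sketch of what that attribution could look like in practice, the snippet below folds agent-authored PRs into the overseeing team’s throughput while keeping the split visible. The authored_by_agent and team fields are hypothetical; in reality you would map agent identities (for example, bot accounts) to the teams directing them.

```python
from collections import Counter

# Hypothetical merged-PR records for one reporting period.
merged_prs = [
    {"author": "alice", "team": "payments", "authored_by_agent": False},
    {"author": "payments-agent", "team": "payments", "authored_by_agent": True},
    {"author": "bob", "team": "search", "authored_by_agent": False},
]

# Attribute every PR, human- or agent-authored, to the team overseeing the work.
throughput_by_team = Counter(pr["team"] for pr in merged_prs)

# Keep the agent share visible so you can see how much throughput is agent-driven.
agent_share_by_team = {
    team: sum(pr["authored_by_agent"] for pr in merged_prs if pr["team"] == team)
    / throughput_by_team[team]
    for team in throughput_by_team
}

print(throughput_by_team)     # Counter({'payments': 2, 'search': 1})
print(agent_share_by_team)    # {'payments': 0.5, 'search': 0.0}
```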
This reflects a broader shift we anticipate: every developer will increasingly operate as a “lead” for a team of AI agents, and the skills of the human operator will be an important factor.
Developers will increasingly be measured much like managers are today: based on the performance of their teams.
Of course, this area of AI is evolving rapidly, with both the tooling ecosystem and the ability of models to successfully solve problems growing each day. We expect our recommendations to also evolve as we continue ongoing research in these areas.
How do we actually capture these metrics?
Understanding and measuring developer productivity has always been a difficult problem, and adding early-maturity AI tools with limited telemetry makes it even harder.
By using a mixed-methods approach, organisations can successfully measure AI tools across the categories of utilization, impact, and cost.
This approach can be implemented using a platform like DX or by building in-house solutions that capture the types of data listed below.
Telemetry data (Data from tools)
Data coming directly from workflow tools, like source control, ticketing systems, and the AI tool itself, provides insights into activity and efficiency, allowing teams to track trends over time.
Measurements that were valuable to your teams before AI will continue to be important; for example, tracking PR throughput and quality.
Additionally, we can look at telemetry data to get some measurements specific to AI tools, such as usage activity.
For example, GitHub Copilot’s team usage API returns information about engaged users and how they engaged with Copilot (in the IDE, chat, etc.).
This automatically collected data is good for showing the impact of AI tools on team processes over time.
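As a hedged sketch, pulling those engagement numbers for a team might look something like this. The endpoint and field names follow GitHub’s Copilot metrics API documentation at the time of writing; check the current docs before relying on them, as this API has changed over time.

```python
import os
import requests

ORG = "your-org"          # placeholder organisation slug
TEAM = "your-team-slug"   # placeholder team slug

# Daily Copilot engagement metrics for one team (requires a suitably scoped token).
resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/team/{TEAM}/copilot/metrics",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()

for day in resp.json():
    print(day.get("date"), "total engaged users:", day.get("total_engaged_users"))
```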
However, telemetry metrics can’t answer some important questions, such as:
How much of developers’ time is actually being saved thanks to GenAI tools?
How are developers using these tools?
What are the most beneficial GenAI use cases, and how can they be taught to the rest of the developers?
Periodic surveys
Data coming directly from developers remains the fastest and most pragmatic way to learn about AI usage.
For some critical measurements like change confidence or developer satisfaction with AI tooling, self-reported data is currently the only available mechanism for data collection because the measurement depends on a human’s participation.
Other things are just faster to collect via self-reported data because of current telemetry limitations.
Most organisations capture self-reported data on a regular cadence, quarterly or biannually, through an org-wide survey. Typically, these surveys ask developers about factors impacting productivity, and can also include questions about their use of AI tools.
The results provide insight not just into how developers are using AI and how often, but also about the relationships between AI use and other factors like code maintainability and change confidence.
Self-reported data captures the “human” side of tool usage that telemetry misses, but survey results depend heavily on good design and consistent participation, making them best suited for periodic snapshots rather than real-time monitoring.
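If you do collect self-reported data this way, the analysis itself can stay simple. Here is a small sketch that summarizes hypothetical survey responses about weekly time saved and AI satisfaction; the question wording and fields are assumptions, not a prescribed survey design.

```python
from statistics import mean, median

# Hypothetical responses: hours saved per week (numeric) and satisfaction (1-5 Likert).
responses = [
    {"hours_saved_per_week": 3.0, "ai_satisfaction": 4},
    {"hours_saved_per_week": 0.5, "ai_satisfaction": 2},
    {"hours_saved_per_week": 5.0, "ai_satisfaction": 5},
    {"hours_saved_per_week": 2.0, "ai_satisfaction": 4},
]

hours = [r["hours_saved_per_week"] for r in responses]
satisfaction = [r["ai_satisfaction"] for r in responses]

print(f"Median time saved: {median(hours):.1f} hours/week")  # robust to a few outliers
print(f"Mean time saved: {mean(hours):.1f} hours/week")
print(f"Satisfied (4-5): {100 * sum(s >= 4 for s in satisfaction) / len(satisfaction):.0f}%")
```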
Experience sampling
Experience sampling is a method for capturing real-time feedback from developers while they're working.
Here's how it works: You pick a specific moment in a developer's workflow, like when they open a pull request, and ask a small, random group of developers a few quick questions about what they just did. The closer you get to the actual work, the more accurate and useful the feedback becomes.
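Mechanically, this can be as simple as a webhook handler that samples a small fraction of events and asks one or two questions. The snippet below sketches that sampling logic; the event payload and the send_micro_survey helper are hypothetical placeholders for however you actually reach developers (Slack, in-IDE prompt, email, and so on).

```python
import random

SAMPLE_RATE = 0.10  # ask roughly 10% of PR authors, so nobody is surveyed constantly

QUESTIONS = [
    "Did you use an AI tool while working on this pull request?",
    "Roughly how much time did it save (or cost) you?",
]

def send_micro_survey(developer: str, questions: list[str]) -> None:
    """Hypothetical delivery mechanism: Slack DM, in-IDE prompt, email, etc."""
    print(f"Asking {developer}: {questions}")

def on_pull_request_opened(event: dict) -> None:
    """Call this from your PR-opened webhook; it samples a small, random subset."""
    if random.random() < SAMPLE_RATE:
        send_micro_survey(event["author"], QUESTIONS)

# Example event, e.g. derived from a source-control webhook payload.
on_pull_request_opened({"author": "alice", "pr_number": 42})
```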
This approach differs from regular surveys in several important ways.
Quarterly surveys give you higher-level, big-picture trends over time. Experience sampling, on the other hand, focuses on one specific task, asks just a few targeted questions, and runs for a shorter period, typically a few weeks rather than months.
Experience sampling is a necessary complement to telemetry data. The workflow data might show that a user interacted with the tool, but experience sampling explains how: e.g., “I used it to generate boilerplate code,” or “It helped refactor this component.” These snippets uncover the use cases that actually drive value, which helps you scale best practices across teams.
A downside of experience sampling is that it can be difficult to orchestrate. If your teams don’t use a tool like DX’s PlatformX, it can be time-intensive and costly to build out an experience sampling system.
Advice for rolling out AI metrics
As with any measurement effort, leaders must be intentional about how metrics are introduced and communicated. Measuring developer activity, especially in the context of AI, can be a sensitive topic.
The hype surrounding AI, combined with the growing telemetry surfaced by AI tools, has only intensified the pressure teams feel.
Start small, get a baseline, and communicate proactively
We recommend starting with a few teams and taking a learn-and-scale approach:
Baseline first. Measure your existing engineering productivity and developer experience before introducing new tools. This gives you a point of comparison.
Start with team-level visibility. Avoid using metrics for individual performance evaluation. Metrics like code generation volume are particularly susceptible to gaming. Encouraging behavior that optimizes for the metric, rather than the outcome, risks malicious compliance, undermining team trust and rendering the data meaningless.
Share early findings. Use initial results to spark discussion, not drive mandates. Teams are more receptive when they’re part of the learning process.
One of the most common missteps is under-communicating why these metrics exist and how they’ll be used. In the absence of clarity, fear and speculation take over. Your messaging should emphasize:
These metrics will not be used for individual evaluation
The goal is to understand how AI is changing workflows and team dynamics
The data will inform investments, support, and enablement, not punitive action
Reinforce these messages early and often.
Make sure your internal champions, team leads, and enablement partners are aligned and able to answer questions from their teams.
Ground leaders’ expectations by communicating real impact
Collecting metrics is only useful if they drive clarity and alignment. To secure continued investment and support, you need to communicate AI’s impact in a way that speaks directly to business priorities.
Here’s how to do that effectively:
Start with the problem, not the tool. Anchor your message in the specific challenges AI is helping address, whether that’s reducing time to market, accelerating onboarding, or improving delivery consistency. Metrics should illustrate how those problems are being solved.
Tailor metrics to your audience. Different stakeholders care about different outcomes. For example:
Executives: Business velocity, security posture, innovation capacity, ROI
Product leaders: Cycle time, product quality, experimentation throughput
Finance: Cost efficiency, tooling ROI, resource leverage
Developers: Time saved, reduced friction, increased autonomy
Focus on clarity, not spin. The goal is not to hype AI, but to show how it’s being used, where it’s delivering value, and where further investment or support may be needed.
The world of software has already changed with AI, and engineering leaders have a responsibility to measure that change in order to effectively guide and implement the company’s investments.
Getting this right means moving beyond surface-level metrics. It means capturing the full picture: how tools are used, what value they create, and at what cost.
And it means measuring with intent, with the goal to accelerate teams, inform enablement, and identify the workflows where AI is having the biggest impact.
This is a fast-moving space. At the end of the day, the question isn’t whether AI will reshape how software is built. It’s whether we’ll be ready to measure and manage that transformation.
Last words
Special thanks to Laura for sharing her insights on this very important topic with us! Make sure to check her out on LinkedIn and also check out the live webinar on Measuring AI code assistants and agents with the AI Measurement Framework.
We are not done yet!
Video
Check out my latest video. Whether you are an engineer or a manager, you should be focusing on your personal brand. It’s the first impression people have of you, and it’s important that you leave a good one. Learn more below:
New video every Sunday. Subscribe to not miss it here:
Liked this article? Make sure to 💙 click the like button.
Feedback or addition? Make sure to 💬 comment.
Know someone that would find this helpful? Make sure to 🔁 share this post.
Whenever you are ready, here is how I can help you further
Join the Cohort course Senior Engineer to Lead: Grow and thrive in the role here.
Interested in sponsoring this newsletter? Check the sponsorship options here.
Take a look at the cool swag in the Engineering Leadership Store here.
Want to work with me? You can see all the options here.
Get in touch
You can find me on LinkedIn, X, YouTube, Bluesky, Instagram or Threads.
If you wish to make a request on a particular topic you would like to read about, you can send me an email at info@gregorojstersek.com.
This newsletter is funded by paid subscriptions from readers like yourself.
If you aren’t already, consider becoming a paid subscriber to receive the full experience!
You are more than welcome to find whatever interests you here and try it out in your particular case. Let me know how it went! Topics are normally about all things engineering-related: leadership, management, developing scalable products, building teams, etc.