What the Top 1% of Engineering Teams Do Differently with AI
Full recording and insights from the talk at the Engineering Leadership LIVE event in San Francisco.
Intro
Last month, my friends from Augment Code and I hosted an event called Engineering Leadership LIVE in San Francisco. It was a blast, and we had so many insightful discussions!
As part of the event, we also had 4 talks.
- Gregor Ojstersek, CTO & Author, Engineering Leadership newsletter
Talk: AI-Native Engineering Leadership
You can also watch the full recording of my talk here:
And the full overview of my talk in this article:
- Vinay Perneti, VP of Engineering, Augment Code
Talk: We Thought AI Transformation Was About Adopting Agents. We Were Wrong.
- Andrew Churchill, CTO, Weave
Talk: What Actually Works: AI Coding Patterns from the Top 1% of Teams
- Anwar Haneef, GM & Head of Ecosystem, Canva
Talk: Your Product’s Next User Might Be an AI Agent. What Engineering Leaders Need to Know.
Today, I am sharing the overview and the recording of Andrew Churchill’s talk at the event.
Recording of the talk at the event
You can watch/listen to the talk below, or you can keep reading for the full insights.
Let’s start!
The hype of “10 AI agents doing 400x productivity” is not real
There is a lot of hype on social media about how to be more productive using AI. The problem is that these are isolated examples, and they do not reflect what the data at Weave actually shows.
At Weave, they are building a platform for understanding how software engineers work. You can also read a full deep dive on how they work in this article:
In today’s article, I am sharing (based on Andrew’s talk) what makes the top 1% of teams successful, based on Weave’s usage data from hundreds of companies and 10k+ engineers.
It’s also important to mention that the top 1% of teams here are measured by Weave’s code output metric, an ML-based algorithm built by Weave to assess how effective the delivered work is.
It does not purely track lines of code or the number of commits delivered. Instead, it estimates output based on the question: “How long would this PR take an expert engineer to complete?”
The result is a standard unit to quantify output, which is comparable across individuals, teams, languages, and organizations.
There's a big difference between the top 1% teams and the rest
When you look at the chart, the difference between the top 1% of engineers and the rest grows exponentially rather than linearly. Especially interesting is how big the gap is between the top 10% (P90) and the top 1% (P99), and the gap between P99 and P50 is larger still.
That’s aligned with:
Pareto’s 80:20 rule: as we can see, the top 20% of people carry the majority of the productivity.
The power law distribution (a tiny fraction of individuals generates the vast majority of the impact and success), which makes sense in this case, as the top 1% of engineers may produce 10–100× the impact of the median.
In the next sections, we will go through 5 specific areas of what the top 1% of teams, based on Weave’s code output metric, do:
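To make the P50, P90, and P99 comparison concrete, here is a minimal Python sketch of how such percentile gaps can be computed. The scores are made up for illustration and the function name `percentile_summary` is my own; this is not Weave’s metric or data, just one way to look at a heavy-tailed distribution.

```python
import statistics


def percentile_summary(scores):
    """Summarize a list of output scores at P50, P90, and P99."""
    # With n=100, quantiles() returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    p50, p90, p99 = cuts[49], cuts[89], cuts[98]
    return {
        "p50": p50,
        "p90": p90,
        "p99": p99,
        "p90_over_p50": p90 / p50,
        "p99_over_p50": p99 / p50,
    }


# Made-up scores with a heavy tail, purely for illustration.
example_scores = [1.0] * 50 + [2.0] * 40 + [5.0] * 9 + [40.0]
summary = percentile_summary(example_scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Looking at the ratios (P90 over P50, P99 over P50) instead of the raw values makes it easier to see whether the gap widens toward the top.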
Their engineering organization structure
How much they spend on AI tools (AI tokens) relative to the value that code delivers
How they review their code
Whether their higher output also means more bugs
The number of deployments they do
For each area, I am sharing (based on the talk) what the top 1% do that works for them, with a specific example. Let’s start with the first one, the structure of the organization.
1. The organizational structure of the top 1% teams
Many of the top teams are moving toward flatter orgs, with smaller teams and more high-agency ICs. High agency means that engineers own projects end-to-end and are more product- and business-minded, e.g., product engineers.
In the talk, Andrew also mentioned that they see the product engineer title becoming less common, as its expectations are merging into the plain “software engineer” title. This is something I have also mentioned in the article AI-Native Engineering Leadership.
Examples of 2 such teams:
Telnyx (extreme example)
Their engineering org consists of 200 engineers, 0 engineering managers, and 1 VP of Engineering. They took the idea of a “flatter org” to the extreme.
Based on the data from Weave, they are doing very well on the code output metric, but it’s really important to mention that such an engineering org structure doesn’t work for everyone.
In my opinion, what helps them have such a structure is the nature of their product: it’s very technical, which makes it a lot easier for engineers to be product-oriented as well.
A similar example is one I have recently written about. A company called PortKey has 24 engineers and 0 product managers. Read the full deep dive on how they work in this article:
PostHog
Teams of 1-3 engineers, with an assigned Team Lead for every team (who is a DRI - directly responsible individual). The Team Lead also reports directly to the VP.
They want to avoid coordination overhead, which grows considerably once a team has 4 or more people. That’s why they keep their teams to a maximum of 3 people.
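One common way to reason about why coordination gets harder as teams grow is to count the possible one-to-one communication paths, which is n × (n − 1) / 2 for a team of n people. This is a general rule of thumb and my own illustration, not something PostHog or Andrew stated as their reasoning. A minimal sketch:

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    return team_size * (team_size - 1) // 2


for size in (2, 3, 4, 5, 8):
    print(f"{size} people -> {communication_paths(size)} paths")
```

A team of 3 has 3 paths, while a team of 4 already has 6 and a team of 8 has 28, so each added person costs more coordination than the previous one.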
PostHog is betting on the AI-native team structure similar to what OpenAI is doing. You can read how OpenAI is building AI-native engineering teams in this article:
It’s also important to mention that both of these companies have explosive headcount growth and are hiring a lot of people. They are not “replacing” people with AI. Instead, they are betting on the premise of:
More people means exponentially more productivity.
And they are not following the trend that somewhat bigger companies go for: similar productivity with fewer people.
I personally think that in the long run, the companies that bet on more people with more productivity will have a big advantage over companies that just want to stay as productive as before.
2. Top 1% teams have a good ratio of spending and delivering value
The top teams, which have the highest output, spend more on AI tools, e.g., AI tokens, than the ones that have a lower output, which is not surprising. More AI tool spending should increase the overall amount of work that you deliver.
This is also what we see from the chart:
But the interesting part is that the top teams still keep the cost of delivering their work lower, which can be seen in this chart:
Good examples of such companies are:
Robinhood
They created a custom agent, fine-tuned to their codebase & requirements, which helps them keep costs much lower (token efficiency) while increasing productivity, as it provides much more accurate responses to their specific prompts.
Spott
Every engineer runs a median of 5 agents at the same time. They are spending a lot on AI tools, but based on the data from Weave, they also have significantly more output per dollar than other teams. So, they spend a lot, but they get far more output in return.
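The idea of “output per dollar” can be sketched in a few lines of Python. All names and numbers below are hypothetical (they are not Weave’s, Robinhood’s, or Spott’s data), and “output units” simply stands in for whatever output metric you track.

```python
from dataclasses import dataclass


@dataclass
class TeamMonth:
    name: str
    ai_spend_usd: float   # hypothetical monthly AI tool spend
    output_units: float   # hypothetical output score for the same month


def output_per_dollar(team: TeamMonth) -> float:
    """Output delivered per dollar of AI spend."""
    if team.ai_spend_usd <= 0:
        raise ValueError("ai_spend_usd must be positive")
    return team.output_units / team.ai_spend_usd


# Made-up figures: one team spends 5x more but delivers 7.5x the output.
high_spend = TeamMonth("high_spend_team", ai_spend_usd=20_000, output_units=6_000)
low_spend = TeamMonth("low_spend_team", ai_spend_usd=4_000, output_units=800)

for team in (high_spend, low_spend):
    print(f"{team.name}: {output_per_dollar(team):.2f} output units per dollar")
```

The point of the ratio is that a bigger bill is not a problem by itself. What matters is whether the output grows faster than the spend.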
3. Top 1% teams are focusing on AI code reviews and reviewing specs
As we can see from these charts, the top 1% teams have a much lower percentage of human PR reviews than other teams. But this doesn’t mean that their code gets merged unchecked.
Instead, they are utilizing AI code reviews and tuning those reviews to be as accurate as possible, so they can be confident in each change.
What they see at Weave (and they do the same as well) is that, instead of focusing a lot on reviewing code after the PR is opened, the focus should be on reviewing specs before the code is even generated. And they believe a lot of the top teams are doing that.
At Weave, they also use 4-5 different AI code reviewers and have a policy that human code reviews are optional, which helps them move a lot faster.
A good example Andrew mentioned:
It’s much easier to review 300 lines of markdown (spec) than 3000 lines of code.
4. Top 1% teams have higher output, but the bug rate stays similar
This is an interesting insight, because you would think that with more code and PRs being finished, the number of bugs would increase linearly as well. But based on the data, more output does not come with a big increase in the number of bugs being produced. The chart of the data is here:
A lot of the teams utilize AI to test their product on a somewhat recurring basis. An interesting example that Andrew mentioned is the company called &AI. They have 5 laptops running 24/7 with Codex 5.5, doing QA testing of their app. This helps them spot bugs in production before users do.
It’s important to mention this is just one example of how you can use AI to help you find issues. There are many other ways to use AI to ensure your software works correctly. You just need to find what may work for your specific case.
5. Top teams are deploying more to production than others
This hasn’t really changed with AI, as many teams have optimized the way they work based on DORA metrics, and CI/CD has become quite popular in the last 5-10 years.
As we can see from the data, the top teams are deploying more than others:
What’s important to mention here is that the number of deployments only helps up to a certain extent:
If you deploy 10 times per day (10 PRs merged in total), it’s not going to be much more effective than deploying just once per day with the same 10 PRs.
So, if you optimize purely for the number of production deployments, you won’t automatically get an increase in what is actually delivered to the product.
And as we can see from this data, even for larger companies, the number of deployments that top teams do is higher than the rest of the teams:
Advice on how to improve
So, the question at the end of this article is: How can you apply this advice so that your teams improve?
Andrew mentioned in the talk that looking to LinkedIn for the latest hype usually isn’t the answer!
Instead, the recommendation is to focus on understanding where you are today and measuring your progress over time. What this means is that you need some data on where you are today, so you can focus on improving it. And then have a consistent way of checking how you are doing in that area over time.
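As a small illustration of “know where you are today and measure progress over time”, here is a minimal Python sketch that compares later readings of a metric against the first one. The metric, the weekly numbers, and the function name `change_from_baseline` are all hypothetical. Use whatever metric you already trust.

```python
def change_from_baseline(readings):
    """Percent change of each later reading relative to the first reading."""
    if not readings:
        raise ValueError("need at least one reading to act as the baseline")
    baseline = readings[0]
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # Rounded to one decimal place to keep the report readable.
    return [round((value - baseline) * 100 / baseline, 1) for value in readings[1:]]


# Made-up weekly values of some team metric; week 1 is the baseline.
weekly_metric = [100.0, 104.0, 98.0, 115.0]
for week, change in enumerate(change_from_baseline(weekly_metric), start=2):
    print(f"week {week}: {change:+.1f}% vs. baseline")
```

The exact tooling matters less than checking the same number in the same way every time, so that changes reflect your team and not the measurement.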
Keep experimenting, identify what’s working and do more of it, and stop doing what’s not working.
Last words
Special thanks to Andrew for sharing his insights in his talk at the Engineering Leadership LIVE event in San Francisco. More overviews of talks will be shared in future editions of the newsletter. Stay tuned!