The importance of data when making decisions in the engineering industry
A real-world example of how data can help us make the right decision!
DevStats (sponsored)
I’ve checked out DevStats thoroughly and have been using it myself.
Here is my personal recommendation:
“DevStats is one of these tools that you think you don’t need, but it’s just so easy to use and gives so many insightful details that you might be missing.
Before using it, my thoughts were that we were having really great PR Cycle times as well as Code Review sizes, but with helpful filters and visuals, I could clearly see where we can improve.”
Check out DevStats and try it out yourself (no credit card required).
Let’s get back to this week’s thought.
Intro
Whenever we make a decision or try to get buy-in for something we wish to accomplish, data is and always will be our ally. It’s objective and, if presented well, universally understood.
You want to remove subjectivity as much as possible when making decisions.
I am happy to bring in Alberto, a Software Engineer, former competitive programmer, and newsletter author, as a guest for this article. He’ll share his personal story with us about how data helped him decide whether to automate a repetitive task or keep doing it manually.
Let’s get straight into it!
Data is and always should be a main driver for decision-making in our industry
I’ll be sharing my real-life scenario that shows that data is (and always should be) the main driver of decision-making in our industry. This example is taken from my experience working as a software developer, so I hope you can relate and share some of your experiences with me.
I have always been an advocate for knowing your data. I like to see numbers and charts.
That’s why, regardless of my specific role in any of the jobs I’ve had, I always find myself saving some spare time to think about KPIs or monitoring systems that allow me (and my team, managers, and so on) to understand the criticalities of our products.
This has always paid good dividends. Guess what? People love numbers and charts. It’s not only me.
It is always reassuring to see how your data reflects the reality of your product. Data persists forever.
We human beings can only remember so much, and having data is, in some way, like having access to an external second brain from which you can derive insights.
The problem
I work in a DevOps team, which means I get to see a lot of other developers’ code and workflows.
Day after day, for the past few months, I stumbled upon the same annoyance every time I logged in to my work computer. I would go and check on our status page and realize that there were jobs that had been running for days.
Now, I don’t know what the average time for training large Machine Learning models is, but I am sure that the teams I support have nothing to do with that. So what were they doing? Why do they have something running for days? And more importantly, why has nobody else noticed?
This is not normal behavior by any means. The sole purpose of DevOps is to streamline the software development process, which means (in my book) that if something is taking days to test or deploy, something is wrong.
Without thinking about it too much, I just reported it to my teammates and proceeded to dequeue the job. This process is entirely manual.
What does the manual process look like?
I have to log in to a website, look for hanging jobs, and click a button to dequeue them.
In DevOps, everything revolves around automation. Sure, you will always have to do things manually, but you are striving to reduce those processes as much as possible.
What I have been doing for a while now is in direct contradiction to everything that streamlined software development is trying to achieve.
There’s a joke that says DevOps engineers are automating themselves out of their jobs, and things like this example seem like something made on purpose so that my colleagues and I still have something to do.
How much time and resources are we wasting?
I started to wonder how widespread this issue was across the whole organization. How many others had to manually dequeue hanging jobs after going into a website, logging in, and fighting with a not-so-pretty UI to find what they were looking for?
At first, my guess was that the number of hanging jobs that trigger this manual process was not so high.
Then I realized that if I was facing this problem and I only dealt with a fraction of the developers of an organization with tens of thousands of developers, then the issue could be more significant than I thought initially.
So, how do we solve it? Is it even worth solving?
Here’s when data enters the picture.
Our system has a useful API that we can use to get data on the build sets we run. The most relevant piece of data for this question is the status of each build set:
Success — all builds in the set were successful.
Failure — some builds in the set were unsuccessful.
Config Error — there is some misconfiguration in the definition of the build.
Dequeued — it was dequeued before it finished.
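A tally over those statuses could be sketched roughly like this. It is a minimal sketch, not the real API: the record shape (a dict with a "status" key), the lowercase status labels, and the function name `status_shares` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical status labels mirroring the list above.
STATUSES = ("success", "failure", "config_error", "dequeued")


def status_shares(build_sets):
    """Return the fraction of build sets that ended in each status.

    `build_sets` is an iterable of dicts with a "status" key. This record
    shape is an assumption, not the response format of the real API.
    """
    counts = Counter(bs["status"] for bs in build_sets)
    total = sum(counts.values())
    if total == 0:
        # No data: report a zero share for every status.
        return {status: 0.0 for status in STATUSES}
    return {status: counts.get(status, 0) / total for status in STATUSES}


# Made-up sample data, only to show how the function is called.
sample = (
    [{"status": "success"}] * 6
    + [{"status": "failure"}] * 2
    + [{"status": "dequeued"}] * 2
)
shares = status_shares(sample)
print(f"dequeued share: {shares['dequeued']:.0%}")
```

In practice the input would come from paging through the API's build set records rather than from a hard-coded list.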
This is what I discovered:
Dequeued build sets accounted for 8% of the last 100k build sets. That means that 8% of the time, someone needed to manually dequeue the job.
And this is what it looks like:
So if we do a quick calculation:
8,000 jobs (8% of 100k) needed to be manually dequeued, and each dequeue takes anywhere between 10 and 60 seconds of an engineer’s manual work.
Taking the midpoint of 35 seconds, that adds up to 280k seconds wasted (~77 hours).
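The same back-of-the-envelope estimate can be written as a few lines of code. This is only a sketch of the arithmetic: it uses the midpoint of the 10 to 60 second range (35 seconds), and the function name `manual_hours` is made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def manual_hours(dequeued_count, seconds_per_dequeue):
    """Hours of manual work for a number of dequeues at a given cost each."""
    return dequeued_count * seconds_per_dequeue / SECONDS_PER_HOUR


dequeued = 8_000              # 8% of the last 100k build sets
low, high = 10, 60            # seconds of manual work per dequeue
midpoint = (low + high) / 2   # 35 seconds

# Print the estimate for the cheapest, typical, and most expensive dequeue.
for label, seconds in (("best case", low), ("midpoint", midpoint), ("worst case", high)):
    print(f"{label}: {manual_hours(dequeued, seconds):.1f} hours")
```

Looking at the whole range rather than a single number shows how sensitive the estimate is to the time per dequeue: somewhere between roughly 22 and 133 hours, with the midpoint landing just under 78.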
The finding:
We are wasting about 77 hours of developers’ time manually dealing with these jobs, not to mention the time lost to context-switching.
How much time and resources for the proposed solution?
The solution I proposed was to create a cron job that runs once a day, gets all the jobs that have been running for more than X amount of time, and automatically dequeues them.
This is a relatively simple solution that can be implemented by one dev in one 8-hour work day.
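As a rough idea of what such a script might look like, here is a minimal sketch. Everything specific in it is an assumption: the job record shape (`id`, `status`, `started_at`), the 24-hour threshold standing in for "X", and the `dequeue` callback, which in a real setup would call whatever the build system's API offers.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for "X amount of time"; 24 hours is a placeholder, not a recommendation.
MAX_RUNTIME = timedelta(hours=24)


def find_hanging_jobs(jobs, now, max_runtime=MAX_RUNTIME):
    """Return jobs that are still running and started more than max_runtime ago."""
    return [
        job
        for job in jobs
        if job["status"] == "running" and now - job["started_at"] > max_runtime
    ]


def dequeue_hanging_jobs(jobs, dequeue, now=None, max_runtime=MAX_RUNTIME):
    """Call dequeue(job_id) for every hanging job and return the dequeued ids.

    `dequeue` is a hypothetical callback wrapping the real dequeue API call.
    """
    now = now or datetime.now(timezone.utc)
    hanging = find_hanging_jobs(jobs, now, max_runtime)
    for job in hanging:
        dequeue(job["id"])
    return [job["id"] for job in hanging]


# Made-up jobs, only to show the call; real data would come from the API.
now = datetime.now(timezone.utc)
jobs = [
    {"id": "a", "status": "running", "started_at": now - timedelta(days=3)},
    {"id": "b", "status": "running", "started_at": now - timedelta(hours=1)},
    {"id": "c", "status": "success", "started_at": now - timedelta(days=5)},
]
dequeue_hanging_jobs(jobs, dequeue=lambda job_id: print("dequeue", job_id), now=now)
```

Keeping the selection logic separate from the dequeue call makes it easy to do a dry run first (pass a callback that only logs) before letting a daily cron entry dequeue anything for real.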
To automate or to keep doing things manually, that is the question
We can quickly see that it’s a no-brainer to move forward with the implementation.
We are saving about 77 hours of tedious manual work by investing 8 hours in building the automation.
On top of that, we do it once and leave it running, so people can focus on the important tasks: providing value to the business and delighting our users by building great products!
I hope this example has helped you understand why data is so important when making decisions.
Remember, this is not only important in software engineering but also in other aspects of work and life!
Last words
Thanks to Alberto for sharing his personal story with us about the importance of data in making decisions. You can find him on LinkedIn, and make sure to check out his newsletter.
We are not over yet!
Work with me (announcement)
Over the course of 11+ years, I have worked in full-time roles where I grew from engineer all the way to CTO. A big reason this was possible was the freelance work I did at the same time.
I am happy to share that my priorities have shifted and I am now focusing on growing and continuously improving this newsletter, alongside freelance work:
- Building and growing this newsletter (we will have 2x articles / week, starting from next week)
- Fractional / Interim CTO
- Tech Advising
- Coaching & Mentoring
- Continuously improving the Course Senior Engineer to Lead: Grow and thrive in the role
- Public speaking / Paid workshops
You can find all the options to work with me below:
Earn Your 1st SWE Promotion
My friends Andy Greenwell and Callie Buruchara have launched a program for Software Engineers who wish to get promoted! By signing up, you get access to the community, 1:1 sessions, the course, and much more.
If you wish to get promoted as a Software Engineer, this might be interesting for you!
Whenever you are ready, here is how I can help you further
Join the Cohort course Senior Engineer to Lead: Grow and thrive in the role here.
Interested in sponsoring this newsletter? Check the sponsorship options here.
Take a look at the cool swag in the Engineering Leadership Store here.
Want to work with me? You can see all the options here.
Get in touch
You can find me on LinkedIn or Twitter.
If you wish to request a particular topic you would like to read about, you can send me an email at info@gregorojstersek.com.
This newsletter is funded by paid subscriptions from readers like yourself.
If you aren’t already, consider becoming a paid subscriber to receive the full experience!
You are more than welcome to find whatever interests you here and try it out in your particular case. Let me know how it went! Topics are normally about all things engineering related, leadership, management, developing scalable products, building teams etc.
Good DevOps work. But I can't help wondering, why were all those jobs running endlessly in the first place, and is there a more systemic issue that the developers need to fix? What are the repercussions of those jobs never being completed?
Thanks so much for the kind shoutout, Gregor! 🙂