Seventy percent of software projects exceed their original budgets. That number has barely moved in a decade, despite better tools, faster hardware, and an explosion of methodologies promising to fix everything. CloudApper's analysis confirms what most engineering leaders already suspect: the problem isn't a lack of frameworks. It's a lack of visibility into what actually matters.
The gap between elite engineering teams and everyone else isn't talent or tooling. It's measurement discipline. Organizations — including top software development companies — with mature metrics programs achieve 2.2x faster delivery times and report 60-percentage-point improvements in customer satisfaction ratings, according to LinearB's analysis of 6.1 million pull requests across 3,000+ development teams in 32 countries (linearb.io).
That isn't marginal improvement. It's a different category of performance, and it starts with understanding what to measure.
Key Takeaways
- Elite teams deploy 208x more frequently than low performers, with 106x faster lead times (DORA/Accelerate research)
- AI boosts individual throughput (21% more tasks) but correlates with decreased stability at the org level — "AI amplifies what's already there"
- Senior developers (10+ yrs) see the highest AI quality gains (68%) but have the lowest confidence shipping AI code unreviewed (26%) — the trust inversion
- The strongest predictor of high performance isn't tooling — it's organizational culture (Westrum generative model)
- Start with deployment frequency, the simplest DORA metric, and expand from there
The software development life cycle (SDLC) formalizes the journey from idea to production into six phases: planning, requirements, design, development, testing, and deployment, with ongoing maintenance following release (splunk.com). Whether a team runs Waterfall (sequential, document-driven), Agile (iterative, sprint-based), or DevOps (continuous delivery with integrated operations), the methodology shapes which metrics are even possible to collect (scrumexpert.com).
A Waterfall team can measure milestone completion but struggles with deployment frequency — understanding the waterfall vs agile methodology trade-offs matters here. An Agile team tracks velocity and sprint burndown but may lack production stability data. A DevOps team can measure all four DORA metrics natively because continuous delivery generates the telemetry automatically.
The methodology itself matters less than whether it fits your constraints — but your choice directly determines your measurement ceiling. Teams that can't deploy continuously can't measure deployment frequency. Teams without sprint cadence can't track velocity. The process you choose is the process you can observe.
The DevOps Research and Assessment team was founded in 2014 as an independent research group investigating the practices that drive high performance in software delivery. In 2018, three of its members — Nicole Forsgren, Jez Humble, and Gene Kim — published Accelerate: The Science of Lean Software and DevOps, which established the empirical link between organizational culture, operational performance, and business outcomes. Google acquired DORA in 2019, and the team has continued producing annual research that shapes how the industry benchmarks itself (swarmia.com).
DORA identified four key metrics that capture the tension between speed and stability — how frequently teams ship and how often those changes cause incidents (cortex.io):
| Metric | What It Measures | Elite Benchmark |
|---|---|---|
| Deployment Frequency | How often code reaches production | On-demand, multiple times daily |
| Lead Time for Changes | Time from commit to production | Less than one hour |
| Change Failure Rate | Percentage of deployments causing failures | 0–15% |
| Mean Time to Recovery | Time to restore service after failure | Less than one hour |
Elite teams deploy on-demand multiple times per day, using deployment frequency as a proxy for how automated and reliable their pipeline actually is.
A low mean time to recovery indicates the team can quickly identify and resolve issues, keeping user disruption minimal. Robust monitoring, alerting, and automated recovery processes all contribute to improving this metric (port.io).
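All four DORA metrics fall out of the same deployment telemetry. The sketch below shows one minimal way to compute them in Python; the record fields (`commit`, `deploy`, `failed`, `recovery_min`) are illustrative assumptions, not a standard schema.

```python
from datetime import datetime

def dora_metrics(deploys, window_days=7):
    """Compute the four DORA metrics from deployment records.
    Each record is a dict with: commit (datetime), deploy (datetime),
    failed (bool), recovery_min (minutes to restore, for failed deploys).
    Field names are illustrative, not a standard schema."""
    n = len(deploys)
    lead_h = sum((d["deploy"] - d["commit"]).total_seconds() / 3600
                 for d in deploys) / n
    failures = [d for d in deploys if d["failed"]]
    return {
        "deploys_per_day": n / window_days,        # deployment frequency
        "lead_time_hours": lead_h,                 # commit to production
        "change_failure_rate": len(failures) / n,  # share of bad deploys
        "mttr_minutes": (sum(d["recovery_min"] for d in failures) / len(failures)
                         if failures else 0.0),    # mean time to recovery
    }

week = [
    {"commit": datetime(2025, 1, 6, 9, 0),  "deploy": datetime(2025, 1, 6, 10, 0),  "failed": False, "recovery_min": 0},
    {"commit": datetime(2025, 1, 7, 11, 0), "deploy": datetime(2025, 1, 7, 11, 30), "failed": True,  "recovery_min": 45},
    {"commit": datetime(2025, 1, 8, 14, 0), "deploy": datetime(2025, 1, 8, 16, 0),  "failed": False, "recovery_min": 0},
    {"commit": datetime(2025, 1, 9, 9, 0),  "deploy": datetime(2025, 1, 9, 9, 30),  "failed": False, "recovery_min": 0},
]
metrics = dora_metrics(week)  # lead time 1.0h, CFR 0.25, MTTR 45 min
```

The point of the sketch: if your pipeline can emit these four fields per deploy, the metrics come for free, which is why DevOps teams can measure all four natively.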
The foundational Accelerate research (2018) quantified the elite-to-laggard gap: elite performers deployed 208x more frequently than low performers, with 106x faster lead times, roughly 4x lower change failure rates, and recovery times measured in hours versus weeks (octopus.com). Companies in the top DORA tier were twice as likely to exceed their profitability goals and achieve 50% higher market growth over three years (opslevel.com).
Those multipliers still frame the conversation, but the 2025 DORA report retired the four-tier Low/Medium/High/Elite classification entirely. In its place: seven team archetypes based on eight measures spanning throughput, stability, and team well-being (splunk.com):
| Archetype | Share | Profile |
|---|---|---|
| Harmonious High-Achievers | 20% | Sustainable excellence across all dimensions |
| Pragmatic Performers | 20% | High speed with functional environments |
| Stable and Methodical | 15% | Deliberate delivery, high quality |
| Constrained by Process | 17% | Consumed by inefficient workflows |
| Legacy Bottleneck | 11% | Constantly reacting to unstable systems |
| High Impact, Low Cadence | 7% | Quality work, but slowly |
| Foundational Challenges | 10% | Survival mode with significant process gaps |
The top two archetypes represent 40% of the industry — and their success demonstrates that speed and stability aren't trade-offs but reinforcing outcomes.
Two findings from the foundational research remain durable. First, external change advisory boards don't increase production stability — they actually worsen lead time, deployment frequency, and time to restore. Teams that own their own change process outperform those waiting for committee approval. Second, the strongest predictor of high performance is organizational culture. Teams with a generative (Westrum) culture — high cooperation, shared risk, and blame-free postmortems — are 2.9x more likely to be top performers.
In 2025, DORA surveyed nearly 5,000 technology professionals worldwide and collected over 100 hours of qualitative data, focusing entirely on AI-assisted software development (dora.dev). The findings upended several assumptions.
| Finding | Data Point | Implication |
|---|---|---|
| AI adoption is near-universal | 90% of devs use AI at work; 71% for writing code — up 14 pts from 2024 (cloud.google.com) | AI is no longer optional; measurement must account for it |
| The trust paradox | 80% say AI boosts productivity, but only 3% report high trust in AI output (faros.ai) | Developers are using tools they don't fully trust — and the quality data suggests they're right |
| Individual ≠ organizational gains | AI users completed 21% more tasks, merged 98% more PRs — but org-level delivery stayed flat | "AI doesn't fix a team; it amplifies what's already there" — strong teams get stronger, struggling teams see dysfunctions intensified |
| Throughput up, stability down | AI correlates positively with throughput but negatively with stability — more change failures, longer resolution times | Without automated testing and fast feedback loops, increased change volume creates downstream problems |
Under the new archetype model, only 20% of teams qualified as Harmonious High-Achievers, while 10% faced foundational challenges severe enough to negate any AI benefit.
DORA covers throughput and stability. It doesn't cover everything that matters.
Modern software development metrics need to balance two critical dimensions. Developer Experience (DevEx) captures developer morale and engagement when interacting with tools, processes, and environments. Developer Productivity (DevProd) measures how effectively teams complete meaningful tasks with minimal waste.
Push productivity without watching experience and you'll burn the team out. Prioritize experience while ignoring throughput and nothing ships. The organizations getting this right track both — and pay close attention when the signals diverge.
Code review turnaround time is often the silent bottleneck in the cycle time equation. Long review times kill momentum and increase merge conflicts. Teams that obsess over deployment frequency while ignoring review latency are measuring the wrong end of the pipeline.
Technical debt is best tracked through the ratio of rework to new work — it represents the "interest" paid on fast, suboptimal code choices. When this ratio creeps upward, the team is spending more time fixing past decisions than building new capability. A healthy test coverage baseline of 70–80% ensures that refactoring and AI-generated additions don't silently break existing functionality.
KPIs for software development fall into four categories: developer productivity, software performance, defect tracking, and usability/UX metrics. Velocity — the amount of work a development team finishes in a single sprint, typically measured in story points — is the most common productivity metric, though it takes roughly three sprints before you get a reliable baseline.
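The three-sprint baseline rule can be made operational: refuse to estimate from velocity until three sprints exist, then track both the mean and its spread. A sketch; the idea of using coefficient of variation as the stability signal is an assumption to tune per team.

```python
from statistics import mean, stdev

def velocity_baseline(sprint_points, min_sprints=3):
    """Return (baseline, coefficient_of_variation) over the most recent
    min_sprints sprints, or None if there isn't enough history yet."""
    if len(sprint_points) < min_sprints:
        return None  # too early: velocity isn't reliable yet
    recent = sprint_points[-min_sprints:]
    baseline = mean(recent)
    cv = stdev(recent) / baseline  # high CV means estimates are still guesswork
    return baseline, cv

result = velocity_baseline([20, 24, 22])  # (22, ~0.09): stable enough to estimate
```

Returning `None` before the baseline exists is deliberate: it stops teams from planning against a number that is still noise.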
The primary challenge with these metrics isn't data collection — it's stitching them together. Git repositories tell you what changed but not why. Project management tools know which stories closed but can't see the technical cost of closing them. And CI/CD dashboards will happily show green builds while code quality degrades underneath.
None of this works as a slide deck. It works as a habit. And the teams that built that habit have results worth studying.
Etsy pioneered continuous delivery starting in 2009, going from twice-weekly deploys to 50+ per day by 2011 using a custom tool called Deployinator. But the story didn't stop there — by 2024, Etsy migrated its entire infrastructure to Google Cloud and built a new Service Platform on Cloud Run that cut new service deployment time from days to minutes, while continuing to ship at high frequency across its engineering organization (cloud.google.com).
Adidas compressed their deployment cycle from every 4–6 weeks to multiple times per day during a multi-year transformation starting around 2018, growing from €1 billion to €5 billion in e-commerce revenue by 2022. The shift required a move to Kubernetes-based cloud-native architecture and microservices — a custom software development approach that enabled teams to deploy independently rather than coordinating monolithic releases (itrevolution.com).
Organizations adopting Internal Developer Platforms (IDPs) saw individual productivity improve by 8% and team productivity by 10% on average, according to 2024 industry data. One case study documented a 10x improvement in release frequency (monthly to daily), 90% reduction in deployment time, and 75% defect reduction through automated testing (jellyfish.co). IDC research from 2024 found that organizations using CI/CD pipelines saw deployment frequency increase by 48%.
Not every team is Etsy or Adidas. A company of about 20 people introduced OKRs and started tracking metrics across their development team. Their process? Pasting screenshots from different tools into a central location, then discussing them in weekly meetings to derive actions (reddit.com). They tracked team health through weekly surveys, measured time spent on support activities, and pulled analytics from their logs.
It's duct tape and good intentions — but the screenshots-in-a-document approach captures something polished dashboards miss: human judgment about what the numbers actually mean.
"We would look at the burndown charts but after Waydev came, we were able to get a little bit deeper." — Abhijit Khasnis, TATA Health
"We're already seeing that in the past three months productivity has gone up 30%." — Alex Solo, Sovos
The lesson at both scales: consistency matters more than sophistication. Measure the same things the same way, every sprint, and review them with the team.
Martin Fowler's Thoughtworks team has begun explicitly recognizing the "Expert Generalist" as a first-class skill for recruiting and promoting software professionals (martinfowler.com).
"The characteristics that we've observed separating effective software developers from the chaff aren't things that depend on the specifics of tooling. We rather appreciate such things as: the knowledge of core concepts and patterns of programming, a knack for decomposing complex work-items into small, testable pieces, and the ability to collaborate with both other programmers and those who will benefit from the software." — Martin Fowler, Thoughtworks
The timing isn't coincidental. With 84% of developers using or planning to use AI tools — up from 76% in just one year (Stack Overflow 2025 Developer Survey) — the developer's role is shifting from code producer to AI orchestrator. Expert Generalists become more valuable as AI handles routine specialized tasks. The ability to decompose problems, evaluate AI-generated output, and collaborate across disciplines matters more than syntax fluency in any single language.
The data on AI-generated code quality explains why. CodeRabbit's analysis of 470 GitHub pull requests found that AI-generated PRs contain 1.7x more issues than human-written ones — roughly 10.8 issues per AI PR versus 6.5 for human PRs. Logic and correctness problems (business logic errors, misconfigurations, unsafe control flow) rise 75% in AI-generated code (coderabbit.ai).
GitClear's 2025 report — analyzing 211 million changed lines of code from January 2020 through December 2024 — found that code churn (lines reverted or updated within two weeks) rose from 3.1% to 5.7%, coinciding with AI assistant adoption. Refactored ("moved") lines collapsed from 25% to under 10%, while copy/pasted code rose from 8.3% to 12.3%, exceeding refactored code for the first time in the dataset's history (gitclear.com).
Perhaps most telling is what Qodo's 2025 survey revealed about the experience-trust inversion: senior developers (10+ years) report the highest quality gains from AI (68%) yet the lowest confidence shipping AI-generated code without review (26%) (qodo.ai).
The developers best equipped to evaluate AI output trust it least. The ones least equipped trust it most. That inversion is the strongest argument for code review as a non-negotiable quality gate in AI-augmented workflows.
Organizations still need specialists — the SME role in software development remains essential. But the Expert Generalist — someone who understands core concepts deeply enough to work across domains and evaluate AI output critically — is the profile that scales best in an AI-augmented team.
Meanwhile, 66% of managers admit that recent hires often show up unprepared, largely because expectations and responsibilities were never fully mapped (Deloitte 2025 Global Human Capital Trends). Hiring for "five years of React experience" misses the point when what you actually need is someone who can decompose problems and guide AI output.
Every common process failure has a metric that serves as an early warning — if you're watching.
| Failure | The Metric That Catches It | What to Watch For |
|---|---|---|
| Security treated as afterthought | Change failure rate | Spikes after deploys that skipped security scans. Teams integrating security into every phase see lower CFR (securitycompass.com). |
| Unmaintainable code accumulating | Rework-to-new-work ratio | When rework exceeds 20–25% of total output, the codebase is taxing the team more than new features are. |
| Testing gaps compounding | Defect escape rate + test coverage trend | Declining coverage or rising escaped defects signal that speed is outpacing quality gates. |
| Documentation rot | Onboarding time-to-first-commit | New engineers taking longer to ship their first PR is a proxy for documentation decay. |
| Estimation drift | Velocity variance across sprints | The 70% budget overrun rate traces to poor estimation. Three sprints of stable velocity give teams an empirical basis — high variance means estimates are still guesswork. |
The pattern: no single metric tells the full story. But the right combination of signals gives teams time to correct course before small problems become expensive ones.
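Once these signals are collected, the early-warning check itself is trivial to automate. A sketch: the CFR, rework, and coverage thresholds mirror the numbers in the table above, while the velocity-variance cutoff and all key names are placeholder assumptions to tune per team.

```python
def early_warnings(snapshot):
    """snapshot: dict of current metric values (keys are illustrative).
    Returns the list of warnings that fired; missing metrics are skipped."""
    checks = [
        ("change_failure_rate", lambda v: v > 0.15,
         "CFR outside the 0-15% elite band; check skipped security scans"),
        ("rework_ratio", lambda v: v > 0.20,
         "rework above ~20% of output; debt is taxing the team"),
        ("test_coverage", lambda v: v < 0.70,
         "coverage below the 70% baseline; quality gates eroding"),
        ("velocity_cv", lambda v: v > 0.25,
         "velocity variance high; estimates are still guesswork"),
    ]
    return [msg for key, bad, msg in checks
            if key in snapshot and bad(snapshot[key])]

warnings = early_warnings({"change_failure_rate": 0.22, "test_coverage": 0.65})
# fires the CFR and coverage warnings
```

Running a check like this on every sprint review is one concrete way to turn the table above from advice into habit.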
You don't need to implement everything at once. Effective software development management starts with deployment frequency — it's the simplest DORA metric and directly reflects how automated and reliable your delivery pipeline is. Once you're tracking that consistently, add lead time, then change failure rate, then MTTR.
The global software market is projected to grow from $823.92 billion in 2025 to $2.25 trillion by 2034, with enterprise software accounting for 61% of total revenue. With 28.7 million software developers worldwide and 81% of companies now considering low-code platforms strategically important, the scale of software delivery will only increase. Teams that can't measure their process won't be able to improve it — and the teams that can will take their market share.
What are DORA metrics, and why do they matter?
DORA metrics are four measurements developed by Google's DevOps Research and Assessment team: deployment frequency, lead time for changes, change failure rate, and mean time to recovery. They matter because they're the most empirically validated indicators of software delivery performance, backed by a decade of research across thousands of organizations. Teams that score well on DORA metrics consistently deliver faster with fewer failures.
How long before the metrics become reliable?
Expect about three sprints (roughly six to nine weeks) before velocity data becomes reliable enough to use for estimation. DORA metrics can be meaningful within a month if you have basic CI/CD tooling. The key is consistency — measure the same things the same way every cycle.
Should you prioritize developer productivity or developer experience?
Both — they're a check on each other. Track DevProd metrics (velocity, throughput, lead time) alongside DevEx metrics (satisfaction surveys, tool friction scores, review latency). When productivity climbs but experience scores drop, you're borrowing against the team's future capacity.
What's the most common mistake teams make with metrics?
Treating metrics as goals rather than diagnostics. When deployment frequency becomes a target, teams start deploying empty changes. When velocity becomes a target, story points inflate. Use metrics to understand your process, not to gamify it.
Do you need specialized tools to get started?
No. A 20-person team improved their process by pasting metric screenshots into a shared document and reviewing them weekly. Specialized tools (LinearB, Waydev, Swarmia) add correlation and automation, but the foundation is consistent tracking and honest team discussion about what the numbers mean.
How does AI change what you should measure?
Significantly. The 2025 DORA report found that AI boosts individual throughput (21% more tasks, 98% more PRs merged) but correlates with decreased stability — more change failures, more rework, longer resolution times. Teams should track AI-specific metrics: code churn rate (lines reverted within two weeks), defect density in AI-assisted vs. manual code, code review turnaround time (which jumped 91% post-AI adoption according to Faros AI telemetry), and the ratio of refactored code to duplicated code. The old DORA metrics still matter, but without these additional signals, you'll see throughput gains masking quality erosion.
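Splitting those signals by AI assistance only requires a flag on each PR. A sketch under that assumption; the field names (`ai_assisted`, `defects`, `loc`, `review_hours`) are illustrative, not a standard schema.

```python
def ai_quality_signals(prs):
    """prs: list of dicts with keys: ai_assisted (bool), defects (int),
    loc (lines changed), review_hours (time to first approval).
    Returns defect density and review turnaround split by AI assistance."""
    out = {}
    for flag, label in ((True, "ai"), (False, "manual")):
        group = [p for p in prs if p["ai_assisted"] == flag]
        loc = sum(p["loc"] for p in group)
        out[label] = {
            "defects_per_kloc": 1000 * sum(p["defects"] for p in group) / loc if loc else 0.0,
            "avg_review_hours": (sum(p["review_hours"] for p in group) / len(group)
                                 if group else 0.0),
        }
    return out

signals = ai_quality_signals([
    {"ai_assisted": True,  "defects": 3, "loc": 400, "review_hours": 4.0},
    {"ai_assisted": True,  "defects": 2, "loc": 600, "review_hours": 6.0},
    {"ai_assisted": False, "defects": 1, "loc": 500, "review_hours": 2.0},
])
# ai: 5 defects/kloc, 5h average review; manual: 2 defects/kloc, 2h
```

Comparing the two groups over time is what separates "AI made us faster" from "AI made us faster at creating rework."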