Exploration phase: Grassroots learning
Our journey began with exploration. Developers were encouraged to use AI tools in their daily work at their own pace. This grassroots effort provided invaluable feedback and revealed where AI could add immediate value without disrupting workflows or compromising standards.
At first, things were chaotic. We tested multiple tools, quickly realizing that the “arms race” between platforms would continue. Our takeaway: pick a tool, learn its nuances and avoid the trap of constant switching.
Today, our core toolset includes GitHub Copilot, Jules and Claude Code, with OpenAI keys powering additional integrations.
We continue to explore and evaluate new tools, but we are also focused on getting the most out of this core toolset and our ongoing investment in AI.
Execution phase: From small steps to bold moves
Phase 1: Big things have small beginnings
We started with simple, repetitive tasks:
- Inline comments and documentation auto-suggestions
- Generic or boilerplate object creation, such as Enums and POJOs (plain old Java objects); a sketch of this kind of boilerplate follows this list
- Straightforward method logic that simply does what the method name describes
- Leveraging inline chat features to get quick feedback on specific blocks of code
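As a hedged illustration of what that boilerplate looks like (class and field names below are invented for this sketch, not taken from our codebase), this is the kind of code we were comfortable letting the assistant draft first:

```java
// Hypothetical illustration only: the kind of boilerplate an assistant autocompletes.
// OrderSummary and OrderStatus are invented names for this sketch.
public class OrderSummary {

    /** A simple enum the assistant can generate from a one-line prompt. */
    public enum OrderStatus { PENDING, SHIPPED, DELIVERED, CANCELLED }

    private final String orderId;
    private final OrderStatus status;

    public OrderSummary(String orderId, OrderStatus status) {
        this.orderId = orderId;
        this.status = status;
    }

    public String getOrderId() {
        return orderId;
    }

    public OrderStatus getStatus() {
        return status;
    }

    /** "Does what the name says" logic: true once the order reaches a terminal state. */
    public boolean isComplete() {
        return status == OrderStatus.DELIVERED || status == OrderStatus.CANCELLED;
    }
}
```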
Phase 2: Smarter code reviews
Next, we moved to AI-assisted code reviews. By enabling automated reviews across all pull requests, we unlocked several benefits:
- Change summarization of large pull requests
- Real-time bug prevention (e.g., catching unclosed HTML tags)
- Secondary code checks prompting closer human review
- Code performance optimization, catching issues that could have slowed execution significantly (an illustrative example follows this list)
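To make the last point concrete, here is a generic illustration rather than an actual finding from our reviews: the kind of slowdown an automated reviewer can flag, alongside the fix it typically suggests.

```java
// Illustrative only: a classic pattern an automated review can flag and suggest fixing.
public final class ReportBuilder {

    // Before: quadratic behavior, because each += copies the accumulated string.
    static String joinSlow(java.util.List<String> lines) {
        String report = "";
        for (String line : lines) {
            report += line + "\n";
        }
        return report;
    }

    // After: the reviewer-suggested version using StringBuilder, linear in total length.
    static String joinFast(java.util.List<String> lines) {
        StringBuilder report = new StringBuilder();
        for (String line : lines) {
            report.append(line).append('\n');
        }
        return report.toString();
    }
}
```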
Based on the insights we gathered, our first formal initiative was to enable automated code reviews across all pull requests against platform code, a clear opportunity to leverage AI with manageable risk. To start, we used GitHub Copilot, which uses OpenAI’s models.
This approach has significantly enhanced our development process by streamlining code review and bug detection in the ways listed above.
Over the long term, catching and correcting issues like these can reduce operational and staffing costs, because site reliability engineers have to intervene less often for non-critical issues.
Phase 3: Unit tests and documentation
We then applied AI to low-risk but high-value areas: unit tests and documentation. Verification overhead is lower here, making these areas ideal for AI integration.
Using Copilot and Jules, developers collaborated with AI to:
- Suggest edge cases they hadn’t considered (an example test sketch follows this list)
- Format code to align with internal guidelines
- Iterate on output, much as they would with a junior developer who needs guidance but improves over time
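As a sketch of what that collaboration produces (JUnit 5, with `parsePercentage` as an invented helper inlined so the example is self-contained), the assistant-suggested edge cases often look like this:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

// Hypothetical sketch: AI-suggested edge cases for an invented helper method.
class PercentageParserTest {

    // Method under test, inlined here so the sketch is self-contained.
    static int parsePercentage(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("value is required");
        }
        int value = Integer.parseInt(raw.trim().replace("%", ""));
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return value;
    }

    @Test
    void parsesTrimmedValueWithPercentSign() {
        assertEquals(42, parsePercentage(" 42% "));
    }

    // Edge cases the assistant proposed that we had not written ourselves.
    @Test
    void rejectsNullAndBlankInput() {
        assertThrows(IllegalArgumentException.class, () -> parsePercentage(null));
        assertThrows(IllegalArgumentException.class, () -> parsePercentage("   "));
    }

    @Test
    void rejectsValuesOutsideZeroToOneHundred() {
        assertThrows(IllegalArgumentException.class, () -> parsePercentage("101"));
        assertThrows(IllegalArgumentException.class, () -> parsePercentage("-1"));
    }
}
```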
Phase 4: AI agents as developer assistants
Finally, we began experimenting with AI as a day-to-day development assistant for production code.
AI agents excel at:
- Automating tedious work, such as building POJOs from JSON specs (see the sketch after this list)
- Acting as “hyper-intelligent rubber ducks” for architectural discussions
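For the JSON-to-POJO case, here is a hedged sketch, assuming Jackson annotations and invented field names rather than a real spec from our platform:

```java
import com.fasterxml.jackson.annotation.JsonProperty;

/*
 * Hypothetical JSON spec the agent starts from (field names invented for this sketch):
 * { "campaign_id": "abc-123", "impressions": 1800, "click_through_rate": 0.031 }
 */
public record CampaignStats(
        @JsonProperty("campaign_id") String campaignId,
        @JsonProperty("impressions") long impressions,
        @JsonProperty("click_through_rate") double clickThroughRate) {

    // Compact constructor: the hand-written validation we add after the agent
    // generates the boilerplate above.
    public CampaignStats {
        if (campaignId == null || campaignId.isBlank()) {
            throw new IllegalArgumentException("campaign_id is required");
        }
    }
}
```

Deserializing the payload is then a single ObjectMapper call, which keeps the tedious part automated while the validation stays under human control.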
But with great power comes risk. Agents can accelerate complex tasks, but they also make mistakes. For example, one agent attempted a misguided optimization that slowed execution by more than 10x. That’s why human review remains non-negotiable.
Iteration phase: Continuous refinement
Our phased approach ensures continuous learning:
- Develop a deep understanding: We gain practical insight into where AI truly excels and where human expertise remains indispensable. We are not just adopting tools but building a profound understanding of their capabilities and limitations in our environment.
- Tailor solutions to our needs: We are not simply adopting off-the-shelf AI. Instead, we are actively experimenting and adapting tools to align with our specific development workflows, coding standards and project requirements.
- Ensure strategic value: Every AI initiative we undertake is evaluated for its tangible impact. By carefully tracking AI’s influence on productivity and code quality, we ensure that every effort delivers measurable value and contributes directly to our organizational goals.
At first, engineers were skeptical. Today, adoption is high, and productivity is up.
Where we see risks
- Lazily accepting code: to guard against this, AI output is reviewed as critically as human-written code.
- Production issues: While no AI-produced code has caused failures, we remain vigilant about avoiding increases in Mean Time to Resolution (MTTR) in QA, staging or production.
Measuring the ROI of AI
Our AI investment is delivering measurable returns.
Productivity gains
Adoption of AI tools has boosted productivity: weekly task completion has improved by about 20% on average, up from less than 10% early in our rollout. AI agents capable of handling more complex tasks have been a key driver of this gain.
We’ve also seen faster code generation, which requires more detailed reviews. To maximize returns, we continually measure uplift and foster rapid learning. AI is positioned as a productive co-author, supported through an #ai Slack channel and monthly sessions that engage both engineering and non-engineering teams in safe, systematic adoption.
Adoption curve
The adoption curve remains strong, with significant productivity gains expected across engineering by year-end. In January 2025, our thesis centered on applying AI to code reviews, document reviews, production code and UI tests.
This is the chart we published to internal leadership in February, and we are pacing ahead of our original predictions (e.g., we have turned on code reviews in 100% of our repos, and document reviews are running well ahead of schedule).
Investment
Our initial exploration of AI was restrictive, and the investment was open-ended, which left us with a poor understanding of yield and outcomes.
To address this, we have since set up a budget and governance structure that equips our teams to request and use company-approved AI tools.
To illustrate how the budgeting process played out, below is a simplified version of the model we used with our finance teams to measure the impact of our AI investments.