May 12, 2026

Lessons from Building a First-Pass AI PRD Reviewer at Uber

Lakshmi Ashok

Product Lead


Introduction

Most product organizations have some version of a review process. Typically, once PMs have an early draft of a PRD (Product Requirement Document) ready, it’s circulated across design, engineering, legal, operations, science, and product leadership. That process is designed to improve quality and reduce risk. In practice, it often reveals a harder reality: PMs might be making decisions in systems where the relevant context extends far beyond what any one person can easily assemble on their own.

A PRD could reach the review stage with an unsupported headroom assumption, a blind spot in how the feature could affect adjacent systems, an unexamined second-order effect, or a policy-sensitive change without the guardrails reviewers expect. In other cases, the team may be unknowingly revisiting a hypothesis that was already explored in a smaller experiment or adjacent effort, but the relevant context is scattered across docs, decks, dashboards, and institutional memory.

At that point, the review process tends to pivot to lower-level discovery work: surfacing adjacent impacts, reconstructing prior context, and identifying questions that would have been more useful to address earlier. That slows teams down, consumes reviewer attention on issues that could have been caught sooner, and makes feedback inconsistent.

The real problem isn’t that PMs lack rigor. It’s that product work often requires a 360-degree view that’s difficult to assemble manually in the moment: adjacent impacts, partner concerns, prior experiments, hidden dependencies, and the questions senior reviewers are likely to ask.

That was the problem we set out to solve.

Why This Matters at Uber

At Uber, product development runs through a structured checkpoint process that gives leadership and cross-functional teams visibility, accelerates approvals, and drives consistent execution. But a checkpoint process is only as effective as the quality of the materials entering it.

We saw an opportunity to strengthen that workflow further by helping PMs surface important questions earlier. Rather than changing the checkpoint process itself, the goal was to improve the quality of what entered it.

That led us to a simple question, and ultimately to the PRD Evaluator: what if every PM had a fast, contextual first-pass reviewer before a PRD reached the broader approval process?

Role of the AI-Powered PRD Evaluator

The PRD Evaluator is an AI-powered reviewer that starts with a PRD and assembles a broader knowledge base around it: linked documents, related decks and meeting notes, prior experiments, cross-functional artifacts, and preloaded Uber-specific context like core principles, metric definitions, and key jobs to be done. It uses that context to return a structured assessment of launch readiness.

Its role is deliberately focused: strengthen the PRD before it reaches high-cost review forums. Not to replace senior judgment, but to help teams enter those conversations with stronger context and fewer avoidable gaps. It sits upstream of the approval system and improves the quality of what enters it.

For us, that meant building a system that helps PMs do a few things earlier and better:

  • Identify the most important gaps in a draft
  • Surface adjacent impacts and cross-functional dependencies
  • Uncover prior learnings that may not be obvious to the current team
  • Enter checkpoint and review forums with a stronger artifact

How It Works: 4 Steps From Draft to Actionable Scorecard

We didn’t want a generic writing tool that simply rewarded polished prose. A PRD can be well-written and still miss the context, framing, or decision logic that determines whether it’ll hold up in review.

Figure 1: Overview of how the PRD Evaluator works: share a PRD link, gather context from related documents, evaluate across dimensions, and receive a scorecard with ratings and action items.


1. Build a Broader Knowledge Base Around the PRD

The evaluator uses the PRD as an entry point, then harnesses AI to search across relevant company artifacts and linked material to assemble the context needed to assess the decision well: related documents, prior experiments, cross-functional inputs, and preloaded Uber-specific context.
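The post doesn't describe the implementation, but as a rough illustration, context assembly can be thought of as a bounded walk outward from the PRD through its link graph, plus retrieval of preloaded company context. Everything in the sketch below (the toy corpus, the preloaded-context list, the build_knowledge_base function) is hypothetical, not Uber's actual system:

```python
# Hypothetical sketch of the context-assembly step. The toy corpus,
# link structure, and preloaded context are stand-ins; the real system
# searches live company artifacts.

PRELOADED_CONTEXT = ["core principles", "metric definitions", "key jobs to be done"]

# Toy corpus: each doc has text plus explicit links to other docs.
DOCS = {
    "prd-123": {"text": "Proposal ...", "links": ["exp-7", "deck-2"]},
    "exp-7":   {"text": "Prior experiment ...", "links": []},
    "deck-2":  {"text": "Review deck ...", "links": ["notes-9"]},
    "notes-9": {"text": "Meeting notes ...", "links": []},
}

def build_knowledge_base(prd_id: str, max_depth: int = 2) -> list[str]:
    """Walk outward from the PRD, collecting linked and preloaded context."""
    seen: set[str] = set()
    frontier = [(prd_id, 0)]
    context: list[str] = []
    while frontier:
        doc_id, depth = frontier.pop()
        if doc_id in seen or depth > max_depth:
            continue
        seen.add(doc_id)
        doc = DOCS[doc_id]
        context.append(doc["text"])
        # Follow explicit links (docs, decks, notes) up to max_depth.
        frontier.extend((link, depth + 1) for link in doc["links"])
    return context + PRELOADED_CONTEXT

print(build_knowledge_base("prd-123"))
```

In the real system, retrieval would also surface artifacts never explicitly linked in the PRD, such as a prior experiment that tested a similar hypothesis.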

2. Classify the PRD to Calibrate Review Depth

Not every PRD needs the same scrutiny. The evaluator classifies each proposal and calibrates accordingly (see the sketch after this list):

  • Lighter review for UX parity or discoverability changes
  • Moderate review for incremental workflow changes or internal tooling migrations
  • Full review for net-new capabilities
  • Full review with specialized scrutiny for policy, pricing, or marketplace changes
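The post doesn't describe the classifier itself; as a purely illustrative sketch, here is one way the four-tier mapping could be represented, with keyword rules standing in for whatever model-based classification the real system uses:

```python
# Hypothetical sketch of review-depth calibration. The keyword rules are
# illustrative stand-ins for a model-based classifier; only the four-tier
# output shape comes from the post.

from enum import Enum

class ReviewDepth(Enum):
    LIGHT = "light"                         # UX parity, discoverability changes
    MODERATE = "moderate"                   # incremental workflow, tooling migrations
    FULL = "full"                           # net-new capabilities
    FULL_SPECIALIZED = "full+specialized"   # policy, pricing, marketplace changes

def calibrate_review(prd_summary: str) -> ReviewDepth:
    """Map a one-line PRD classification onto a review depth."""
    text = prd_summary.lower()
    if any(term in text for term in ("policy", "pricing", "marketplace")):
        return ReviewDepth.FULL_SPECIALIZED
    if "net-new" in text or "new capability" in text:
        return ReviewDepth.FULL
    if "migration" in text or "workflow" in text:
        return ReviewDepth.MODERATE
    return ReviewDepth.LIGHT

print(calibrate_review("Pricing change for airport trips"))  # full+specialized
```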

3. Assess Launch Readiness Across Multiple Dimensions

The review is structured around several dimensions, including the four below (sketched in code after the list):

  • Opportunity and Hypothesis: Is the problem real, and is success defined clearly enough to evaluate?
  • Product Scope: Is the proposal understandable, well-scoped, and decision-ready?
  • User Experience and Impact: Does the experience work well across user segments, geos, and potential edge cases?
  • Metric and Data Rigor: Does the PRD define success, guardrails, and a credible validation approach?
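As a hedged sketch of how those guiding questions might drive the assessment loop: the evaluate() stub below stands in for a model call scoring one dimension against the PRD and its assembled context; everything here is illustrative, not Uber's implementation:

```python
# Hypothetical sketch of the dimension-by-dimension assessment. The
# evaluate() stub stands in for an LLM call; the dimensions and guiding
# questions come from the post.

DIMENSIONS = {
    "Opportunity and Hypothesis":
        "Is the problem real, and is success defined clearly enough to evaluate?",
    "Product Scope":
        "Is the proposal understandable, well-scoped, and decision-ready?",
    "User Experience and Impact":
        "Does the experience work across user segments, geos, and edge cases?",
    "Metric and Data Rigor":
        "Does the PRD define success, guardrails, and a credible validation approach?",
}

def evaluate(dimension: str, question: str, prd: str, context: list[str]) -> str:
    """Stand-in for a model call; returns 'Looks Good' or 'Needs Review'."""
    missing_rigor = "Rigor" in dimension and "guardrail" not in prd.lower()
    return "Needs Review" if missing_rigor else "Looks Good"

def assess(prd: str, context: list[str]) -> dict[str, str]:
    """Run every dimension's guiding question against the PRD and context."""
    return {dim: evaluate(dim, q, prd, context) for dim, q in DIMENSIONS.items()}

print(assess("Draft PRD without a metrics section", []))
```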

4. Produce a Scorecard Built for Action

Rather than a wall of comments, the evaluator produces a structured scorecard:

  • A launch-readiness rating
  • Dimension-by-dimension assessments
  • A clear “start here” pointer to the most important fix
  • For each gap, a statement of what is missing, write-ready replacement text suggestions, and evidence from linked docs or prior experiments
  • Prioritized action items split into critical requirements and optimizations

The output is designed to do more than point out weaknesses. It is meant to make the next round of revision easier and more targeted, and the next review conversation higher signal.
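To make the output shape concrete, here is a hypothetical model of that scorecard using the field names from Figure 2. The dataclasses are illustrative, not Uber's actual schema:

```python
# Hypothetical scorecard structure, mirroring the fields in Figure 2.
# Real output shapes may differ.

from dataclasses import dataclass, field
from enum import Enum

class Readiness(Enum):
    READY = "Ready"
    READY_WITH_CAVEATS = "Ready with Caveats"
    NOT_READY = "Not Ready"

@dataclass
class Finding:
    dimension: str
    whats_missing: str       # what the gap is
    replacement_text: str    # write-ready suggested text
    evidence: list[str]      # links to docs / prior experiments

@dataclass
class Scorecard:
    readiness: Readiness
    dimension_scores: dict[str, str]   # "Looks Good" / "Needs Review"
    start_here: str                    # the single most important fix
    findings: list[Finding] = field(default_factory=list)
    critical_requirements: list[str] = field(default_factory=list)
    optimizations: list[str] = field(default_factory=list)

card = Scorecard(
    readiness=Readiness.READY_WITH_CAVEATS,
    dimension_scores={"Metric and Data Rigor": "Needs Review"},
    start_here="Define a guardrail metric for cancellation rate.",
)
print(card.readiness.value, "->", card.start_here)
```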

Figure 2: Summary of the PRD Reviewer output format: a launch-readiness rating (Ready, Ready with Caveats, or Not Ready), six dimension scores (Looks Good or Needs Review), detailed findings and fixes with replacement text and evidence, and action items split into critical requirements and optimizations.


Figure 3: Illustrative scorecard example.

Where the Value Shows Up for PMs

The biggest value is that it changes the quality and timing of product thinking.

It Expands a PM’s Field of View

Many of the hardest product mistakes come from incomplete visibility. A PM may not know that a similar hypothesis was tested earlier by another team. They may not realize a metric is ambiguous or missing an obvious guardrail. They may not see a downstream operational dependency because it sits outside their immediate product surface.

A truly useful evaluator expands that field of view. It can connect a draft to prior artifacts, adjacent efforts, pre-existing hypotheses, and missing questions that the author has access to but that would otherwise depend on someone remembering them in a meeting. It can also surface context that was never explicitly linked in the PRD but is still relevant to understanding the decision.

It Makes Self-Review More Structured

Most PMs can tell when a document feels weak. The harder question is why it’s weak and what to fix first.

The evaluator makes that diagnosis more explicit. Instead of vague unease, the PM gets a structured view of missing fundamentals: unsupported headroom assumptions, undefined guardrails, blind spots in how a change could affect adjacent systems, or risks that need acknowledgement.

It Improves the Quality of Review Rooms

When a PRD reaches a reviewer in better shape, the discussion moves faster toward tradeoffs, prioritization, and judgment, and less time is spent recovering context. That is where the evaluator connects most directly to Uber’s product development system.

It Turns Critique Into Usable Revision

The most important design choice in the system wasn’t scoring. It was ensuring actionability.

PMs don’t benefit much from comments like “be more specific” or “think through downside risk”. The evaluator is most useful when it converts critique into revision guidance: define the baseline, name the target, add the guardrail, scope the first release more narrowly, acknowledge the risk, or make the dependency explicit.

That changes the workflow from passive critique to active improvement.

Early Adoption

Early usage validated the core value: the evaluator helped IC PMs discover blind spots early, pressure-test unsupported headroom assumptions, surface how a proposed change could affect adjacent systems that weren’t core to their role, and identify experience improvements within the scope they had already defined.

In its early internal rollout, dozens of PMs across Uber have already run their drafts through the evaluator.

The tool’s value shows up when PMs can bring it into their normal drafting and review workflow, strengthen the fidelity of what enters review, and help reviewers focus on higher-signal questions.

What We Learned

A few lessons stood out as we built and tested the evaluator:

  • Frameworks beat generic critique. Broad comments rarely help teams move faster. The leverage comes from a framework tied to actual decision criteria and failure modes.
  • Context matters as much as language quality. Many important signals live outside the PRD itself, and richer context often reveals a different set of blind spots than the document alone.
  • Hard boundaries make output more honest. Defining a small set of critical gaps helped the evaluator avoid calling a PRD review-ready when the fundamentals were missing.
  • Prioritization is part of the product. A review tool that flags everything as important isn’t helping. The evaluator’s value comes from telling PMs what to fix first.
  • The best AI output improves human conversations. The strongest sign the evaluator was working was that later review discussions became sharper and faster.

Where Human Judgment Still Matters

The evaluator doesn’t aim to make final approval decisions or replace domain experts. The tool is most useful when it strengthens the artifact before expert review.

Conclusion

The hardest part of product development is getting the right people to make the right decisions at the right time, using an artifact strong enough to support those decisions.

Most product organizations have some equivalent of checkpoints, review forums, or gated approvals. The names differ, but the challenge is the same: how do you make sure the artifact entering the process is strong enough for the process to do real work?

AI has real leverage here as a structured thought partner that expands context, surfaces blind spots, and sharpens judgment before a decision reaches a high-cost forum. That is why we built the PRD Evaluator. And based on what we’ve seen so far, we think this pattern (AI that strengthens the input to human decision-making) will matter well beyond one company or one tool.

Acknowledgments 

Cover Photo Attribution: Created by Gemini

Scorecard Images Attribution: Created by Claude

Written by

Lakshmi Ashok

Product Lead

Lakshmi Ashok is a Product Lead at Uber based in San Francisco. Lakshmi drives innovation across the Driver and Courier App, from AI-first experiences to in-vehicle integrations, with a focus on making the app seamless and reliable. Outside work, she enjoys mahjong, dancing, hiking, and reading.
