Writing PRDs for AI Features: A Template That Engineeri…

Template for AI PRDs

AI features need special PRD structure:

Feature: [Name]
Problem: [What's this solving?]
Users: [Who benefits?]

{ Data & Model Section }
Training Data:
  - Source: [Where training data comes from]
  - Volume: [How much data?]
  - Freshness: [How old?]
  - Bias risks: [What could go wrong?]

Model Architecture:
  - Approach: [LLM? Fine-tuned? Custom?]
  - Why this: [Alternatives considered + why rejected]

Success Metrics:
  - Accuracy threshold: [%]
  - Latency: [ms budget]
  - Cost: [$ per request]

Unacceptable Failure Modes:
  - [Failure mode #1]: Would cause [harm]
  - [Failure mode #2]: Would cause [harm]

{ UX & Integration }
User Experience:
  - When AI is confident: [show X]
  - When uncertain: [show Y]
  - Edge cases: [list A, B, C]

{{ Monitoring & Rollback }
Metrics Dashboard:
  - Accuracy trend
  - Confidence distribution
  - User feedback ratio

Rollback Trigger:
  - Accuracy < [%]
  - Latency > [ms]
  - User thumbs-down > [%]

Key Point

AI PRDs need explicit specification of failure modes, success thresholds, and rollback triggers. This prevents shipping AI features that surprise users negatively.

Key Takeaways

Use a template designed for AI features. Traditional feature specifications don't capture ML-specific concerns.
Make unacceptable failure modes explicit. This forces consideration of risks upfront.
Separate ML infrastructure (model, training) from UX (what users see). Different teams, different specs.

Why Traditional PRDs Fail for AI Features

Traditional PRD for Search:

"Build a search feature"
Users enter query, we return results
Engineering: Clear, doable

AI PRD using same format:

"Build AI-powered product recommendations"
System generates recommendations
Engineering: Questions flood in
- "What model?"
- "What training data?"
- "How accurate needs to be?"
- "What if the model hallucinates?"
- "How do we know if it breaks?"

Result: Engineering stalls, waiting for clarification. You're writing specs in real-time.

The Complete AI PRD Template (With Real Example)

Part 1: Problem & Stakeholders

FEATURE: AI-Generated Product Descriptions

PROBLEM STATEMENT:
- Current state: Products have 1-5 manual descriptions. Sparse for long-tail products.
- Target state: Every product has rich, accurate descriptions generated from product data.
- Customer feedback: "Product descriptions are inconsistent and incomplete"
- Business goal: Increase SEO rankings for long-tail products (currently ranking for <20% of catalog)

STAKEHOLDERS & CONCERNS:
- Engineering: Model complexity, latency, cost
- Design: UX for "AI-generated?" label (transparency)
- Legal: Copyright/liability if AI generates inaccurate descriptions
- Finance: Cost per product description generated

Part 2: Data & Model Architecture

DATA SECTION:

Training Data:
  - Source: Wikipedia product descriptions + Amazon product data + our product database
  - Volume: 500K product descriptions (with product attributes as input)
  - Freshness: Retrain monthly
  - Bias risks: Dataset skews toward popular product categories (>90% electronics). Risky for niche products.
  - Deduplication: Remove duplicate/near-duplicate descriptions

Model Architecture:
  - Approach: Fine-tuned GPT-3 (not custom-trained from scratch)
  - Why this: Fast to deploy, proven on product descriptions, cost-effective
  - Alternatives considered: FLAN-T5 (slower), Davinci-003 (more expensive), custom fine-tune (6-month timeline)
  - Cold-start strategy: If fine-tuned model fails, fall back to template-based descriptions

SUCCESS METRICS:
  - Accuracy: Human reviewers rate descriptions 4+/5 (scale 1-5)
  - Coverage: 90%+ of products get descriptions (vs. 60% today)
  - Latency: <500ms per description generation
  - Cost: <$0.01 per description (budget: $5K/month for 500K products)

UNACCEPTABLE FAILURE MODES:
  - Mode 1: Generate descriptions that are factually incorrect (e.g., "CPU speed 1000GHz" for laptop)
    - Impact: Damages brand trust, customer returns
    - Prevention: Validation layer checks for impossible values
    - Rollback trigger: >5% of descriptions fail validation

  - Mode 2: Generate descriptions with copyright violations (copying competitor text)
    - Impact: Legal risk
    - Prevention: Training data deduplicated, model fine-tuned only on licensed data
    - Rollback trigger: Any copyright notice from competitors

  - Mode 3: Generate harmful/offensive content
    - Impact: Brand damage
    - Prevention: Content filtering + human review of first 100
    - Rollback trigger: Any offensive content detected

Part 3: UX & User Experience

UX SECTION:

When AI Confidence is High (>85%):
  - Show description without disclaimer
  - Enable one-click publish to product page
  - Example: [Product Image] "High-performance gaming laptop with RTX 4090..."

When AI Confidence is Medium (70-85%):
  - Show description + "AI-generated" badge
  - Require human review before publishing
  - Allow edit before publishing

When AI Confidence is Low (<70%):
  - Don't show auto-generated description
  - Fall back to template: "[Brand] [Product Type] - [Key Specs]"
  - Example: "Dell Gaming Laptop - RTX 4090, Intel i9, 32GB RAM"

Edge Cases:
  - New product categories: Use template until model trains on new data
  - Product variants: Reuse core description, customize specs
  - Multilingual: For non-English regions, use translation + description generation

Part 4: Metrics & Monitoring

MONITORING SECTION:

Metrics Dashboard (Post-Launch):
  - Accuracy trend: % of descriptions rated 4+/5 by humans (target: >80%)
  - Generation latency: p99 time per description (target: <500ms)
  - Cost trend: $ per description (target: <$0.01)
  - User feedback: Thumbs-up / thumbs-down ratio (target: >80% thumbs-up)
  - Content filtering: % of descriptions flagged by safety filters (target: <1%)

Monitoring Cadence:
  - Daily: Latency, error rate, cost
  - Weekly: Accuracy sampling (manual review of 100 random descriptions)
  - Monthly: Full accuracy audit + user feedback analysis

ROLLBACK TRIGGERS (Automatic or Manual):
  - Accuracy drops below 70% for 2 consecutive days → Rollback
  - Latency exceeds 1 second for >5% of requests → Rollback
  - Cost exceeds $10K/month → Investigate (may not rollback, but escalate)
  - Any copyright claim received → Investigate immediately
  - Safety filter flags >5% of descriptions → Investigate

MONITORING ALERTING:
  - Slack alert if accuracy < 75% (investigate)
  - Slack alert if latency p99 > 1 second (investigate)
  - Email to Legal if copyright terms mentioned in generated descriptions

Part 5: Iteration Plan

PHASE 1 (Week 1-2): Internal Testing
- Generate descriptions for 1K products
- Have team manually review
- Accuracy should be >75%
- If not, retrain model with more data

PHASE 2 (Week 3): Canary (1% of products)
- 1% of products get AI descriptions
- Monitor: Accuracy, latency, cost, user feedback
- Gate: If accuracy >80%, proceed to Phase 2

PHASE 3 (Week 4): Rollout (25% of products)
- 25% of products get AI descriptions
- Require human review before publishing (medium confidence descriptions)
- Monitor: Same metrics

PHASE 4 (Week 5+): Full Rollout (100% of new products)
- All new products get AI descriptions
- Existing products: Backfill over 4 weeks
- Ongoing monitoring

Real-World Example: AI PRD Done Right vs. Done Wrong

Wrong (No template):

"Build AI descriptions for products"
Engineering: confusion
3 weeks of clarification meetings
Shipped without rollback plan
Model failed in production, customer-facing errors

Right (Using template):

Specifications complete and detailed (see above)
Engineering: knows exactly what to build
2 weeks of development
Shipped with monitoring, confidence thresholds, rollback plan
Model works as intended

Anti-Pattern: "Treating AI Like Deterministic Features"

The Problem:

Write PRD: "Generate product descriptions"
Ship feature: Model generates hallucinations
Blame ML team: "Why didn't you test?"

The Fix:

Understand AI PRDs are different (probabilistic, not deterministic)
Build in uncertainty handling (confidence thresholds, fallback behaviors)
Specify failure modes and rollback triggers upfront

Prodinja Connection

AI PRDs are complex. Prodinja provides an AI PRD template that ensures you don't miss critical sections: training data, failure modes, monitoring, rollback. By using the template, you avoid the "realize we forgot this in production" scenario.

Key Takeaways (Expanded)

AI PRDs have 5 parts: Problem + Data/Model + UX + Monitoring + Iteration. Don't skip any.
Specify unacceptable failure modes. What would be bad? How do you prevent it? What triggers rollback?
Confidence thresholds guide UX. High confidence: Show without disclaimer. Low confidence: Use fallback.
Monitoring is not optional. Define metrics, cadence, and rollback triggers before launch.
Iteration phases reduce risk. Test with 1%, then 25%, then 100%. Gate each phase on success criteria.