Balancing Cost and Quality in AI Summarization
Exploring how model choice, input caching, and scale impact the cost and quality of AI-generated legislative summaries.

Learning in Practice
This post reflects insights gained while building and optimizing my legislation summarization pipeline — a project that automatically generates legislative summaries across thousands of bills. I used it as a test bed for comparing cost-efficiency and quality across Anthropic and OpenAI models.
Background
Scaling summarization across hundreds or even thousands of legislative documents introduces a real-world tradeoff between cost and quality.
Each summary request can involve thousands of input tokens, and when multiplied across large datasets, the financial impact becomes significant.
That’s where input caching and model right-sizing become critical tools for optimization.
The Experiment
I ran the same piece of legislation — House Resolution 211 — through multiple AI models to evaluate performance, tone, and accuracy relative to cost.
The input (prompt) used for these tests was approximately 1,450 tokens, while my production pipeline’s actual cached input per bill averages ~3,500 tokens.
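Given token counts like these, per-summary cost falls out of simple arithmetic against the provider's per-million-token rates. A minimal sketch of the estimator I use for comparisons — the rates passed in below are illustrative placeholders, not quotes from any provider's price sheet:

```python
def summary_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimate the USD cost of one summary.

    Rates are expressed in USD per 1M tokens, matching how
    providers publish their pricing.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: ~3,500 input tokens and ~500 output tokens at
# hypothetical rates of $0.05 (input) and $0.40 (output) per 1M tokens.
cost = summary_cost(3_500, 500, input_rate=0.05, output_rate=0.40)
print(f"${cost:.5f} per summary")
```

Multiplying the result by the number of bills in the dataset is what turns a fraction of a cent into a budget line worth optimizing.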
Models Tested
- Anthropic Claude Sonnet 4.5
- Anthropic Claude Haiku 3.5
- OpenAI GPT-5 Nano
- OpenAI GPT-5 Mini
- DeepSeek V3.1 (via OpenRouter)
Cost Efficiency Breakdown
| Model | Grade | Cost per summary (uncached → cached) |
|---|---|---|
| Claude Sonnet 4.5 | B | $0.01449 → $0.01368 w/ caching |
| Claude Haiku 3.5 | A | $0.00353 → $0.00333 w/ caching |
| GPT-5 Nano | A+ | $0.00106 → $0.00100 w/ caching |
| GPT-5 Mini | A- | $0.00424 → $0.00398 w/ caching |
| DeepSeek V3.1 | Free | Zero-cost test via OpenRouter |
Note on Input Caching
Input caching drastically reduces the cost of repeated tokens — from $0.05 to $0.005 per 1M input tokens. Across my 3,500+ summarizations, caching saved about 11% of total input cost.
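The savings compound at scale. A rough sketch of the math, under the simplifying assumptions that the cached prefix is a fixed size and every request after the first is a cache hit (real cache behavior varies by provider and TTL):

```python
def caching_savings(n_requests: int, cached_tokens: int,
                    base_rate: float, cached_rate: float) -> float:
    """USD saved by reading `cached_tokens` at the cached rate
    instead of the base rate on every request after the first.

    Rates are USD per 1M input tokens.
    """
    hits = max(n_requests - 1, 0)  # first request populates the cache
    tokens_in_millions = cached_tokens / 1_000_000
    return hits * tokens_in_millions * (base_rate - cached_rate)

# ~3,500 requests with a ~3,500-token cached prefix,
# at $0.05 (uncached) vs $0.005 (cached) per 1M input tokens.
saved = caching_savings(3_500, 3_500, base_rate=0.05, cached_rate=0.005)
print(f"${saved:.2f} saved")
```

The dollar amounts look small per request, which is exactly why the effect is easy to overlook until the request count climbs into the thousands.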
Quality Observations
While cost per summary is an important metric, quality is harder to quantify.
Below are excerpts from each model’s summary of the same bill (see full comparison file in the repository).
📊 Comparative Highlights
- Claude Sonnet 4.5: Strong procedural comprehension, but tends toward formal repetition. Best suited for long-form policy writing.
- Claude Haiku 3.5: Balanced summary quality, though less nuanced on legislative structure.
- GPT-5 Nano: Consistently accurate and clear, matching higher-tier models in legislative context while costing 10x less.
- GPT-5 Mini: Excellent structural explanation at slightly higher cost. A good compromise between quality and runtime.
- DeepSeek V3.1: Summarizes cleanly but lacks nuance. Great for broad analysis or zero-cost experimentation.
Standard Summary
This is a procedural resolution that sets the rules for considering three separate pieces of legislation in the House of Representatives. It establishes the debate parameters and voting procedures for: (1) a joint resolution to overturn an IRS rule requiring brokers to report gross proceeds from digital asset (cryptocurrency) sales, (2) a bill to extend the time period for prosecuting unemployment fraud cases related to COVID-19 relief programs, and (3) a continuing resolution to fund the government through the remainder of fiscal year 2025. Additionally, the resolution includes a provision that affects how Congress counts days when considering whether to terminate a national emergency declared by the President on February 1, 2025.
Key Points
- Allows consideration of H.J. Res. 25 to disapprove an IRS rule on cryptocurrency broker reporting requirements, with one hour of debate controlled by the Ways and Means Committee
- Permits consideration of H.R. 1156 to extend the statute of limitations for prosecuting fraud in pandemic-era unemployment programs (CARES Act related)
- Enables consideration of H.R. 1968, a continuing resolution to maintain government funding through September 30, 2025
- Waives all points of order (procedural objections) against consideration of all three measures
- Suspends the calendar day count for Congressional Review Act purposes regarding a February 1, 2025 presidential national emergency declaration
Impact Areas
- Cryptocurrency industry and digital asset traders (regarding tax reporting requirements)
- Federal law enforcement and prosecutors pursuing pandemic unemployment fraud cases
- Federal government operations and funding continuity
- Congressional oversight of presidential emergency powers
- Legislative procedure and House floor management
Lessons Learned
Takeaways
- GPT-5 Nano offers exceptional value: near enterprise-grade summarization quality at a fraction of the cost.
- Input caching is not just a minor optimization — it’s the difference between scaling feasibly and going bankrupt.
- Model selection should be guided by fit-for-purpose, not by raw model size or hype.
- Anthropic’s Haiku and GPT-5 Nano both excel for summarization tasks, though the Nano’s cost-efficiency gives it a decisive edge.
Personal Insight
After processing over 3,500 legislative summaries, my conclusion is clear: GPT-5 Nano delivers the best overall value — balancing quality, speed, and cost with impressive consistency.
Reflection
Balancing cost and performance in AI pipelines is more than an optimization problem — it’s a mindset shift.
The key isn’t to always use the “best” model, but the most appropriate one.
In production, efficiency and predictability often matter more than perfection.
Next Steps
- Add automated daily cost tracking via OpenAI Usage API
- Benchmark output quality with automated scoring
- Experiment with hybrid caching and reranking