Octostar

Review & QA in the Age
of AI-Assisted Development

How we gave our automated reviewer a memory
and turned PR comments into permanent infrastructure

Engineering Team | April 2026

1 / 15

Three Tiers of Review Today

🤖 The Generic Bot
Runs on every PR, fully automatic
Generic advice: “add error handling”
Starts from zero every run
Zero memory of what went wrong
TIME · MEMORY
+
🧑‍💻 The New Manual
Asks the LLM the right questions
Remembers the war stories
Still manual — every PR
Has no time
TIME · MEMORY
🏆 The Bot With Memory
Automatic + remembers your war stories + scoped to each module
TIME · MEMORY · CONTEXT
We can't create time for people.
We can create memory for the bot.
2 / 15

Three Memory Layers

🤖
Layer 1 — Bot memory

.cursor/BUGBOT.md

Read at review time by Bugbot
⚠️ Rule enforced on every diff:
"Never use Promise.all on unbounded arrays. Use a concurrency limiter (max 3)."
Hierarchical — bot traverses upward, collecting rules at each directory level
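The rule above points at a concurrency limiter; a minimal sketch of what `asyncPool(3, files, upload)` could look like (an illustrative implementation, not the one in our codebase):

```javascript
// Illustrative concurrency limiter; our actual asyncPool may differ.
async function asyncPool(limit, items, worker) {
  const results = [];
  let next = 0; // shared cursor into items
  async function run() {
    while (next < items.length) {
      const i = next++; // claim an index synchronously, then await
      results[i] = await worker(items[i]);
    }
  }
  // Launch at most `limit` drainers over the shared queue.
  const drainers = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(drainers);
  return results;
}
```

Unlike `Promise.all(files.map(upload))`, at most `limit` uploads are in flight at once, so 200 files never means 200 concurrent buffers.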
💻
Layer 2 — Developer memory

LESSONS.md

Read at write time by IDE assistant
📚 Universal lesson:
"In-memory state is a cache, not a source of truth. Always query the DB for conflict detection."
The "why" behind patterns — things the bot can't enforce but devs should know
📄
Layer 3 — File memory

Inline Comments

Visible in diff when file is changed
📌 Pinned to one file:
// WARNING: Runtime-evaluated JS.
// Cannot use imports. Do not DRY.
// (See PR #760)
Most targeted — zero token waste, bot sees it only when that file is in the PR
3 / 15

Hierarchical Scoping

Deeper modules get more context. Simple changes get only relevant rules.

.cursor/BUGBOT.md Every PR
apps/octostar/.cursor/BUGBOT.md Octostar PRs
apps/octostar/.../linkchart-nt/.cursor/BUGBOT.md LinkChart PRs
apps/search-app/.cursor/BUGBOT.md Search PRs
packages/.cursor/BUGBOT.md Package PRs

Token Budget per Scenario

LinkChart change: ~1,684 tokens
Octostar change: ~1,250 tokens
Package change: ~859 tokens
Search App change: ~728 tokens

Target: <400 words per file, <2,000 tokens worst case. Fewer rules = more attention per rule.

4 / 15

The Automated Feedback Loop

1. PR Merged: developer's code lands on main
2. Action Fires: harvest-lessons.yml extracts human review comments
3. Summary Posted: structured harvest proposal on the merged PR
4. Classify & Commit: developer runs /pr-war-stories harvest
5a. Reviewer Learns: new BUGBOT.md rules apply on the next review
5b. Coder Learns: LESSONS.md is read before the next code is written

✓ Next PR catches more — loop continues
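The Action in step 2 can be sketched as a minimal workflow (a hedged sketch: the trigger and merged-check are standard GitHub Actions patterns, but the step names and script path are assumptions, not the actual file):

```yaml
# Sketch of harvest-lessons.yml; step names and the script path
# are illustrative assumptions.
name: Harvest lessons
on:
  pull_request:
    types: [closed]
jobs:
  harvest:
    # Only merged PRs carry review comments worth harvesting
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write   # needed to post the harvest summary
    steps:
      - uses: actions/checkout@v4
      - name: Extract review comments and post harvest summary
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: ./scripts/harvest-lessons.sh "${{ github.event.pull_request.number }}"
```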
5 / 15

Rule Classification System

Reviewable: the bot can check it → BUGBOT.md
"Don't use Promise.all on unbounded arrays"

Educational: informs developers → LESSONS.md
"Build order requires packages before apps"

Single-File: applies to one function → inline comment
"This adapter uses reference equality intentionally"

Overlapping: duplicates an existing rule → merge
Three stale-state rules merged into one

Stale: pattern has been fixed → remove
API migrated, old workaround no longer needed

⚠️ The #1 failure mode: dumping everything into BUGBOT.md. This classification prevents context bloat and ensures each rule is placed where it's most effective.
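The routing above can be sketched as a single dispatch function (category names mirror the slide; the shape of `lesson` and the return values are assumptions for illustration):

```javascript
// Route a harvested lesson to its memory layer.
function routeLesson(lesson) {
  switch (lesson.category) {
    case "reviewable":  return { action: "append", target: "BUGBOT.md" };
    case "educational": return { action: "append", target: "LESSONS.md" };
    case "single-file": return { action: "comment", target: lesson.file };
    case "overlapping": return { action: "merge", target: "BUGBOT.md" };
    case "stale":       return { action: "remove", target: "BUGBOT.md" };
    default:
      throw new Error(`Unknown category: ${lesson.category}`);
  }
}
```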

6 / 15
Octostar

Real War Stories from Our PRs

💥
PR #781
Promise.all on 200 files = OOM
BUGBOT.md
- await Promise.all(files.map(upload))
+ await asyncPool(3, files, upload)
🎨
PR #775
LLM CSS change broke schema editor
BUGBOT.md
- .custom-antlayout { height: 100% }
+ .schema-editor-layout { height: 100% }
PR #748
=== is correct. Don't "fix" it.
Inline
- if (deepEqual(prev, next))
+ if (prev === next) // intentional ref check
PR #741
useState is 1 frame late
LESSONS.md
- const [val, setVal] = useState(x)
+ const valRef = useRef(x) // sync capture
7 / 15

What We Shipped

22 war stories extracted from 50 merged PRs
5 BUGBOT.md files across the monorepo
6 inline comments placed in source files
1 GitHub Action for automated harvesting
🎯 Already paying for itself

On its first review, Bugbot caught a real bug in the harvest workflow itself — the scope detection used else if instead of if, causing linkchart files to miss parent scope rules. The system's own rules made the system better.
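The fix Bugbot suggested can be sketched as independent `if` checks, so one file matches every nested scope it belongs to (scope names and path prefixes are illustrative):

```javascript
// Independent `if`s, not `else if`: a linkchart file must also
// pick up its parent octostar and root scopes.
function detectScopes(filePath) {
  const scopes = ["root"]; // root rules apply to every PR
  if (filePath.startsWith("apps/octostar/")) scopes.push("octostar");
  if (filePath.includes("linkchart-nt/")) scopes.push("linkchart");
  if (filePath.startsWith("apps/search-app/")) scopes.push("search-app");
  if (filePath.startsWith("packages/")) scopes.push("packages");
  return scopes;
}
```

With `else if`, the first match would have short-circuited and linkchart files would silently lose their parent scope rules.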

/pr-war-stories setup  —  available as a Claude Code skill for any repository
8 / 15

Two Kinds of Knowledge

📐 Architectural Knowledge

What was decided and why
Written at decision time
Forward-looking
Predictable consequences
Captured in ADRs, design docs
Audience: future architects
Example ADR
"We chose an epoch-based concurrency guard in expandQueue to prevent race conditions during graph mutations."
vs
👻 Shadow Knowledge

What went wrong and what we learned
Emerges after the fact
Backward-looking
Unknowable at decision time
Lives in PR comments, people's heads
Audience: future developers
Example war story
"If you bypass the epoch check or call markDone() directly, pending counts go negative and the UI shows stale loading forever."
ADRs document the skeleton. Shadow knowledge is the muscle memory.
It only exists in the heads of people who've been burned. And it walks out the door when they leave.
9 / 15

The Dual Flywheel

💻 Coding gets smarter

IDE assistants read LESSONS.md before writing code. They stop suggesting patterns your team already learned are dangerous.

+
🤖 Reviewing gets smarter

Bugbot reads BUGBOT.md rules scoped to each directory. It catches what no human has time to check on every PR.

=
🚀 Compounding intelligence

Every mistake becomes a permanent rule. Every review comment becomes organizational memory. Knowledge survives turnover.

The crisis we're solving
AI tools have 10x'd code production but review throughput hasn't scaled.
Senior engineers are the bottleneck — drowning in PRs, rubber-stamping what they should scrutinize.
We're producing code faster than we can safely review it.
What must change
1. Write comments for the bot, not a colleague. Comments become rules.
2. A detailed rejection beats a quick approval.
3. Junior mistakes are raw material for new rules, not failures.
10 / 15
OPEN SOURCE

We open-sourced it.

Everything we built is now a reusable Claude Code skill
that works on any repository with PR history.

sscarduzio/pr-war-stories
github.com/sscarduzio/pr-war-stories
Install: claude install-skill sscarduzio/pr-war-stories
Bootstrap: /pr-war-stories setup
Harvest: /pr-war-stories harvest
Audit: /pr-war-stories audit
Mines 50+ merged PRs for war stories
Creates hierarchical BUGBOT.md rules
Installs automated harvest GitHub Action
Token-budgeted, classification-driven

Works with Cursor Bugbot, any GitHub repo, any language.

11 / 15

The Skill Commands

Four commands and one Action, each for a different moment in the lifecycle

/pr-war-stories setup (once per repo)
Mines 50+ merged PRs, creates hierarchical BUGBOT.md files, LESSONS.md, installs the harvest GitHub Action, wires CLAUDE.md. Full bootstrap.
harvest-lessons.yml (automatic, on every PR merge)
GitHub Action fires automatically. Extracts substantive human review comments, posts a structured harvest summary on the merged PR. No human trigger needed.
/pr-war-stories harvest (when harvest comments appear)
Reads harvest summaries, classifies each lesson (reviewable / educational / single-file), places it in the right layer. The human-in-the-loop step.
/pr-war-stories recheck (after big refactors)
Greps every rule for path and function references, verifies they still exist in the codebase. Flags stale scopeRules prefixes. Reports but does not auto-fix.
/pr-war-stories audit (quarterly)
Checks Bugbot hit rate on last 20 PRs. Removes rules that never triggered. Merges duplicates. Graduates rules that can now be linted. Prevents rot.
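What recheck does can be sketched as a path-reference scan (the rule format, regex, and injected `exists` callback are illustrative, not the skill's actual implementation):

```javascript
// Scan a rule file's text for repo-relative paths and report the
// ones that no longer exist in the codebase.
function findStaleReferences(ruleText, repoRoot, exists) {
  // Match repo-relative paths such as apps/octostar/src/chart.ts
  const pathPattern = /\b(?:apps|packages)\/[\w./-]+/g;
  const stale = [];
  for (const ref of ruleText.match(pathPattern) || []) {
    if (!exists(`${repoRoot}/${ref}`)) stale.push(ref);
  }
  return stale; // recheck reports; it never auto-fixes
}
```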
12 / 15

Questions you're probably asking

Questions we anticipate from senior engineers, architects, and tech leads

Won't it rot?
Quarterly audit removes stale rules. Harvest Action keeps fresh ones coming in.
Why not just lint it?
If it can be linted, it should — then remove it from BUGBOT.md. BUGBOT is for contextual knowledge only.
Works with CodeRabbit / Copilot?
Cursor Bugbot reads .cursor/BUGBOT.md natively. Others read LESSONS.md and inline comments.
What about big refactors?
Run /pr-war-stories recheck — flags stale paths, does not auto-fix.
13 / 15
Post Scriptum — Future Research

LLM Vision-Based Automated Testing

Describe tests in plain English. An AI model looks at the screen and executes them.

Automation SDK

Midscene.js

Plugs into Playwright. No selectors, no XPath, no data-testid.

Vision Model

UI-TARS

Open-source model that sees screenshots and executes UI workflows.

Both run on our hardware. No cloud. No licensing. No data leaves the network.
14 / 15
Post Scriptum — Future Research

Promising, But Needs Validation

Demo-grade works. Production-grade is the question.

What looks good

  • Runs on our RTX 6000 PRO — no new infra
  • Handles drag & drop and complex UIs
  • Tests survive UI refactors — no selectors to break

What we need to test

  • Speed — full regression in CI time?
  • Determinism — same result every run?
  • Our UIs — link charts, record viewers
15 / 15