PM Brain OS: The Second Brain for Product Managers, Made of Markdown
A folder of files on your laptop. Claude reads them before answering, writes to them after, sweeps them every Friday. Open source. 17 synthetic PM scenarios, 404 of 406 checks pass (≈99.5%).
Research preview. The architecture has months of dogfooding behind it on my content work. The product as installed by real PMs in real organizations is days old. The eval suite is the floor; your install feedback is how it gets better. Monday May 25, I'm running a live workshop session for paid subscribers, and there's a dedicated #pm-brain Slack channel for 1:1 install help. Details in §11.
You manage one product. Your context lives in five places: Notion, Linear, Slack, your dashboards, and your head.
You ship a feature. Six weeks later, nobody remembers why you killed the other option. The customer interview that should have informed the call is buried in a doc you forgot to link. The stakeholder who pushed back has a new concern, and the old one is gone.
I built a second brain for it. A folder of markdown files in a git repo on your laptop, plus a CLAUDE.md operating manual that tells Claude how to use them. The agent loads the right files before a task, updates the right files after, sweeps them weekly, and tags every claim with where it came from.
PM Brain is new. The architecture isn’t. I’ve been running the same pattern (operating manual, tags, hypotheses, decisions, weekly maintenance sweep) on my content work for months. PM Brain is that architecture adapted to PM-specific schemas, validated against 17 scenarios I designed from PM situations I’ve shipped through, with synthetic data generated across persona, stage, and risk dimensions.
404 of 406 individual checks pass (≈99.5%) on Claude Sonnet 4.6. MIT licensed. One shell command installs the skill globally; /pm-brain bootstraps a brain in whatever folder you choose. The repo lives at github.com/phuryn/pm-brain.
This post is the long version. The short version is in the README. The shortest version: open Claude Code in an empty folder, run /pm-brain, answer a five-batch interview, and you have a working brain in 10 minutes.
What You’ll Learn
Why your PM context decays at month three, and where the trail goes
The five failure modes that kill most AI memory systems, and the five structural choices that answer them
A week with PM Brain, told as Lena’s first five days on a real account
The architecture: five knowledge areas, three lifecycle areas, one CLAUDE.md
Provenance, the unique technical idea, and why it carries more weight than people expect
The six commands you’ll actually use
How to install it today, and what to do in week one
1. What PM Brain Actually Is
A folder of markdown files in a git repo. Plus a CLAUDE.md operating manual that tells Claude how to use them. The ready-to-use skill drops it in when you bootstrap.
That’s the whole product.
The agent does four things:
Loads relevant context before any task. No blind drafting. Before you prep for a 1:1, draft a hypothesis, or review a feature, it reads the area files into context.
Updates the right files after the task. New evidence lands where it belongs.
Surfaces contradictions you’d miss. Recent decisions vs documented strategy. Verbal claims vs documented research. Hypotheses with no evidence in 30+ days.
Runs a weekly maintenance sweep. Six checks, dated report. Edits where confidence is high, drafts where not.
Every load-bearing claim wears a provenance tag: a small marker saying whether it came from a documented interview, a verbal stakeholder comment, your hunch, or general industry knowledge. The full vocabulary is in §4; the short version is that the tag travels with the claim everywhere it’s used.
What it isn’t. No vector database. No embeddings. No auto-tagging. No cloud. Everything is grep-able. You can open the entire brain in any editor.
It’s also not an agent memory system. Those embed everything you feed them so the agent can recall it later. PM Brain does the opposite. You write down only what matters, in markdown you can read, and the agent reads what you wrote. The point isn’t agent recall. It’s PM judgment with an audit trail.
A PM Brain doesn’t store everything. It promotes only what’s recurring, decision-relevant, or strategy-relevant. Everything else stays in working memory until it earns its way into the durable layer.
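To make the promotion rule concrete, here is a minimal Python sketch. It is not the skill's implementation: the Note fields and the "seen at least twice" cutoff for recurring are assumptions made for the example. In PM Brain the agent only proposes; the PM decides.

```python
from dataclasses import dataclass

# Hypothetical shape for a working-memory note; field names are illustrative only.
@dataclass
class Note:
    text: str
    times_seen: int = 1             # independent sources that raised it
    informs_decision: bool = False
    touches_strategy: bool = False

RECURRING_THRESHOLD = 2  # assumed cutoff, not taken from PM Brain

def should_propose_promotion(note: Note) -> bool:
    """True if the note is recurring, decision-relevant, or strategy-relevant."""
    return (
        note.times_seen >= RECURRING_THRESHOLD
        or note.informs_decision
        or note.touches_strategy
    )

notes = [
    Note("Invite-link friction", times_seen=3),
    Note("Office wifi was flaky during the call"),
    Note("Buyer asked for admin controls", informs_decision=True),
]
proposed = [n.text for n in notes if should_propose_promotion(n)]
print(proposed)
```

Anything that fails all three tests stays in working memory, which is what keeps the durable layer small.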
Why most AI memory systems fail by month three
Five failure modes show up consistently across vector-DB memory, RAG over your docs, “AI second brain” apps, and most agent-memory frameworks. PM Brain is structured around five corresponding design choices, each one targeting a specific failure:
Full mapping with the architecture choices behind each fix: docs/why-this-matters.md.
2. A Week With PM Brain: Lena’s First Five Days
The fastest way to see what PM Brain does is to watch someone use it. This is a short story. The character, the team, and the tool are invented; the artifacts are real-shaped. The full version lives in docs/walkthrough.md.
Lena Vasquez just took over PM for a B2B project-collaboration tool called Mosaic. The previous PM left a Notion workspace, a Jira project, and a 3-month-old Miro discovery board.
🚀 Monday: onboarding
Lena plugs in the Notion, Jira, and Miro app connectors, then runs /pm-brain. The skill enters migration mode, runs a five-batch interview (about 10 minutes), then reads through the three tools.
Forty-five minutes later: a folder of markdown files, an INDEX.md, and a short onboarding report. Buried in it:
Tension surfaced. The strategy doc says the north-star is activation: 60% of new accounts invite a teammate by day 3. But of 47 Jira tickets shipped in Q1 and Q2, 38 are tagged “enterprise permissions” or “admin tooling.” Only 4 touch the activation funnel.
A new PM usually catches this in week three, after they’ve already built plans on top of the official story. Lena has it on day one because the brain cross-checked two folders humans tend to read separately.
🎧 Tuesday: a discovery call
Lena has a 45-minute call with Talia Brennan, Head of Operations at Northridge Construction. Quiet user, seat count plateaued, no complaints. After the call: /ingest northridge-talia-transcript.md.
Four things happen in parallel:
The transcript copies to source/interviews/.... Read-only. Audit anchor.
A synthesis note lands in ingestion/interviews/... with observations tagged by speaker and date.
The brain tells Lena that two stakeholders flagged the same invite-link friction back in March, that there’s a candidate hypothesis the previous PM never tested, and that the Miro discovery board has a “magic-link onboarding?” sticky from January, never explored.
Talia’s observation lands as a new evidence row on the invite-link hypothesis. It promotes from candidate to proposed. Three independent observations now.
One call. The weight of three.
📣 Wednesday: the loud stakeholder
Slack lights up. Rahul Pillai (marketing): “We need to ship dark mode this quarter. Every demo asks for it. It’s killing conversion.”
Loud, recent, plausible. A less-confident PM might tell engineering to start scoping.
Lena types into Claude: “Rahul just messaged about dark mode. Capture as verbal stakeholder claim, flag any conflict with documented evidence.”
The brain tags the claim (stakeholder-verbal, rahul-pillai, 2026-05-20). Then it surfaces what’s documented:
Q1 close-loss review: 27 demos analyzed. Dark mode flagged in 2 as a wishlist item. Zero blockers.
Cascadia Health interview, 2026-03-15: buyer named integrations and admin controls. Not dark mode.
Brillstone interview, 2026-04-18: same.
Q2 strategy doc: activation north-star, not feature parity.
The brain doesn’t argue with Rahul. It puts his verbal claim next to 27 sales calls, two customer interviews, and the team’s stated strategy, each wearing its actual provenance.
Lena replies: “Want to dig into this. Friday.” She has the documented case ready and a productive conversation booked.
👨🏭 Thursday: engineering proposes an alternative
1:1 with Mateus Okafor (engineering lead): “What if we ran a 1-week spike on magic-invite-links? If it works, we save four weeks of UI.”
Lena adds a line. The brain records the verbal, files it as a solution option on the invite-link hypothesis, findable next to Talia’s evidence and the Miro sticky from January. The suggestion doesn’t die in Slack scrollback.
🧘♀️ Friday: /review, then Monday’s prep is done
Lena runs /review. One page:
This week
Strategy gap (still open): flagged Monday, not raised yet.
Invite-link hypothesis: candidate → proposed. Three observations.
Dark mode tension (Rahul): verbal claim captured, flagged against documented evidence. Friday meeting is your action.
Magic-invite-links spike (Mateus): solution option logged.
Drifting
enterprise-permissions-v2: no new evidence in 47 days. Revive, demote, or archive?
For your Monday strategy meeting
Open with the strategy-vs-shipped-work gap. Activation-first or enterprise-first? One is wrong.
Bring the invite-link hypothesis forward. Three observations + one engineering alternative. Ask for a go/no-go.
Park dark mode until Friday’s meeting with Rahul.
Lena reads this in five minutes. Monday’s prep is done.
What Lena got, in five lines:
Monday: a real question to bring into her first strategy meeting, on day one
Tuesday: one call that landed with the weight of three
Wednesday: documented evidence ready when a loud voice contradicted it
Thursday: an engineering suggestion that didn’t die in Slack
Friday: a one-page summary that made Monday’s meeting easy
None of this is automation. It’s the brain doing the small, boring work of cross-referencing what she already knew, so the judgment work, which is her job, gets easier.
3. The Architecture
Five knowledge areas, three lifecycle areas, four ingestion modes, one maintenance loop:
Plus source/ (immutable copies of original artifacts), a maintenance log, and a docs folder.
The cognition pipeline
Evidence flows in one direction. It fans out at the durable layer:
The same artifact updates multiple destinations in parallel. Talia’s 45-minute interview touched six files: one source copy, one ingestion record, one insight promoted to knowledge/users/, one hypothesis strengthened, one stakeholder touchpoint logged, and a candidate solution option logged on the hypothesis.
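The fan-out can be sketched as a list of (destination, action) pairs. This is a rough illustration, not the skill's routing code. The source/, ingestion/, knowledge/users/, and hypotheses/ folders come from this post; the stakeholders/ folder name, the file names, and the fan_out helper are assumptions.

```python
from pathlib import PurePosixPath

def fan_out(transcript: str, hypothesis: str, stakeholder: str):
    """Return (destination, action) pairs for one ingested interview."""
    stem = PurePosixPath(transcript).stem
    hyp = f"hypotheses/{hypothesis}.md"  # file naming is hypothetical
    return [
        (f"source/interviews/{transcript}", "store read-only copy"),
        (f"ingestion/interviews/{stem}.md", "write synthesis note"),
        ("knowledge/users/insights.md", "promote insight"),
        (hyp, "add evidence row"),
        (f"stakeholders/{stakeholder}.md", "log touchpoint"),  # folder name assumed
        (hyp, "log candidate solution option"),
    ]

updates = fan_out("northridge-talia-transcript.md", "invite-link", "talia-brennan")
for path, action in updates:
    print(f"{action:32} -> {path}")
```

One input, six updates. The two hypothesis updates land in the same file in this sketch, which is why the solution option ends up findable next to the evidence.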
The hypothesis / decision split
When a hypothesis is confirmed, it gets promoted and a decision record is auto-drafted (status: pending, waiting for your sign-off). When a decision’s reversal condition triggers, the weekly sweep surfaces it:
Most systems mash these together. They become useless.
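A minimal sketch of the split, assuming two separate record types. The status values "candidate", "promoted", and "pending" appear in this post; the class fields, the confirm helper, and matching reversal conditions by exact string are simplifications invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    title: str
    reversal_condition: str
    status: str = "pending"  # drafted by the agent; the PM signs off

@dataclass
class Hypothesis:
    statement: str
    status: str = "candidate"
    evidence: list = field(default_factory=list)

def confirm(hypothesis: Hypothesis, reversal_condition: str) -> Decision:
    """Promote a confirmed hypothesis and draft a pending decision for it."""
    hypothesis.status = "promoted"
    return Decision(
        title=f"Decision on: {hypothesis.statement}",
        reversal_condition=reversal_condition,
    )

def reversals_to_surface(decisions, triggered_conditions):
    """What a weekly sweep would surface: decisions whose reversal condition fired."""
    return [d for d in decisions if d.reversal_condition in triggered_conditions]

h = Hypothesis("Invite-link friction blocks teammate invites")
draft = confirm(h, reversal_condition="activation drops after launch")
print(h.status, draft.status)
surfaced = reversals_to_surface([draft], {"activation drops after launch"})
print([d.title for d in surfaced])
```

Keeping the two types apart means a hypothesis can keep collecting evidence while the decision carries its own status and its own way out.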
4. Provenance: Every Claim Wears a Tag
Every load-bearing claim in hypotheses/, decisions/, and knowledge/users/insights.md carries a small tag, a provenance marker, that says where it came from:
The tags carry an implicit hierarchy: documented decisions outweigh documented research, which outweighs verbal claims, which outweighs PM intuition. The brain leans on that hierarchy when evidence conflicts. The leaning is in plain text. You can override it.
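Here is one way to picture that hierarchy, as a hedged Python sketch. Only the stakeholder-verbal tag name appears in this post; the other three tag names are placeholders, and the real brain expresses the weighting in plain text rather than in a lookup table.

```python
# Rank per the hierarchy described above; lower number = more weight.
# Only "stakeholder-verbal" is taken from the post; other names are placeholders.
PROVENANCE_RANK = {
    "decision-documented": 0,
    "research-documented": 1,
    "stakeholder-verbal": 2,
    "pm-intuition": 3,
}

def order_by_weight(claims):
    """Sort (claim, tag) pairs so the most strongly sourced come first."""
    return sorted(claims, key=lambda claim: PROVENANCE_RANK[claim[1]])

claims = [
    ("PM hunch (illustrative)", "pm-intuition"),
    ("Dark mode is killing conversion", "stakeholder-verbal"),
    ("Q1 close-loss review: dark mode in 2 of 27 demos, zero blockers",
     "research-documented"),
]
for text, tag in order_by_weight(claims):
    print(f"[{tag}] {text}")
```

The ordering is a default for conflict handling, not a verdict. Overriding it is a matter of editing the text.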
The brain enforces the vocabulary, not the workflow. PMs have intuitions. They hear things off-the-record from execs. They inherit claims with no clear pedigree. Those are legitimate inputs. The tag just makes them wear their actual provenance instead of laundering them through a fake ingestion/ record.
That’s why “we need dark mode” from one verbal claim lands very differently than the same claim across 27 documented sales calls, even when both technically sit in the brain. Wednesday’s vignette is the whole reason this layer exists.
Three months later your CTO asks: “Why did we kill real-time alerts?” You open the decision file. Every evidence row carries a tag. Path-typed tags walk in two clicks to the synthesis, then to the raw transcript. Non-path tags tell you honestly that no artifact exists. Both are auditable. Only a missing tag is a bug.
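The three outcomes (path tag, non-path tag, missing tag) can be sketched as a small audit function. The row format and the rule "a tag that starts with ingestion/ or source/ is path-typed" are assumptions for illustration; the actual tag syntax is defined by the skill.

```python
def audit_row(row: dict) -> str:
    """Classify one evidence row by its provenance tag."""
    tag = (row.get("tag") or "").strip()
    if not tag:
        return "BUG: missing tag"
    if tag.startswith(("ingestion/", "source/")):  # assumed path convention
        return f"path: follow {tag}"
    return f"non-path: no artifact ({tag})"

rows = [
    {"claim": "Invite links confuse new teammates",
     "tag": "ingestion/interviews/northridge-talia.md"},  # hypothetical path
    {"claim": "We need dark mode this quarter", "tag": "stakeholder-verbal"},
    {"claim": "Admins want audit logs", "tag": ""},
]
for row in rows:
    print(audit_row(row), "|", row["claim"])
```

The first two rows are auditable in different ways. Only the third would be flagged.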
5. Will This Stay Healthy After a Year?
Yes. The layer the agent loads by default (your durable knowledge of strategy, product, users, market) grows logarithmically by design. The raw layer grows linearly with your activity but stays cold unless something cites it.
Default loads target the durable + active layers. Brain age doesn’t change what routine commands read.
The architectural payoff: plain markdown in a git repo is forward-compatible with every future Claude, every future Cursor, every future agent. The format doesn’t bind to today’s tooling. Context windows expand; your brain stays the same shape.
Longer version with the realistic envelope numbers (~50-100 interviews/year, 10-20 active hypotheses, 10-30 stakeholders) and the four on-demand triggers that pull old material into context: docs/scaling.md.
6. The Six Commands
/ingest is the workhorse. The other five operate on the durable layer.
/plan is where the system earns its keep. A new objective lands, “reduce onboarding drop-off by 20% in Q3”, and the brain loads strategy, current metrics, user insights, active hypotheses, past decisions, stakeholder constraints, then drafts the six blocks. That output is the difference between “another second brain” and a system that does product work with you.
Conversational equivalents work for all six. The slash commands are optional.
7. The Maintenance Sweep: Where Most Systems Die
Memory systems rot in predictable ways. The weekly sweep is the forcing function that catches those failure modes before they compound. Skip /review for a month and the brain becomes a graveyard. Run it weekly and the system pays you back every Friday.
Eight failure modes; six checks. Dated report in maintenance/log/:
Stale knowledge audit. Files not updated in 6+ weeks. Still true? Archive?
Stale evidence flagging. Market intel past 30–60 days, interviews past 90, stakeholder assumptions past 30, strategy assumptions past quarterly. Flags; doesn’t auto-decay confidence. You decide what to refresh.
Hypothesis and decision hygiene. Active hypotheses with no evidence in 30+ days. Promoted hypotheses without a corresponding decision (drafts one). Decisions whose reversal condition triggered. Pending decisions older than 14 days with blocker impact (decision debt).
Stakeholder cadence and strategy tensions. High-influence stakeholders not touched in 3+ weeks. Recent decisions or signals diverging from strategy. Surfaced as tensions, not as drift to fix.
Knowledge synthesis (compression). The highest-leverage step. Identifies recurring patterns AND recurring contradictions. Preserves minority signals: the dissenting interview, the contrarian metric, the off-pattern stakeholder concern. Compression is additive, never destructive.
Archival sweep. Shipped features inactive 90+ days. Resolved hypotheses. Closed asks. Before archiving anything, extracts durable lessons.
Twenty minutes Friday afternoon. Set a recurring calendar reminder. Without it, the brain rots.
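The stale-evidence check is the easiest one to sketch. The thresholds below come from the list above, with two assumptions: market intel uses the lower bound of the 30 to 60 day range, and "quarterly" is approximated as 90 days. The item names and tuple format are invented for the example.

```python
from datetime import date

STALE_AFTER_DAYS = {
    "market-intel": 30,            # post says 30-60 days; lower bound assumed
    "interview": 90,
    "stakeholder-assumption": 30,
    "strategy-assumption": 90,     # "quarterly" approximated as 90 days
}

def flag_stale(items, today):
    """Return (name, age_days) for items past their threshold. Flags only, never edits."""
    flagged = []
    for name, kind, last_updated in items:
        age = (today - last_updated).days
        if age > STALE_AFTER_DAYS[kind]:
            flagged.append((name, age))
    return flagged

today = date(2026, 5, 22)
items = [
    ("competitor-pricing", "market-intel", date(2026, 4, 1)),
    ("cascadia-health", "interview", date(2026, 3, 15)),
    ("rahul-dark-mode", "stakeholder-assumption", date(2026, 5, 20)),
]
print(flag_stale(items, today))
```

The function returns a list and changes nothing, which mirrors the rule above: the sweep flags, and you decide what to refresh.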
8. What PM Brain Is Not Good At
The honest limits, since the rest of this is sales copy if I skip them:
A few notes the grid doesn’t cover:
The ad-hoc inbox tempts laziness. “I’ll route this later.” No. Every ad-hoc item gets resolved in the same session. The folder is a sorting bench. The moment it becomes a backlog, it becomes a graveyard.
Memory promotion requires judgment. The agent proposes what to promote. The call is yours. Rubber-stamp everything and the durable layer fills with noise. Reject everything and the system never learns.
Stakeholder files feel awkward at first. Writing down what your manager cares about, in a file, in your git repo, can feel like treating people as objects. The reframe that worked for me: this isn’t about them, it’s about your continuity. You forget. The file remembers.
9. Does It Actually Work? The Test Scoreboard
The eval suite runs 17 synthetic PM scenarios. Churn investigation. Stakeholder cadence flag. Contradiction arriving 60 days after a decision. Strategy drift.
404 of 406 individual checks pass (≈99.5%) on Sonnet 4.6.
The split:
Structural checks: 329 / 329 (100%). Files exist where they should. Links resolve. Evidence rows carry valid provenance tags. Decision schemas are valid. Hypothesis statuses match the evidence claims.
LLM-judge content checks: 75 / 77 (≈97%). Rubrics evaluating whether the brain surfaced the right contradictions, drafted the right decisions, asked the right questions. The two misses came on the two longest scenarios. Documented openly, not hidden.
Each scenario is multi-turn. The harness spins up a fresh brain in a temp directory, replays the inputs through claude -p, runs structural assertions after every turn, and runs LLM-judge rubrics on substance at the end. Full breakdown and snapshots in tests/RESULTS.md.
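For a feel of what a structural assertion looks like, here is a minimal sketch of a "links resolve" check against a throwaway brain in a temp directory. It is not the project's harness: the broken_links helper, the regex, and the file names are all assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of [text](target), ignoring any #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)#]+)(?:#[^)]*)?\)")

def broken_links(brain: Path) -> list:
    """Return (file, target) pairs for relative markdown links that do not resolve."""
    broken = []
    for md in sorted(brain.rglob("*.md")):
        for target in LINK.findall(md.read_text(encoding="utf-8")):
            if "://" in target:
                continue  # external URL, out of scope for this check
            if not (md.parent / target).exists():
                broken.append((md.relative_to(brain).as_posix(), target))
    return broken

with tempfile.TemporaryDirectory() as tmp:
    brain = Path(tmp)
    (brain / "hypotheses").mkdir()
    (brain / "ingestion").mkdir()
    (brain / "ingestion" / "talia.md").write_text("# Synthesis\n", encoding="utf-8")
    (brain / "hypotheses" / "invite-link.md").write_text(
        "Evidence: [Talia](../ingestion/talia.md), [missing](../ingestion/gone.md)\n",
        encoding="utf-8",
    )
    result = broken_links(brain)
print(result)
```

A check like this is deterministic, which is why the structural tier can sit at 100% while the judged tier carries the remaining uncertainty.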
The point isn’t 99.5%. The point is that PM Brain has an eval suite at all. Most “AI memory” projects don’t. When the skill changes, the suite tells you whether it got better or worse, on real-shaped PM situations, with snapshots you can diff.
10. How to Start
Two stages: install the skill once, then bootstrap a brain in any folder.
Stage 1: Install the skill (one-time, global)
The skill lands in ~/.claude/skills/pm-brain/ and is available across every Claude Code session afterward, in any working directory. You install it once.
macOS / Linux / WSL / Git Bash:
mkdir -p ~/.claude/skills && \
curl -L https://github.com/phuryn/pm-brain/archive/refs/heads/main.tar.gz | \
tar xz --strip-components=3 -C ~/.claude/skills pm-brain-main/.claude/skills/pm-brain/
Windows PowerShell:
$dest = "$env:USERPROFILE\.claude\skills"
New-Item -ItemType Directory -Force -Path $dest | Out-Null
irm "https://github.com/phuryn/pm-brain/archive/refs/heads/main.zip" -OutFile "$env:TEMP\pmb.zip"
Expand-Archive "$env:TEMP\pmb.zip" "$env:TEMP\pmb" -Force
Copy-Item "$env:TEMP\pmb\pm-brain-main\.claude\skills\pm-brain" $dest -Recurse -Force
Remove-Item "$env:TEMP\pmb.zip","$env:TEMP\pmb" -Recurse -Force
Universal fallback (any OS with git):
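git clone https://github.com/phuryn/pm-brain.git
mkdir -p ~/.claude/skills   # make sure the target folder exists before copying
cp -R pm-brain/.claude/skills/pm-brain ~/.claude/skills/
# On Windows PowerShell, replace the mkdir and cp lines with:
# New-Item -ItemType Directory -Force -Path $env:USERPROFILE\.claude\skills | Out-Null
# Copy-Item pm-brain\.claude\skills\pm-brain $env:USERPROFILE\.claude\skills\ -Recurse
Stage 2: Bootstrap a brain (per-product)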
In any folder you want the brain to live in, call /pm-brain in Claude Code (VS Code extension or CLI). Terminal version:
cd ~/projects/my-product-brain
claude
# in the Claude Code prompt:
/pm-brain
The skill auto-detects what’s in the directory. Empty folder = greenfield. Folder with existing PM artifacts (Notion exports, meeting notes, Jira CSV) = migration. Either way, a short 5-batch interview captures your context. The scaffold drops in. The CLAUDE.md operating manual lands at the brain root. The brain commits locally. Never pushes.
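The greenfield-versus-migration rule is simple enough to sketch. This is a guess at the shape of the check, not the skill's logic: the detect_mode helper is hypothetical, and ignoring hidden entries such as .git is an assumption.

```python
import tempfile
from pathlib import Path

def detect_mode(folder: Path) -> str:
    """'greenfield' for an empty folder, 'migration' if any non-hidden entry exists."""
    # Hidden entries (for example .git) are ignored here; that is an assumption.
    has_artifacts = any(not p.name.startswith(".") for p in folder.iterdir())
    return "migration" if has_artifacts else "greenfield"

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    empty_mode = detect_mode(folder)
    (folder / "jira-export.csv").write_text("key,summary\n", encoding="utf-8")
    filled_mode = detect_mode(folder)
print(empty_mode, filled_mode)
```

Either branch leads into the same interview. The mode only changes whether there is existing material to read first.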
Migration handles your current state. Don’t backfill old artifacts retroactively. Two different things:
Migration mode, recommended: When you point /pm-brain at a folder with your active strategy doc, your in-flight hypotheses, your recent decisions, and your current stakeholder list, the skill reads them, organizes them, and produces a short report on what it found. That’s the goal. Let migration absorb your current organized PM artifacts.
Backfilling everything, the trap: The temptation is to spend a weekend manually feeding 200 old interview transcripts, six months of Slack threads, and every meeting note you’ve ever taken through /ingest. Don’t. If a six-month-old interview matters, it’ll come up through current work and you’ll ingest it then, with the context to know why it matters. Forcing stale artifacts in now wastes a weekend and clogs the durable layer with stuff the agent has no context to promote properly.
The system dies if these three habits don’t take hold in week one. Do them in order, in the first seven days:
Ingest one real artifact today. Paste your most recent customer interview, meeting notes, or competitor screenshot. Watch where it lands in ingestion/, then which durable areas it updates. This is how you learn what the system does.
Prep your next 1:1. Before your next conversation with your highest-friction stakeholder, run /prep. Let the agent surface what to ask. The first real value is here, not in the scaffold.
Run /review Friday. Set a recurring calendar reminder. Twenty minutes Friday afternoon. Without it, the brain rots.
That’s week one. Don’t add anything else.
Week 2. Ingest two more interviews. Log one decision. Add one feature file for your most active feature. Open knowledge/strategy.md and fill in the Non-goals section if the interview didn’t.
Week 3. Second /review. Now you can compare two reports. Patterns start showing up: recurring contradictions, drifting strategy, hypotheses with no evidence. This is when the system starts paying off.
Week 4+. It compounds. Every meeting note, every market signal, every decision feeds the same loop. The brain learns the shape of your product as you work.
The goal isn’t a complete brain. The goal is a brain that compounds over time.
11. A Research Preview, with Help
PM Brain is a research preview. The architecture has months of dogfooding behind it on my content work; the product as installed by real PMs is new. I want to learn what breaks in real organizations, in the first week of use.
Three things to make adoption easier:
Live workshop session, Monday May 25, for paid subscribers. I’ll walk through install, migration, and the first week of usage live, then take questions. See: https://go.productcompass.pm/premium
Dedicated #pm-brain Slack channel for 1:1 install help (paid members). Office Hours members get the channel. I’m in there for install pain, first-week confusion, and the awkward moments where the system feels wrong but you can’t articulate why. Those are the highest-signal reports.
Public issues. github.com/phuryn/pm-brain/issues. Install bugs, feature requests, scenarios you want the eval suite to cover. Anyone can open one without a Substack subscription.
If you’re in a real product role and the system might fit your workflow, your install feedback in week one is what makes the next version better.
12. Closing
Most product organizations repeatedly lose the same things: failed bets, abandoned assumptions, historical reasoning, unresolved tensions. The PM job is partly to remember these. The PM job is impossible when your context lives in five different places and none of them talk to each other.
One repo. One operating manual. An agent that reads it.
The brain isn’t a notes app you fill in. It’s a folder of plain-text files that grow with the work, traceable back to the source on every line, swept clean every Friday, free for anyone to use, and inspectable by anyone you trust.
P.S. The repo is at github.com/phuryn/pm-brain.
Thanks for Reading The Product Compass
It’s amazing to learn and grow together.
Have a great rest of the week, Paweł
Resources
The PM Brain OS
The repo. github.com/phuryn/pm-brain. Skill, example brain, docs, tests, all MIT licensed.
The walkthrough. Lena’s first five days, the long version of Section 2.
Why this matters. Five failure modes that kill most AI memory systems, and the five structural choices that answer them.
The architecture doc. Two design decisions and one operating loop.
Scaling. How the brain stays healthy as it grows, growth shapes, compression mechanisms, realistic envelope numbers.
How it works. The technical version, one ingestion, six files touched, end to end.
Prior art. What PM Brain borrows from Zettelkasten / RAG / CLAUDE.md patterns, and what it rejects.
Glossary. Every term in plain English.
Tests scoreboard. 404 / 406 checks, per-scenario snapshots, two residual failures called out honestly.
Issues. github.com/phuryn/pm-brain/issues. Install bugs, scenarios you want covered, feature requests. Open one.
Composing with PM skills
PM Skills (11K+ stars). github.com/phuryn/pm-skills. PM Brain is the memory layer; PM Skills is the workflow layer. They compose. A JTBD PM Skill extracts jobs from an interview using the proper framework; PM Brain makes sure that job updates the right user insight, hypothesis, stakeholder note, and decision record. The skill is how to do the work once. The brain is what we know across all the times we did it.
Related newsletter posts
Claude Code for PMs: The Beginner’s Guide (May 12). Everything to start.
Claude Code Guide (Mar 8). Full version. The tool this runs on.
Self-improving AI systems (Mar 16). The parent pattern for compounding agents.
Intent Engineering Framework (Jan 13). The agent design language behind CLAUDE.md.
What Is Product Discovery? The Ultimate Guide (updated 2025). Critical for PMs.














