Building a Second Brain as a Researcher — with Claude Code and Obsidian

April 10, 2026 · 7 min read · #obsidian #pkm #llm #claude-code #research #second-brain

Every week, I see 10+ new papers across computer vision, neural rendering, and real-time avatars — on arXiv, LinkedIn, or Twitter. That number easily doubles when I include adjacent areas, especially new audio and LLM papers coming out at an astonishing pace, many of them directly relevant to my work.

By Friday, I’ve read some and skimmed most of them. Some notes and research ideas feel worth keeping. By the following month, I can recall maybe a third of what I filed, and I’ve forgotten even some of the relevant ones. By the time I actually need one (writing a related work section, designing a new experiment, evaluating a technical decision), I can find very few. The rest are lost in notes or bookmarks with no central index.

Reading is not the problem. Finding things again when I actually need them is. And this is not just about research papers: the same goes for tools, GitHub repos, and anything else I’m trying to track.

The Note-Taking Trap

I have used Obsidian and GoodNotes for research notes during my PhD. Sometimes it was just Apple Notes. Each had its strengths, but all ran into the same wall: capture is easy; compiling, indexing, and retrieving is the hard part.

The workflow looks like this. You read a paper. You write a summary note and save the paper somewhere. If you are disciplined, you tag it with relevant topics and link it to related work. If you are very disciplined, you update the concept article it touches, add it to the relevant project pages, and record a comparison with the prior method it supersedes. All of this requires time — but in reality, most of the time you read the paper on arXiv and maybe bookmark it somewhere.

Last week, Andrej Karpathy posted about his system for building an LLM knowledge base — and it exploded. I already had some Obsidian notes, so I thought it might be a good idea to see if I could finally solve the hard part.

Karpathy’s Framing

Karpathy framed the problem cleanly in that post and elaborated the full pattern in a companion gist:

[Image: Karpathy tweet on LLM Knowledge Bases]

“TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian.”

What clicked for me reading this: the architecture maps almost exactly to what I already had — a raw/ folder of source documents, Obsidian for reading and navigation — but with the missing piece finally named. The problem was never the interface. It was always the compilation step: reading something, deciding where it fits, linking it to three other things, updating the relevant concept pages. That’s the part that takes time and that I was always skipping. Unlike RAG, which rediscovers answers from raw documents on every query, this approach has the LLM build a persistent wiki that keeps getting denser as you add more. That part I could automate.

What I Built So Far

Last week I rebuilt my research vault around this idea. The system has two components: Obsidian as the read and navigation layer, and Claude Code as the LLM engine that actively compiles incoming material into structured, cross-linked knowledge.

The core structure:

raw/  ──→  Claude Code  ──→  wiki/  ──→  Obsidian
            compile                      Local Dashboard
            index
            lint
            query

The raw/ directory is the ingest layer — immutable source material, organised by type:

raw/
├── articles/   # papers, blog posts, web articles
├── repos/      # tool and framework captures
├── ideas/      # unprocessed ideas
├── books/      # book summaries
└── captures/   # quick inbox notes

Every file I drop into raw/ gets a compiled: false frontmatter flag. This is the processing queue.
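
As a rough illustration of how that queue could be read outside of Claude Code, here is a minimal Python sketch. It assumes the sources are Markdown files with a flat `key: value` frontmatter block between `---` lines; the helper names `read_frontmatter` and `pending_sources` are hypothetical and not part of the actual setup.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` frontmatter block delimited by `---` lines.

    Deliberately simple: no nesting, no multi-line values. A real vault
    would probably want a proper YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the file as having no frontmatter


def is_pending(path: Path) -> bool:
    """True if the file's frontmatter still says `compiled: false`."""
    meta = read_frontmatter(path.read_text(encoding="utf-8"))
    return meta.get("compiled", "").lower() == "false"


def pending_sources(raw_dir: Path) -> list[Path]:
    """Return every Markdown file under raw_dir that is still in the queue."""
    return sorted(p for p in raw_dir.rglob("*.md") if is_pending(p))
```

Calling `pending_sources(Path("raw"))` would list whatever is still waiting for a compile run, which is a cheap way to check the size of the backlog.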

The wiki/ directory is the compiled knowledge store — structured articles, cross-linked using Obsidian wikilink syntax, organised by domain: projects/, concepts/, career/. Three index files rebuild automatically after every compile run:

  • _index.md — master catalog of all wiki articles with domain and status
  • _concepts.md — concept taxonomy mapping cross-cutting topics to dependent projects
  • _backlinks.md — connection density map showing which articles are most referenced

[Figure: Second Brain dashboard in Obsidian showing Projects, Concepts, Career, and Raw Sources]
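
To make the idea behind _backlinks.md concrete, the sketch below counts how many distinct articles link to each target. In my setup Claude Code produces this file; the script is only an assumed approximation, and the function name `backlink_counts` and the rule of skipping files that start with an underscore are illustrative choices.

```python
import re
from collections import Counter
from pathlib import Path

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def backlink_counts(wiki_dir: Path) -> Counter:
    """Count how many distinct articles link to each wikilink target."""
    counts: Counter = Counter()
    for path in wiki_dir.rglob("*.md"):
        if path.name.startswith("_"):
            continue  # generated index files link to everything, so skip them
        text = path.read_text(encoding="utf-8")
        targets = {match.strip() for match in WIKILINK.findall(text)}
        targets.discard(path.stem)  # ignore self-links
        counts.update(targets)
    return counts
```

`backlink_counts(Path("wiki")).most_common(10)` would then give the ten most referenced articles, which is roughly what a connection density map needs.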

The Compilation Step

When Claude Code processes a file from raw/, it runs the following sequence:

  1. Read the source and extract semantically relevant content
  2. Identify connections to existing wiki articles via the current index and concept graph
  3. Create or update the relevant wiki article in the correct subdirectory
  4. Add [[wikilinks]] connecting new content to related projects and concepts
  5. Rebuild _index.md, _concepts.md, and _backlinks.md
  6. Set compiled: true on the source file
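
Steps 1 to 5 are judgment calls that the LLM makes, but step 6 is purely mechanical. A minimal sketch of it, assuming the same flat frontmatter layout between `---` lines (the function name `mark_compiled` is hypothetical), could look like this:

```python
from pathlib import Path


def mark_compiled(source: Path) -> bool:
    """Flip `compiled: false` to `compiled: true` inside the frontmatter only.

    Returns True if the file was changed. The body is never touched, so a
    sentence in the note that happens to mention the flag stays as written.
    """
    lines = source.read_text(encoding="utf-8").splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter at all
    for i in range(1, len(lines)):
        stripped = lines[i].strip()
        if stripped == "---":
            break  # reached the end of the frontmatter without finding the flag
        key, sep, value = stripped.partition(":")
        if sep and key.strip() == "compiled" and value.strip().lower() == "false":
            newline = "\n" if lines[i].endswith("\n") else ""
            lines[i] = "compiled: true" + newline
            source.write_text("".join(lines), encoding="utf-8")
            return True
    return False
```

Restricting the change to the frontmatter keeps the source text itself byte-for-byte identical, which matters if raw/ is meant to serve as an audit trail.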

I created four Claude Code commands to run these steps more efficiently: /compile, /index, /lint, and /qa, for compiling, indexing, linting, and Q&A respectively.

[Figure: Obsidian graph view showing the compiled knowledge graph after one week]

Querying the Graph

The difference from keyword search is significant. When I ask “What have I evaluated for real-time streaming?”, I get back a response that cites specific wiki articles, lists the models with their latency numbers, and tells me which projects each one is relevant to. A normal search would give me a list of files and I’d still have to read through them myself.

“Which projects depend on SMPL-X pose estimation?” works the same way — it already knows the connections because they were written in at compile time.

I also run a lint command periodically to check the health of the vault: broken links, orphaned notes with nothing pointing to them, missing frontmatter, files in raw/ that never got compiled. It catches things before they pile up.
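
Three of those checks are easy to approximate in plain Python. The sketch below is an assumption about how such a lint pass might work, not the actual /lint definition: it treats an article's file stem as its link name, ignores generated files that start with an underscore, and the function name `lint_wiki` is made up for illustration.

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(wiki_dir: Path) -> dict[str, list[str]]:
    """Report broken wikilinks, orphaned articles, and missing frontmatter."""
    # Index files are skipped: they link to every article, which would hide orphans.
    # Limitation: two articles with the same stem in different folders collide here.
    articles = {p.stem: p for p in wiki_dir.rglob("*.md") if not p.name.startswith("_")}
    linked: set[str] = set()
    broken: list[str] = []
    missing_frontmatter: list[str] = []
    for name, path in sorted(articles.items()):
        text = path.read_text(encoding="utf-8")
        if not text.startswith("---"):
            missing_frontmatter.append(name)
        for match in WIKILINK.findall(text):
            target = match.strip().split("/")[-1]  # tolerate path-style links
            if target == name:
                continue  # a self-link does not rescue an orphan
            if target in articles:
                linked.add(target)
            else:
                broken.append(f"{name} -> {target}")
    return {
        "broken_links": broken,
        "orphans": sorted(set(articles) - linked),
        "missing_frontmatter": missing_frontmatter,
    }
```

The fourth check, files in raw/ that never got compiled, is just a scan for sources whose frontmatter still says compiled: false.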

Tips from One Week

Seven days in. A few things I would do differently from the start:

  1. Keep raw/ immutable. Do not edit the body of a source file after ingest; the only change should be the compiled flag in its frontmatter. If you want to annotate, compile first, then edit the wiki article. Mixing source and editorial content in the same file breaks the audit trail and confuses the compilation step.

  2. Tag concepts in raw/ before compiling. A single concepts: [gaussian-splatting, real-time-rendering] line in the frontmatter gives the engine a hint and produces sharper cross-links than it would infer from body text alone.

  3. Run lint weekly, not daily. Daily lint creates noise before the graph has enough density to make orphan detection meaningful. A weekly cadence works better.

  4. One domain subfolder per project in wiki/. I initially kept all project notes flat. Subfolders — wiki/projects/avatar-gen/, wiki/projects/streaming/ — make the index much more navigable as the vault grows.

  5. The graph compounds. After three days I could already see it. Papers from day one were getting new backlinks from things I added on day six. You can’t do that manually when you’re reading 20+ papers a week: you’d have to remember everything you’ve ever read and re-check it every time something new comes in.
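
For tip 2, the concepts line is simple enough to turn into link candidates mechanically. The sketch below assumes that each concept slug matches the name of a wiki article, which may not hold in every vault, and the function name `concept_hints` is hypothetical.

```python
def concept_hints(frontmatter_line: str) -> list[str]:
    """Turn `concepts: [a, b]` into ['[[a]]', '[[b]]']; return [] for anything else."""
    key, sep, value = frontmatter_line.partition(":")
    value = value.strip()
    if not sep or key.strip() != "concepts":
        return []
    if not (value.startswith("[") and value.endswith("]")):
        return []  # only the inline list form is handled in this sketch
    names = [item.strip().strip("'\"") for item in value[1:-1].split(",")]
    return [f"[[{name}]]" for name in names if name]
```

Passing the example line from tip 2 yields `[[gaussian-splatting]]` and `[[real-time-rendering]]`, which the engine can treat as a starting point before it infers further links from the body text.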

Replication

Start from the Karpathy gist, create the folder structure with Claude Code, and add skills and plugins based on your needs.

Component                            Purpose
CLAUDE.md                            Engine instruction spec, loaded automatically by Claude Code
Vault folder structure               raw/, wiki/, outputs/, templates/, daily/
Obsidian                             Read and navigation layer
Claude Code                          LLM engine: compile, index, lint, query
/compile, /qa, /lint, /index skills  Skill definitions for each engine operation

I have also created a local dashboard for better visualization and to keep track of weekly todos.