From Notion Export to Local Knowledge Base in One Afternoon
I hit the Export button in Notion and got a zip file. Three megabytes. Years of notes, content calendars, CRM records, meeting summaries, and project tracking—all flattened into Markdown and CSV files with garbled names.
Turning that export into a structured local knowledge base for an AI agent wasn’t hard. But it was full of the kind of small surprises that make you appreciate why Notion charges a subscription.
The Export
Notion’s export is a zip file. Or rather, it’s a zip inside a zip. The outer archive contains an inner archive, and the inner archive contains your data. This is apparently normal. It’s also apparently undocumented.
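Unpacking the nested archive takes only a few lines of Python. A minimal sketch, assuming the inner archive is the only `.zip` member of the outer one (the function name is mine):

```python
import io
import zipfile

def extract_notion_export(outer_zip: str, dest: str) -> None:
    """Unpack Notion's zip-inside-a-zip export into dest."""
    with zipfile.ZipFile(outer_zip) as outer:
        for name in outer.namelist():
            if name.endswith(".zip"):
                # The actual workspace data lives in the inner archive.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    inner.extractall(dest)
            else:
                # Pass through anything that isn't a nested zip.
                outer.extract(name, dest)
```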
The files inside follow Notion’s internal naming convention: every file has a 32-character hex ID appended to its name. Content Calendar 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d.csv instead of just Content Calendar.csv. The directory structure mirrors your Notion workspace, but the folder names have IDs too.
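Stripping those IDs is mechanical. A sketch that renames files and folders in place, assuming the ID is always a space plus 32 lowercase hex characters at the end of the name (before any extension):

```python
import re
from pathlib import Path

def clean_name(name: str) -> str:
    """Strip the trailing 32-char Notion ID from a file or folder name."""
    p = Path(name)
    stem = re.sub(r"\s+[0-9a-f]{32}$", "", p.stem)
    return stem + p.suffix

def strip_ids(root: str) -> None:
    # Rename the deepest paths first so parent directories stay valid.
    for path in sorted(Path(root).rglob("*"), key=lambda p: -len(p.parts)):
        new_name = clean_name(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
```

Renaming bottom-up matters: if you rename a folder before its contents, the paths you collected for the children no longer exist.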
For 47 files, this was manageable. I organized them into a workspace structure that made sense for both me and Tars.
Every file is Markdown or CSV. Every file lives on my SSD. Every file is readable by both ClawPad (the editor) and Tars (the agent). No database. No API. Just files.
The CSV Gotcha
The content calendar was the most important import. Thirty articles in various stages—ideas, drafts, ready to publish. Notion exports these as CSV with all the metadata: title, status, publish date, and tags.
My first attempt to parse it produced rows titled “Untitled” for every entry. The data was there, but the title column wasn’t matching.
The culprit: BOM (Byte Order Mark) characters. Notion’s CSV export prepends \ufeff to the beginning of the file. This invisible character attaches itself to the first column header, so Title becomes \ufeffTitle. Your code reads the header, doesn’t find a match for Title, and returns empty strings.
# Wrong
with open('calendar.csv') as f:
    reader = csv.DictReader(f)

# Right
with open('calendar.csv', encoding='utf-8-sig') as f:
    reader = csv.DictReader(f)

The utf-8-sig encoding strips the BOM automatically. This is a Python-specific fix—other languages have their own BOM handling. But the universal lesson is to always inspect the actual bytes of imported data before writing parsing code. A head -c 20 file.csv | xxd would have shown me the BOM in seconds.
Building the Content Pipeline
The raw CSV became pages/astgl/pipeline.md—a living document Tars checks during every heartbeat cycle.
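The CSV-to-Markdown conversion can be sketched in a few lines. This assumes column headers named Title and Status, which may differ in your export, and uses the utf-8-sig fix from above:

```python
import csv
from collections import defaultdict

def csv_to_pipeline(csv_path: str, md_path: str) -> None:
    """Group calendar rows by status into a Markdown pipeline doc."""
    by_status = defaultdict(list)
    with open(csv_path, encoding="utf-8-sig") as f:  # utf-8-sig strips the BOM
        for row in csv.DictReader(f):
            by_status[row["Status"]].append(row["Title"])

    with open(md_path, "w", encoding="utf-8") as out:
        out.write("# Content Pipeline\n")
        for status, titles in by_status.items():
            out.write(f"\n## {status}\n")
            for title in titles:
                out.write(f"- {title}\n")
```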