As The Geek Learns


From Notion Export to Local Knowledge Base in One Afternoon

James Cruce
Mar 30, 2026

I hit the Export button in Notion and got a zip file. Three megabytes. Years of notes, content calendars, CRM records, meeting summaries, and project tracking—all flattened into Markdown and CSV files with garbled names.

Turning that export into a structured local knowledge base for an AI agent wasn’t hard. But it was full of the kind of small surprises that make you appreciate why Notion charges a subscription.


[Image: Knowledge Word Cloud]

The Export

Notion’s export is a zip file. Or rather, it’s a zip inside a zip. The outer archive contains an inner archive, and the inner archive contains your data. This is apparently normal. It’s also apparently undocumented.
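Unpacking the double archive by hand works, but it is also easy to script. Here is a minimal sketch using Python's standard zipfile module. It assumes that any .zip found after the first extraction should be unpacked into a sibling folder of the same name and then deleted. extract_nested is a hypothetical helper name, not something Notion ships.

```python
import zipfile
from pathlib import Path


def extract_nested(archive: Path, dest: Path) -> None:
    """Extract a zip, then recursively extract any zips it contained."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Snapshot the inner archives produced by this extraction before unpacking them.
    for inner in list(dest.rglob("*.zip")):
        target = inner.with_suffix("")  # "Export-part1.zip" -> "Export-part1/"
        extract_nested(inner, target)
        inner.unlink()  # the inner zip is redundant once its contents are out
```

Usage: extract_nested(Path("Export.zip"), Path("notion-export")), where both paths are placeholders. The recursion also covers deeper nesting, in case an export ever adds another layer.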

The files inside follow Notion’s internal naming convention: every file has a 32-character hex ID appended to its name, so you get Content Calendar 1a2b3c4d5e6f.csv (ID shortened here) instead of just Content Calendar.csv. The directory structure mirrors your Notion workspace, but the folder names have IDs too.

For 47 files, this was manageable. I organized them into a workspace structure that made sense for both me and Tars:

Every file is Markdown or CSV. Every file lives on my SSD. Every file is readable by both ClawPad (the editor) and Tars (the agent). No database. No API. Just files.
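Renaming by hand is fine at 47 files. For a larger export, a small script can strip the IDs instead. This is a minimal sketch that assumes the ID is always a space plus 32 lowercase hex characters at the end of the name or just before the extension. clean_name and strip_ids are hypothetical helpers, not part of any Notion tooling. The script skips a rename rather than overwrite an existing file when two pages share a title.

```python
import os
import re
from pathlib import Path

# Assumption: the Notion ID is a space plus 32 lowercase hex characters,
# placed at the end of the name or just before the file extension.
NOTION_ID = re.compile(r" [0-9a-f]{32}(?=(\.[^.]*)?$)")


def clean_name(name: str) -> str:
    """Remove a trailing Notion ID from a file or folder name."""
    return NOTION_ID.sub("", name)


def strip_ids(root: Path) -> None:
    """Rename everything under root, deepest entries first."""
    # topdown=False yields children before parents, so renaming a folder
    # never invalidates paths that still need to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = clean_name(name)
            if new_name == name:
                continue
            src, dst = Path(dirpath) / name, Path(dirpath) / new_name
            if dst.exists():
                # Two pages with the same title: leave the ID in place.
                print(f"skipping {src}: {dst.name} already exists")
                continue
            src.rename(dst)
```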


The CSV Gotcha

The content calendar was the most important import. Thirty articles in various stages—ideas, drafts, ready to publish. Notion exports these as CSV with all the metadata: title, status, publish date, and tags.

My first attempt to parse it produced rows titled “Untitled” for every entry. The data was there, but the title column wasn’t matching.

The culprit: BOM (Byte Order Mark) characters. Notion’s CSV export prepends \ufeff to the beginning of the file. This invisible character attaches itself to the first column header, so Title becomes \ufeffTitle. Your code reads the header, doesn’t find a match for Title, and returns empty strings.

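```python
import csv

# Wrong: the BOM stays glued to the first header, so row.get("Title") finds nothing
with open("calendar.csv", newline="") as f:
    reader = csv.DictReader(f)

# Right: utf-8-sig decodes UTF-8 and drops a leading BOM if present
with open("calendar.csv", encoding="utf-8-sig", newline="") as f:
    reader = csv.DictReader(f)
```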

The utf-8-sig encoding strips the BOM automatically. This is a Python-specific fix; other languages have their own BOM handling. But the universal lesson holds: always inspect the actual bytes of imported data before writing parsing code. A head -c 20 file.csv | xxd would have shown me the BOM in seconds.
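The same byte-level check can also live in Python. In this minimal sketch, has_utf8_bom compares the first three raw bytes against codecs.BOM_UTF8 (EF BB BF), and first_header shows how the first column name decodes under a given encoding. Both function names are illustrative.

```python
import codecs
import csv
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Read the first three raw bytes and compare them to the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def first_header(path: Path, encoding: str) -> str:
    """Return the first CSV column name as decoded with the given encoding."""
    with open(path, encoding=encoding, newline="") as f:
        return next(csv.reader(f))[0]
```

On a file that starts with a BOM and a Title column, first_header(path, "utf-8") returns "\ufeffTitle" while first_header(path, "utf-8-sig") returns "Title". That is exactly the mismatch that turned every row into “Untitled”.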

Building the Content Pipeline

The raw CSV became pages/astgl/pipeline.md, a living document Tars checks during every heartbeat cycle.
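The actual layout of pipeline.md isn't shown here, so the following is only a rough illustration of the conversion step. It assumes hypothetical column names (Title, Status, Publish Date) and an invented layout with one Markdown heading per status and one bullet per article. As a defensive measure, it also strips a leading \ufeff in case the text was read without utf-8-sig. csv_to_pipeline is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def csv_to_pipeline(csv_text: str) -> str:
    """Group calendar rows by status and render them as Markdown."""
    # Drop a BOM if the caller didn't read the file with utf-8-sig.
    rows = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    by_status: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Hypothetical column names; adjust to match the real export.
        title = (row.get("Title") or "").strip() or "Untitled"
        status = (row.get("Status") or "").strip() or "No status"
        date = (row.get("Publish Date") or "").strip()
        by_status[status].append(f"- {title}" + (f" ({date})" if date else ""))
    lines = ["# Content Pipeline", ""]
    for status, items in by_status.items():  # keeps first-seen status order
        lines += [f"## {status}", *items, ""]
    return "\n".join(lines).rstrip() + "\n"
```

Usage, with pathlib.Path: Path("pages/astgl/pipeline.md").write_text(csv_to_pipeline(Path("calendar.csv").read_text(encoding="utf-8-sig"))).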

© 2026 James Cruce