<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[As The Geek Learns]]></title><description><![CDATA[Tools and training for IT professionals. PowerCLI courses, productivity apps, and 25 years of lessons learned.]]></description><link>https://astgl.com</link><image><url>https://substackcdn.com/image/fetch/$s_!hfS3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png</url><title>As The Geek Learns</title><link>https://astgl.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 28 Jun 2026 20:50:04 GMT</lastBuildDate><atom:link href="https://astgl.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[James Cruce]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[astgl@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[astgl@substack.com]]></itunes:email><itunes:name><![CDATA[James Cruce]]></itunes:name></itunes:owner><itunes:author><![CDATA[James Cruce]]></itunes:author><googleplay:owner><![CDATA[astgl@substack.com]]></googleplay:owner><googleplay:email><![CDATA[astgl@substack.com]]></googleplay:email><googleplay:author><![CDATA[James Cruce]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Apple Container vs Docker: When to Use Which on Apple Silicon]]></title><description><![CDATA[Apple's container CLI hit 30K stars promising speed. I benchmarked it against Docker Desktop and OrbStack &#8212; here's where the line actually is.]]></description><link>https://astgl.com/p/apple-container-vs-docker-apple-silicon</link><guid isPermaLink="false">https://astgl.com/p/apple-container-vs-docker-apple-silicon</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Sat, 27 Jun 2026 17:30:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fSjI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fSjI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fSjI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 424w, https://substackcdn.com/image/fetch/$s_!fSjI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 848w, https://substackcdn.com/image/fetch/$s_!fSjI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 1272w, https://substackcdn.com/image/fetch/$s_!fSjI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fSjI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:303718,&quot;alt&quot;:&quot;Three corrugated shipping containers standing on an Apple Silicon chip labeled \&quot;M3 Ultra\&quot; &#8212; a silver container marked \&quot;container,\&quot; a blue one marked \&quot;docker,\&quot; and an orange-to-pink gradient one marked \&quot;orbstack.\&quot; Title reads \&quot;Apple container vs Docker.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/203788714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Three corrugated shipping containers standing on an Apple Silicon chip labeled &quot;M3 Ultra&quot; &#8212; a silver container marked &quot;container,&quot; a blue one marked &quot;docker,&quot; and an orange-to-pink gradient one marked &quot;orbstack.&quot; Title reads &quot;Apple container vs Docker.&quot;" title="Three corrugated shipping containers standing on an Apple Silicon chip labeled &quot;M3 Ultra&quot; &#8212; a silver container marked &quot;container,&quot; a blue one marked &quot;docker,&quot; and an orange-to-pink gradient one marked &quot;orbstack.&quot; Title reads &quot;Apple container vs Docker.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!fSjI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 424w, https://substackcdn.com/image/fetch/$s_!fSjI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 848w, https://substackcdn.com/image/fetch/$s_!fSjI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 1272w, https://substackcdn.com/image/fetch/$s_!fSjI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dfcff0-f2be-4218-a2de-29c46f00ca7f_2400x1260.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Three container runtimes, one Mac Studio. I benchmarked Apple `container`, Docker Desktop, and OrbStack on Apple Silicon&#8212;here&#8217;s where each one wins.</figcaption></figure></div><p>Apple's <code>container</code> CLI hit 1.0 this month, picked up 30,000 GitHub stars, and arrived with the usual promise: faster than Docker on Apple Silicon, lighter, and more native. It runs Linux containers in lightweight VMs, written in Swift, optimized for your M-series chip.</p><p>I wanted to know if the promise holds. So I didn't read the marketing. I benchmarked it.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;768faea8-23d2-438f-b80a-a7487cfcf4cc&quot;,&quot;caption&quot;:&quot;There's a number floating around that's hard to ignore.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Ran Google's 1,000-Tokens-Per-Second Model on My Mac. A Normal Model Beat It.&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-19T13:15:26.242Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!F3lP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:202624833,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>And I added a third contender Apple didn't mention: <strong>OrbStack</strong>. Because if you've already gotten tired of Docker Desktop on a Mac, OrbStack is probably what you switched to. Leaving it out of a "fast containers on Apple Silicon" comparison would've been lacking since it's the one a lot of us actually run.</p><p>Here's what the numbers say and how to decide which one belongs on your machine.</p><h2>How I tested this</h2><p>Quick note on method: because a benchmark you can't trust is just a screenshot of one lucky run.</p><p>Everything ran on a Mac Studio: M3 Ultra, 256 GB RAM, macOS 26.5.1. All three tools pulled the <em>same</em> images, pinned by digest, so nobody got a different build. Every measurement is the median of at least 10 runs, warmup discarded, each run in a clean subprocess so state from one couldn't bleed into the next.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7b69ba58-f3a3-4fb5-aea1-f595057e71b0&quot;,&quot;caption&quot;:&quot;\&quot;Isn't that expensive?\&quot; Every time I talk about using Claude Code for a project, someone asks this. The honest answer is: it depends entirely on what you route to Claude versus what you run locally. On a Mac Studio with unified memory, the economics change fast. Here's the routing table I actually use.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Local LLMs Plus Claude Code: The Mac Studio Hybrid Workflow&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-23T11:04:12.187Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YhR0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/local-llms-claude-code-mac-studio&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:199922338,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Nine scenarios: container startup, a web server coming up, Postgres accepting connections, CPU work, disk I/O on a bind mount, container-to-container networking, an image build, a multi-service stack, and a density ramp. Plus idle footprint.</p><p>The whole harness is public if you want to reproduce it or pick it apart: <a href="https://github.com/Jmeg8r/apple-container-vs-docker">github.com/Jmeg8r/apple-container-vs-docker</a>. Every claim below traces to a number in that repo or a failure I logged on purpose.</p><p>One thing that matters before we start: <strong>all of this is on macOS 26 (Tahoe).</strong> On macOS 15, Apple <code>container</code> can't even do container-to-container networking. Containers get IPs but can't talk to each other. If you're not on Tahoe yet, half of this comparison doesn't apply to you, and the answer is "stay on Docker or OrbStack." For everyone else, read on.</p><h2>The one number where Apple wins outright</h2><p>Let's lead with Apple's real victory, because it's a good one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yg7o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yg7o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!yg7o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!yg7o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!yg7o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yg7o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png" width="1120" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:1120,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38792,&quot;alt&quot;:&quot;Bar chart of idle memory footprint; Apple container near 51 MB, Docker Desktop about 1,124 MB, OrbStack about 1,631 MB.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/203788714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bar chart of idle memory footprint; Apple container near 51 MB, Docker Desktop about 1,124 MB, OrbStack about 1,631 MB." title="Bar chart of idle memory footprint; Apple container near 51 MB, Docker Desktop about 1,124 MB, OrbStack about 1,631 MB." srcset="https://substackcdn.com/image/fetch/$s_!yg7o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!yg7o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!yg7o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!yg7o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba6899b-ace1-4221-a5b6-931fc104ebf9_1120x672.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Idle memory with nothing running&#8212;Apple `container` ~51 MB vs Docker Desktop ~1,124 MB and OrbStack ~1,631 MB.</figcaption></figure></div><p><strong>Idle footprint.</strong> With nothing running&#8212;no containers, just the runtime sitting there ready. Apple <code>container</code> holds about <strong>51 MB</strong> of memory. Docker Desktop sits at <strong>1,124 MB</strong>. OrbStack at <strong>1,631 MB</strong>.</p><p>That's not a typo. Apple <code>container</code> uses 22 to 32 times less memory at rest than the other two.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The reason is architectural. Docker Desktop and OrbStack each run one big Linux VM that stays resident the whole time you're logged in, whether you're using it or not. Apple <code>container</code> doesn't keep a VM warm. It spins one up per container and tears it down when the container stops. Nothing running means nothing resident.</p><p>If you keep a couple of long-lived services on your laptop and want them out of the way the rest of the day, this is a genuinely meaningful win. Your RAM is yours again when you're not using containers.</p><p>Hold onto that "spins up a VM per container" detail, though. It's also the source of every place Apple loses.</p><h2>Where it ties and quietly wins again</h2><p>Before the losses, give Apple its due on two more.</p><p><strong>Raw CPU is a dead heat.</strong> Running sysbench inside a container, Apple <code>container</code> actually came out <em>slightly ahead</em>, about 38,500 events per second versus 36,400 for Docker Desktop and 33,500 for OrbStack. The per-VM model adds no real tax on compute. Once your code is running, it runs at full speed.</p><p><strong>Cached builds are Apple's fastest.</strong> Rebuild an image when the layers are already cached, and Apple finishes in about <strong>426 ms. </strong>Quicker than Docker Desktop (598 ms) and noticeably quicker than OrbStack (844 ms).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JCuu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JCuu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!JCuu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!JCuu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!JCuu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JCuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png" width="1120" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:1120,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33515,&quot;alt&quot;:&quot;Bar chart of cached image rebuild time in milliseconds; Apple container lowest near 426, Docker Desktop near 598, OrbStack near 844.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/203788714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bar chart of cached image rebuild time in milliseconds; Apple container lowest near 426, Docker Desktop near 598, OrbStack near 844." title="Bar chart of cached image rebuild time in milliseconds; Apple container lowest near 426, Docker Desktop near 598, OrbStack near 844." srcset="https://substackcdn.com/image/fetch/$s_!JCuu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!JCuu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!JCuu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!JCuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc31d7d6-580c-4ba1-bd5e-2984be3e4239_1120x672.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Cached image rebuild&#8212;Apple is fastest at ~426 ms vs. 598 (Docker Desktop) and 844 (OrbStack).</figcaption></figure></div><p>So this isn't a story about a slow tool. Where the VM boundary isn't sitting in the hot path, Apple <code>container</code> is right there with the best of them and sometimes ahead.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/apple-container-vs-docker-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading As The Geek Learns! This post is public, so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/apple-container-vs-docker-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/apple-container-vs-docker-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><h2>The per-VM tax: everywhere it creates or moves data</h2><p>Now the other side. And it's consistent enough that you can predict it: <strong>anything that involves spinning up a container or pushing data across the VM boundary costs Apple time.</strong></p><p><strong>Startup.</strong> A bare container's start-to-exit takes Apple about <strong>1,071 ms</strong> versus ~360&#8211;395 ms for the others. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5EII!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5EII!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!5EII!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!5EII!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!5EII!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5EII!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png" width="1120" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:1120,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35554,&quot;alt&quot;:&quot;Bar chart of container start-to-exit time in milliseconds; Apple container around 1,071, Docker Desktop and OrbStack around 360 to 395.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/203788714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bar chart of container start-to-exit time in milliseconds; Apple container around 1,071, Docker Desktop and OrbStack around 360 to 395." title="Bar chart of container start-to-exit time in milliseconds; Apple container around 1,071, Docker Desktop and OrbStack around 360 to 395." srcset="https://substackcdn.com/image/fetch/$s_!5EII!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!5EII!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!5EII!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!5EII!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bc994e7-908a-419d-9afb-d4634a8c49d5_1120x672.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Bare container start-to-exit&#8212;Apple ~1,071 ms vs. ~360&#8211;395 ms for the others.</figcaption></figure></div><p>Nginx answering its first request: <strong>1,095 ms</strong> versus ~290 ms. Postgres accepting a connection: <strong>2,081 ms</strong> versus 292 ms on Docker Desktop. That's 3&#215; to 7&#215; slower, depending on the workload. Every time you start something, you're paying for a VM to boot.</p><p><strong>Bind-mount disk I/O&#8212;this is the one that'll bite you.</strong> Mount a folder from your Mac into the container and write to it, and Apple manages about <strong>32.6 MB/s</strong>. Docker Desktop does 96.7. OrbStack does <strong>548.5</strong> MB/s which is roughly 17 times faster than Apple.</p><p>Sit with that one, because it's not an abstract benchmark. Mounting your source tree into a container and editing it live. The hot-reload dev loop, the thing a huge number of Mac developers do all day, runs straight through this path. On Apple <code>container</code>, that loop is going to feel like wading through mud.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SLaW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SLaW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!SLaW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!SLaW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!SLaW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SLaW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png" width="1120" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38edb49e-b612-453d-a863-80814f2e8180_1120x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:1120,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36328,&quot;alt&quot;:&quot;Bar chart of bind-mount write throughput in MB/s; Apple lowest near 32.6, Docker Desktop mid near 96.7, OrbStack highest near 548.5.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/203788714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bar chart of bind-mount write throughput in MB/s; Apple lowest near 32.6, Docker Desktop mid near 96.7, OrbStack highest near 548.5." title="Bar chart of bind-mount write throughput in MB/s; Apple lowest near 32.6, Docker Desktop mid near 96.7, OrbStack highest near 548.5." srcset="https://substackcdn.com/image/fetch/$s_!SLaW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 424w, https://substackcdn.com/image/fetch/$s_!SLaW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 848w, https://substackcdn.com/image/fetch/$s_!SLaW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 1272w, https://substackcdn.com/image/fetch/$s_!SLaW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38edb49e-b612-453d-a863-80814f2e8180_1120x672.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Bind-mount write throughput&#8212;Apple ~32.6 MB/s vs. Docker Desktop 96.7 and OrbStack 548.5 (~17&#215; faster than Apple).</figcaption></figure></div><p><strong>Networking between containers</strong> follows the same shape: ~11 Gbps for Apple, ~53 for Docker Desktop, and ~85 for OrbStack. Real network hops between separate VMs versus traffic staying inside one shared VM.</p><p><strong>And density.</strong> Spinning up 40 containers took Apple about 35 seconds. Docker Desktop and OrbStack did it in roughly 9. Forty VMs versus forty processes in one VM&#8212;the model shows up again. (To be fair, none of the three actually <em>fell over</em> at 40; I capped it there to be kind to my machine. Finding Apple's true ceiling is a job for another day.)</p><p>None of this is a bug. It's the direct, honest consequence of one lightweight VM per container. <strong>You're trading creation speed and I/O for isolation and a clean idle state.</strong> Whether that's a good trade depends entirely on what you do all day.</p><h2>The gaps that have nothing to do with speed</h2><p>Numbers aside, a few things will just stop you cold, and these matter more for daily use than any millisecond.</p><p><strong>No Docker Compose.</strong> This is the big one. If your project starts with <code>docker compose up</code>, Apple <code>container</code> has no native answer. You're back to creating a network and running each service by hand, or leaning on an unofficial third-party bridge. Docker Desktop and OrbStack both speak Compose fluently.</p><p><strong>Image names have to be fully qualified.</strong> <code>container run alpine</code> fails. You need <code>container run docker.io/library/alpine</code>. Small thing, but it'll trip every script and muscle-memory habit you have.</p><p><strong>Port publishing isn't the model you know.</strong> I expected <code>-p 8080:80</code> to put the service on <code>localhost:8080</code> like Docker does. On Apple <code>container</code> it didn&#8217;t&#8212;the container was serving fine, but on its <em>own</em> IP address, not on localhost. Apple's model is "every container gets a real routable IP," which is elegant, but it's not the muscle memory you've built. (Related gotcha: poll <code>127.0.0.1</code>, not <code>localhost</code> &#8212; <code>localhost</code> can resolve to IPv6 first and miss the service entirely.)</p><p><strong>DevContainers support is incomplete.</strong> If you live in VS Code's dev containers, you're not there yet.</p><h2>If you want to try it yourself</h2><p>It's a five-minute install. The one non-obvious step is the kernel.</p><pre><code># Apple container
brew install container
container system kernel set --recommended   # do this first, or `start` prompts you
container system start

# OrbStack, if you want to compare
brew install --cask orbstack</code></pre><p>Then remember the gotchas above: fully-qualified image names, reach services by their container IP, and <code>127.0.0.1</code> over <code>localhost</code>. That's most of the friction right there.</p><h2>So which one do you actually use?</h2><p>Here's how I'd decide.</p><p><strong>Reach for Apple `container`</strong> when you run a handful of long-lived containers and you care about isolation and a clean idle footprint&#8212;a couple of always-on services on a laptop you also use for everything else. The per-container VM boundary is real security isolation, and getting your RAM back when you're idle is a nice perk. Just don't point your live-editing dev loop at it.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c6f1521b-614a-4e96-9cda-492ddc8332d0&quot;,&quot;caption&quot;:&quot;I have an autonomous AI agent running on my Mac Studio. It has full shell access, reads my calendar, manages my tasks, and sends iMessages on my behalf. It runs 24/7 as a background service.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Secured My AI Agent With a 7-Layer Threat Model&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-08T16:31:12.687Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!IkGX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/secured-ai-agent-7-layer-threat-model&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:201130607,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Reach for OrbStack</strong> if you want fast Docker without the friction. It won every speed test that involved I/O or networking, starts containers as fast as anything, and crucially, it still speaks Compose and the full <code>docker</code> CLI. The only price is the biggest idle footprint of the three and the slowest cached build. For most Mac developers, this is the easy pick.</p><p><strong>Stay on Docker Desktop</strong> when you need the whole ecosystem&#8212;Compose, DevContainers, and the broadest tooling and documentation. It sat in the middle of nearly every benchmark, and "middle of the pack with everything supported" is exactly what a default should be.</p><p>The headline I came in expecting &#8212; "Apple <code>container</code> is faster than Docker" &#8212; just isn't what the data shows. It's faster at a few specific things and slower at most of the rest. But that's not a knock. It's a <em>specialist</em>. It does one shape of work really well and asks you to give up the conveniences you've built your workflow around.</p><p>Know which shape of work you're doing, and the choice makes itself.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/apple-container-vs-docker-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/apple-container-vs-docker-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><p><em>The full harness, raw numbers, and every chart are public: </em><a href="https://github.com/Jmeg8r/apple-container-vs-docker">github.com/Jmeg8r/apple-container-vs-docker</a><em> Run it on your own machine. I'd genuinely like to see whether the bind-mount gap holds on an M1 or M2, or whether 256 GB of headroom is flattering somebody.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/apple-container-vs-docker-apple-silicon/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/apple-container-vs-docker-apple-silicon/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Local LLMs Plus Claude Code: The Mac Studio Hybrid Workflow]]></title><description><![CDATA[Route generation to local Ollama models (gemma4, qwen3-coder) and save Claude for judgment calls. The routing table I run on a 256GB M3 Ultra Mac Studio.]]></description><link>https://astgl.com/p/local-llms-claude-code-mac-studio</link><guid isPermaLink="false">https://astgl.com/p/local-llms-claude-code-mac-studio</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Tue, 23 Jun 2026 11:04:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YhR0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YhR0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YhR0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 424w, https://substackcdn.com/image/fetch/$s_!YhR0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 848w, https://substackcdn.com/image/fetch/$s_!YhR0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 1272w, https://substackcdn.com/image/fetch/$s_!YhR0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YhR0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png" width="1456" height="556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129124,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922338?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YhR0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 424w, https://substackcdn.com/image/fetch/$s_!YhR0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 848w, https://substackcdn.com/image/fetch/$s_!YhR0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 1272w, https://substackcdn.com/image/fetch/$s_!YhR0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d3e533-301e-4a79-a8b7-b259a76c83ec_2320x886.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>"Isn't that expensive?" Every time I talk about using Claude Code for a project, someone asks this. The honest answer is: it depends entirely on what you route to Claude versus what you run locally. On a Mac Studio with unified memory, the economics change fast. Here's the routing table I actually use.</p><div><hr></div><h2>The Setup</h2><p>I have an M3 Ultra with 256 GB unified memory. That machine runs a 70B model locally with room to spare and a 235B mixture-of-experts model when I need frontier-scale reasoning. Running it at load draws around 60 watts. Eight hours of overnight inference work costs about five cents in electricity at typical US rates. The same run on a Frontier API would cost somewhere between fifteen and thirty dollars.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6c2374d1-2cbe-4120-ab01-14806a1b2a06&quot;,&quot;caption&quot;:&quot;Running LLMs locally usually feels like a compromise. You either get tiny, fast models that can't think or massive models that crawl at one word per minute. But with the right hardware, you can break that trade-off and replace your cloud billing entirely.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Stop Paying for Cloud APIs: Building a Local AI Stack on Mac Studio&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-01T17:08:10.216Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/119a1a3d-31a2-4bac-9f7b-23880a131212_2352x882.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:199922294,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>But cost isn't the primary reason to run locally. Latency is. My workhorse model, <code>gemma4:31b-mlx</code>, stays pinned in GPU memory, so there's no cold-start delay. It begins responding the moment you hit enter on M-series hardware. That's interactive. You can iterate on code at conversational speed with a local model, then bring Claude in for the decisions that actually require judgment.</p><p>Most developers running Claude Code for everything are paying for two things: generation (which local models handle well) and judgment (which frontier models handle better). Splitting those tasks cuts the API spend dramatically while keeping the quality where it matters.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>What's Actually Going On</h2><p>The routing principle has one line: <em>"Claude reads the playbook. Local models do the work."</em></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;42226f64-020a-4205-9467-ae87622a7100&quot;,&quot;caption&quot;:&quot;The question isn't whether local AI saves money&#8212;it does. The question is how fast and how much, based on your specific usage pattern.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What's the ROI of Local AI Infrastructure?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-13T03:29:13.208Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZKWP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/whats-the-roi-of-local-ai-infrastructure&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194024691,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><a href="https://tools.astgl.ai/use-cases/llm-evaluation">Local models handle generation</a>: the fast, cheap, iterative part. New code, first-draft text, refactoring suggestions, variant generation. Claude Code handles orchestration and the Compound step: planning, judging what goes in <code>learnings.jsonl</code>, reviewing the session output, and updating CLAUDE.md.</p><p>The model routing table I use:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2sY3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2sY3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2sY3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2sY3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2sY3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2sY3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2sY3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2sY3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2sY3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2sY3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dc68ef7-b5c7-4119-9a3f-1f8382135891_2288x1078.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The everyday models are light. <code>gemma4:31b-mlx</code> loads in about 20 GB on disk and sits pinned in GPU memory (around 46 GB resident at its full 256K context). <code>qwen3-coder:30b</code>, a mixture-of-experts model that activates only ~3B parameters per token, loads in another 18 GB. On an M3 Ultra with 256 GB, both stay resident at once with enormous headroom enough to also pull a 120B or even a 235B reasoning model on demand. That last one is the whole point of the memory: a 235B model at Q4 needs around 142 GB and simply won't load on a smaller machine.</p><p><code>deepseek-r1:70b</code> needs 64 GB or more. It's not interactive, responding in 20-30 seconds per generation. That's fine for overnight batch jobs. It's not fine for a code review you're waiting on.</p><h2>The Fix</h2><p>Setting up Ollama on a Mac Studio takes about 10 minutes:</p><pre><code>brew install ollama
ollama pull gemma4:31b-mlx     # primary workhorse (MLX build, Apple-Silicon optimized)
ollama pull qwen3-coder:30b    # code generation + review (MoE, strong tool calling)
# Heavy reasoning &#8212; only comfortable with lots of unified memory:
# ollama pull deepseek-r1:70b                          # ~42 GB, overnight reasoning
# ollama pull gpt-oss:120b                             # ~65 GB
# ollama pull qwen3:235b-a22b-thinking-2507-q4_K_M     # ~142 GB, needs 256 GB</code></pre><p>Open the Ollama menu bar app once to enable auto-start on login. It runs as a local HTTP server at <code>localhost:11434</code>.</p><p>The session structure for a typical day:</p><pre><code>Morning: Planning (Claude Code)
  - Read CLAUDE.md + learnings.jsonl
  - Brainstorm, write plan with verify steps

Work (local models via Ollama)
  - gemma4:31b-mlx for generation and first drafts
  - qwen3-coder:30b for code and "catch what I missed"

Evening: Compound Step (Claude Code)
  - Review session, propose learnings
  - Update CLAUDE.md Known Patterns table
  - Check tests, commit

Overnight (optional, if you have enough RAM for a 70B model):
  - deepseek-r1:70b running autoresearch loop
    on a skill file or prompt variant</code></pre><p>The Compound step is where the two frameworks connect. Karpathy's Autoresearch pattern (one file, one metric, keep/revert with git) maps directly onto a local overnight job. Instead of optimizing a training script, you optimize a skill file or a prompt file from your <code>.claude/commands/</code> directory.</p><p>Define a set of test cases: files with known issues your skill should catch. Run variants overnight. Keep the variants that find more issues; revert the rest. In the morning, run the Compound step with Claude on the winning variants.</p><p>The autoresearch loop becomes the Work step in a Compound Engineering session.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9W_y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9W_y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 424w, https://substackcdn.com/image/fetch/$s_!9W_y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 848w, https://substackcdn.com/image/fetch/$s_!9W_y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 1272w, https://substackcdn.com/image/fetch/$s_!9W_y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9W_y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png" width="1456" height="1644" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1644,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131580,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922338?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9W_y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 424w, https://substackcdn.com/image/fetch/$s_!9W_y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 848w, https://substackcdn.com/image/fetch/$s_!9W_y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 1272w, https://substackcdn.com/image/fetch/$s_!9W_y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f1d5b-f3c3-4630-bc73-87316c881243_1568x1770.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Why This Matters</h2><p>I've had a real-world data point on this from the stoicism-agent project, where I ran the first overnight autoresearch loop using local models. The learning I wrote after that run:</p><blockquote><p><em>"The fork between MLX and Ollama for autoresearch comes down to one question: do you need to modify model weights? MLX if yes. Ollama for prompt-level optimization. For skill file autoresearch, Ollama is the right choice."</em></p></blockquote><p>That's the kind of learning you only get by running the thing. The instinct before running it was to reach for MLX because it's the "native" Mac AI framework. The result after running it: Ollama is simpler, has better model selection, and is more than fast enough for overnight prompt optimization where you're not touching weights.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;37b512fc-1663-4543-87e7-ada4e2a08fa7&quot;,&quot;caption&quot;:&quot;I went to sleep. My Mac ran 118 experiments. When I woke up, a small GPT had trained itself from `val_bpb` 1.563 down to 1.289, beating every documented Apple Silicon overnight run in the project's public README. I wrote no code overnight. I just left a Claude Code session running against a markdown file named `program.md`, and the agent did the rest.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Nightshift: I Went to Sleep and My Mac Ran 118 Experiments&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-22T19:00:22.717Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!QI8z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:195033133,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Same principle scales to any overnight optimization loop. Pick the tool that matches the task. Don't reach for fine-tuning when prompt-level optimization will do.</p><p>The cost math on the Mac Studio, after a few months of this workflow: roughly 90% of generation work routes to local models. The 10% that goes to Claude is the judgment layer: planning, compound steps, final review. Total Claude API spend for a typical weekend session runs $2-3. For a full week of this, maybe $8-10.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mDq8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mDq8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 424w, https://substackcdn.com/image/fetch/$s_!mDq8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 848w, https://substackcdn.com/image/fetch/$s_!mDq8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 1272w, https://substackcdn.com/image/fetch/$s_!mDq8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mDq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png" width="1456" height="551" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:551,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92682,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922338?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mDq8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 424w, https://substackcdn.com/image/fetch/$s_!mDq8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 848w, https://substackcdn.com/image/fetch/$s_!mDq8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 1272w, https://substackcdn.com/image/fetch/$s_!mDq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce932ed6-ad11-4d4a-abc7-3fbc19a8f96c_2160x818.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The Mac Studio has nearly paid for itself in API savings after about 7 months. More importantly, the learnings file from six months of consistent Compound Engineering is now the most valuable artifact in my projects. Not the code. The 200+ preserved decisions.</p><p>Code can be rewritten. Those decisions can't.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/local-llms-claude-code-mac-studio?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/local-llms-claude-code-mac-studio?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Quick Reference</h2><ul><li><p><strong>Install:</strong> <code>brew install ollama</code>, open Ollama.app for auto-start</p></li><li><p><strong>Primary workhorse:</strong> <code>gemma4:31b-mlx</code> (~20 GB, pinned in GPU, MLX-optimized)</p></li><li><p><strong>Code generation + review:</strong> <code>qwen3-coder:30b</code> (~18 GB, MoE)</p></li><li><p><strong>Heavy reasoning:</strong> <code>gpt-oss:120b</code> (~65 GB) or <code>qwen3:235b-a22b-thinking</code> (~142 GB, needs 256 GB)</p></li><li><p><strong>Overnight reasoning:</strong> <code>deepseek-r1:70b</code> (needs 64 GB+, ~20-30s/response)</p></li><li><p><strong>Never route to local:</strong> Compound step, final review, architectural decisions</p></li><li><p><strong>Autoresearch overnight:</strong> use <code>deepseek-r1:70b</code> for skill/prompt optimization, not weight training</p></li><li><p><strong>Also resident:</strong> <code>nomic-embed-text</code> for embeddings, a custom voice model for narration</p></li><li><p><strong>Smaller machines:</strong> a 76 GB M2 Ultra runs <code>gemma4:31b-mlx</code> + <code>qwen3-coder:30b</code> fine, but the 120B&#8211;235B tier needs the 256 GB</p></li><li><p><strong>Full routing table:</strong> <code>docs/local-llm-routing.md</code> in the template repo</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ff82fae9-e10d-4525-8723-c6894497b8b4&quot;,&quot;caption&quot;:&quot;Your background agents are about to run out of money. Anthropic's new credit pool system means your automation could die in a single week. Here is how I re-engineered my stack to stay under budget without breaking my workflows.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Managing Anthropic Agent SDK Costs: A Post-June 15 Billing Playbook&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-16T20:48:17.274Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7dfdca46-ce32-48aa-9ef7-3e1c70adb3f5_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/anthropic-agent-sdk-billing-playbook&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:198010392,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li></ul><div><hr></div><p><em>Found this useful? I share practical lessons from my systems engineering journey at <a href="https://astgl.substack.com">As The Geek Learns</a> (<a href="https://astgl.substack.com">https://astgl.substack.com</a>)</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/local-llms-claude-code-mac-studio/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/local-llms-claude-code-mac-studio/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[SEO Isn't Dead — But the Click Might Be]]></title><description><![CDATA[Listen now | SEO and AEO explained: Explore how AI overviews drive zero-click searches and why being cited by AI assistants is the new way to capture high-converting&#8230;]]></description><link>https://astgl.com/p/seo-isnt-dead-but-the-click-might-be</link><guid isPermaLink="false">https://astgl.com/p/seo-isnt-dead-but-the-click-might-be</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 22 Jun 2026 11:03:39 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/202781723/49c89296d3b00e66870f32f7b941366f.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p></p>]]></content:encoded></item><item><title><![CDATA[Local vs. Frontier Models: Has the Gap Closed?]]></title><description><![CDATA[Listen now | Local AI models vs frontier APIs: explore the cost, privacy, and performance gaps in reasoning and agentic tasks for enterprise workloads.]]></description><link>https://astgl.com/p/local-vs-frontier-models-has-the-gap-closed</link><guid isPermaLink="false">https://astgl.com/p/local-vs-frontier-models-has-the-gap-closed</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Sat, 20 Jun 2026 00:01:04 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/202785170/07fb2a7bdcf6d95857bf709d3f731dd3.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p></p>]]></content:encoded></item><item><title><![CDATA[I Ran Google's 1,000-Tokens-Per-Second Model on My Mac. A Normal Model Beat It.]]></title><description><![CDATA[DiffusionGemma promises 1,000 tok/s but hits 43 on Mac. See why autoregressive Gemma wins on Apple Silicon with real benchmarks and data.]]></description><link>https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon</link><guid isPermaLink="false">https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Fri, 19 Jun 2026 13:15:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F3lP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F3lP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F3lP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 424w, https://substackcdn.com/image/fetch/$s_!F3lP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 848w, https://substackcdn.com/image/fetch/$s_!F3lP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!F3lP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F3lP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87403,&quot;alt&quot;:&quot;On Apple Silicon with 8-bit quantization, autoregressive Gemma 4 26B runs at 61 tok/s, beating DiffusionGemma's 43 tok/s. The diffusion model, marketed for 1,000+ tok/s, is slower here.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="On Apple Silicon with 8-bit quantization, autoregressive Gemma 4 26B runs at 61 tok/s, beating DiffusionGemma's 43 tok/s. The diffusion model, marketed for 1,000+ tok/s, is slower here." title="On Apple Silicon with 8-bit quantization, autoregressive Gemma 4 26B runs at 61 tok/s, beating DiffusionGemma's 43 tok/s. The diffusion model, marketed for 1,000+ tok/s, is slower here." srcset="https://substackcdn.com/image/fetch/$s_!F3lP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 424w, https://substackcdn.com/image/fetch/$s_!F3lP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 848w, https://substackcdn.com/image/fetch/$s_!F3lP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!F3lP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46aae19-4a9c-4f7e-9e3e-0b4b50fb7464_1800x1000.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There's a number floating around that's hard to ignore.</p><p>Google's new DiffusionGemma is supposed to crank out <strong>1,000-plus tokens per second</strong>. For comparison, the local models most of us run putter along at 30 to 100. So a 10x jump? That gets my attention.</p><p>The reason it's fast is genuinely interesting. Every model you've used&#8212;ChatGPT, Claude, your local Llama&#8212;writes one token at a time, left to right, each word waiting on the one before it. That's "autoregressive." Diffusion models work completely differently. They start with a blank canvas of 256 tokens and refine the whole block at once, in parallel, like a photo developing. No waiting in line.</p><p>On paper, that's the future. So I did the obvious thing: I ran it on my Mac Studio to see if the future had arrived on my desk.</p><p>It hadn't. And the <em>way</em> it hadn't turned out to be more interesting than a win would've been.</p><h2>The fair fight</h2><p>Here's the thing that makes this a clean test instead of a vibe check.</p><p>DiffusionGemma is built on the same bones as Gemma 4&#8212;Google ships an autoregressive <strong>Gemma 4 26B A4B</strong> that's the same size, same architecture, same weights. The <em>only</em> difference is how it generates: diffusion vs. one token at a time.</p><p>So I put them head to head. Same Mac. Same 8-bit quantization. Same runner (<em><strong>Apple's MLX</strong></em>). Same prompts, same everything. The one variable left standing is the decoding paradigm itself. If diffusion is faster, this proves it. If it's not, there's nowhere to hide.</p><p>Thirty prompts across code, math, instruction-following, and writing. Five runs each. Let's look.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The result nobody puts in the headline</h2><p>DiffusionGemma did about <strong>43 tokens per second</strong> on my Mac.</p><p>The boring old autoregressive model did <strong>61</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u_qp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u_qp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 424w, https://substackcdn.com/image/fetch/$s_!u_qp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 848w, https://substackcdn.com/image/fetch/$s_!u_qp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 1272w, https://substackcdn.com/image/fetch/$s_!u_qp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u_qp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png" width="910" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:910,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39277,&quot;alt&quot;:&quot;For 512-token generations on Apple Silicon 8-bit, autoregressive Gemma 4 is about 40% faster at 61 tok/s compared to DiffusionGemma's 43 tok/s. DiffusionGemma shows much wider run-to-run variance.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="For 512-token generations on Apple Silicon 8-bit, autoregressive Gemma 4 is about 40% faster at 61 tok/s compared to DiffusionGemma's 43 tok/s. DiffusionGemma shows much wider run-to-run variance." title="For 512-token generations on Apple Silicon 8-bit, autoregressive Gemma 4 is about 40% faster at 61 tok/s compared to DiffusionGemma's 43 tok/s. DiffusionGemma shows much wider run-to-run variance." srcset="https://substackcdn.com/image/fetch/$s_!u_qp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 424w, https://substackcdn.com/image/fetch/$s_!u_qp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 848w, https://substackcdn.com/image/fetch/$s_!u_qp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 1272w, https://substackcdn.com/image/fetch/$s_!u_qp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d55b2f0-a80b-4bec-92bc-b0121fbc07d7_910x585.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The diffusion model, the one that does 1,000+ on a datacenter GPU, was the <em>slower</em> of the two on Apple Silicon. Not by a hair. By 40%.</p><p>That gap between the headline and my desk is about 23x. The 1,000 tok/s is real; it's just real on an H100, a $30,000 datacenter card. On a Mac, that number has nothing to do with your life.</p><p>And it gets worse for diffusion if you care about how snappy a chat feels. There's a metric called time-to-first-token, how long you stare at a blank screen before words start appearing. The autoregressive model started typing in <strong>0.12 seconds</strong>. DiffusionGemma took <strong>1.86</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dTZp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dTZp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 424w, https://substackcdn.com/image/fetch/$s_!dTZp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 848w, https://substackcdn.com/image/fetch/$s_!dTZp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 1272w, https://substackcdn.com/image/fetch/$s_!dTZp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dTZp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png" width="910" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:910,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37496,&quot;alt&quot;:&quot;Time to first token on Mac Studio. Autoregressive Gemma starts in 0.12 s, while DiffusionGemma takes 1.86 s because it must refine a full 256-token block before outputting.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Time to first token on Mac Studio. Autoregressive Gemma starts in 0.12 s, while DiffusionGemma takes 1.86 s because it must refine a full 256-token block before outputting." title="Time to first token on Mac Studio. Autoregressive Gemma starts in 0.12 s, while DiffusionGemma takes 1.86 s because it must refine a full 256-token block before outputting." srcset="https://substackcdn.com/image/fetch/$s_!dTZp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 424w, https://substackcdn.com/image/fetch/$s_!dTZp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 848w, https://substackcdn.com/image/fetch/$s_!dTZp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 1272w, https://substackcdn.com/image/fetch/$s_!dTZp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e92935b-8dc1-4ab0-8cda-b8b171523e45_910x585.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That one surprised me until I thought about it. Remember how diffusion refines a whole 256-token block at once? That's the catch; it can't show you <em>anything</em> until the entire block is done cooking. The "parallel" model that's supposed to feel instant actually feels laggier, because it makes you wait for the batch.</p><h2>Why the magic doesn't travel</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!98hE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!98hE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 424w, https://substackcdn.com/image/fetch/$s_!98hE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 848w, https://substackcdn.com/image/fetch/$s_!98hE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 1272w, https://substackcdn.com/image/fetch/$s_!98hE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!98hE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png" width="1456" height="622" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91614,&quot;alt&quot;:&quot;Mac Studio MLX 8-bit results. Gemma 4 26B leads with 61 tok/s throughput, 0.12 s TTFT, and 0.90 quality. DiffusionGemma has 43 tok/s, 1.86 s TTFT, and 0.84 quality, despite reported h100 speeds of 1,000+ tok/s.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Mac Studio MLX 8-bit results. Gemma 4 26B leads with 61 tok/s throughput, 0.12 s TTFT, and 0.90 quality. DiffusionGemma has 43 tok/s, 1.86 s TTFT, and 0.84 quality, despite reported h100 speeds of 1,000+ tok/s." title="Mac Studio MLX 8-bit results. Gemma 4 26B leads with 61 tok/s throughput, 0.12 s TTFT, and 0.90 quality. DiffusionGemma has 43 tok/s, 1.86 s TTFT, and 0.84 quality, despite reported h100 speeds of 1,000+ tok/s." srcset="https://substackcdn.com/image/fetch/$s_!98hE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 424w, https://substackcdn.com/image/fetch/$s_!98hE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 848w, https://substackcdn.com/image/fetch/$s_!98hE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 1272w, https://substackcdn.com/image/fetch/$s_!98hE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e395ccf-8cee-4260-84d7-0d5798358f49_2000x855.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So why does the same model fly on an H100 and crawl on a Mac?</p><p>It comes down to what each machine is good at. Diffusion's whole speed trick is doing a giant pile of math all at once, refining 256 tokens in parallel. An H100 has thousands of cores sitting there begging for exactly that kind of bulk work. Flood it, and it's happy.</p><p>Apple Silicon doesn't win that way. It's not short on memory. My Mac Studio has 256GB, but it's limited by how fast it can move data around, not how much math it can do at once. The fancy parallel block doesn't help when the bottleneck is the plumbing, not the engine.</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;77ede3e7-c8ef-4ae0-bd1b-56acf33999f4&quot;,&quot;caption&quot;:&quot;Running LLMs locally usually feels like a compromise. You either get tiny, fast models that can't think or massive models that crawl at one word per minute. But with the right hardware, you can break that trade-off and replace your cloud billing entirely.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Stop Paying for Cloud APIs: Building a Local AI Stack on Mac Studio&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-01T17:08:10.216Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/119a1a3d-31a2-4bac-9f7b-23880a131212_2352x882.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:199922294,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p>Autoregressive decoding, meanwhile, plays to the Mac's strengths. It reuses its previous work (a "KV cache") and touches way less memory per token. Same model, same weights; the architecture that wins in the datacenter loses on the desktop. The hardware decides.</p><h2>The benchmark that kept slowing itself down</h2><p>I almost shipped wrong numbers. Here's the part the polished write-ups leave out.</p><p>My first full run looked fine for the first 15 or so generations. Then DiffusionGemma started... degrading. Not crashing&#8212;slowing. Time-to-first-token climbed from 1 second to 2, then 4, then 18, then 60, and by the 28th generation, a single response took over <strong>two minutes</strong>. Same prompt that was instant a minute earlier.</p><p>My first guess was a memory leak. So I checked. And this is the maddening part: every memory counter the framework reports stayed <em>flat</em>. By the numbers, nothing was wrong. I added the standard "clear the cache between runs" call. No change. I added a "wait for the GPU to finish" call. It nudged the cliff from generation 18 to generation 19 and then fell off it anyway.</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7bc8a34a-7b02-4635-907f-48aa27557a64&quot;,&quot;caption&quot;:&quot;3 a.m. Every cron job on the Mac Studio failed inside the same 90-second window. No code changes. No model updates. No new jobs. Just a wall of timeout errors that lit up every channel I had wired to alerts. The culprit was hiding in plain sight: a fallback chain doing exactly what I told it to.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The Ollama Model-Swap Death Spiral That Killed Every Cron at Once&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-06T13:03:19.842Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!SzwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/ollama-model-swap-death-spiral&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194863944,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p>The culprit turned out to be the graphics driver itself quietly piling up state that none of the normal tools could see or clear. The only thing that actually worked was brute force: run a handful of generations, then kill the whole process and start fresh. Let the operating system clean up what the framework couldn't.</p><p>Two lessons I'm keeping:</p><p><strong>Trust the measurement over the marketing and over your own assumptions.</strong> If I'd run 15 prompts and called it a day, I'd have published a number that looked great and was completely fake.</p><p><strong>Your tools can lie by omission.</strong> "Memory usage is flat" is not the same as "nothing is accumulating." The dashboard being green doesn't mean the system is healthy.</p><h2>Quality: closer, but autoregressive still edges it</h2><p>Speed isn't everything, so I scored the actual answers too. Code got run and tested. Math got checked against the right answer. Instruction-following got graded against rules. Writing got judged blind.</p><p>Overall, autoregressive Gemma came out ahead&#8212;0.90 to 0.84&#8212;winning code, instructions, and writing, while DiffusionGemma edged it on math. Honestly, that tracks with Google's own advice, which quietly tells you to use the standard model "for maximum quality." It's a real gap, but it's not a blowout.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M2ls!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M2ls!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 424w, https://substackcdn.com/image/fetch/$s_!M2ls!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 848w, https://substackcdn.com/image/fetch/$s_!M2ls!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 1272w, https://substackcdn.com/image/fetch/$s_!M2ls!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M2ls!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png" width="1040" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:1040,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37054,&quot;alt&quot;:&quot;Quality scores by task. Autoregressive Gemma leads in code (100% vs 75%), instructions (100% vs 95%), and writing (86% vs 77%). DiffusionGemma leads in math (88% vs 75%).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quality scores by task. Autoregressive Gemma leads in code (100% vs 75%), instructions (100% vs 95%), and writing (86% vs 77%). DiffusionGemma leads in math (88% vs 75%)." title="Quality scores by task. Autoregressive Gemma leads in code (100% vs 75%), instructions (100% vs 95%), and writing (86% vs 77%). DiffusionGemma leads in math (88% vs 75%)." srcset="https://substackcdn.com/image/fetch/$s_!M2ls!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 424w, https://substackcdn.com/image/fetch/$s_!M2ls!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 848w, https://substackcdn.com/image/fetch/$s_!M2ls!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 1272w, https://substackcdn.com/image/fetch/$s_!M2ls!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681ee7c-b40e-4eae-a397-d64a51e36049_1040x585.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So the diffusion model on my Mac was slower, laggier, <em>and</em> a notch lower quality. That's not a trade-off. That's just losing.</p><h2>So should you care?</h2><p>If you run models locally on a Mac, here's the takeaway: <strong>don't reach for DiffusionGemma expecting the headline.</strong> You'll get a third of the speed of the autoregressive version, worse responsiveness, and slightly weaker answers. For Mac local inference, boring old next-token generation still wins.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cu3e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cu3e!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 424w, https://substackcdn.com/image/fetch/$s_!cu3e!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 848w, https://substackcdn.com/image/fetch/$s_!cu3e!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 1272w, https://substackcdn.com/image/fetch/$s_!cu3e!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cu3e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png" width="975" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:975,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69067,&quot;alt&quot;:&quot;DiffusionGemma performance from 8 to 48 denoising steps. Throughput drops from 48 tok/s to 38 tok/s. Accuracy peaks near 100% at 16 steps, then falls to 78% by 48 steps.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="DiffusionGemma performance from 8 to 48 denoising steps. Throughput drops from 48 tok/s to 38 tok/s. Accuracy peaks near 100% at 16 steps, then falls to 78% by 48 steps." title="DiffusionGemma performance from 8 to 48 denoising steps. Throughput drops from 48 tok/s to 38 tok/s. Accuracy peaks near 100% at 16 steps, then falls to 78% by 48 steps." srcset="https://substackcdn.com/image/fetch/$s_!cu3e!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 424w, https://substackcdn.com/image/fetch/$s_!cu3e!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 848w, https://substackcdn.com/image/fetch/$s_!cu3e!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 1272w, https://substackcdn.com/image/fetch/$s_!cu3e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156160e3-84c7-4fb5-a5ea-747d928fce0a_975x585.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That's not a knock on diffusion models. The approach is genuinely promising, and on the right hardware, it's a rocket. But "the right hardware" is an NVIDIA datacenter card right now, not the machine on your desk.</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b0a2a1e2-7024-4949-be3d-4909afc51a6c&quot;,&quot;caption&quot;:&quot;Not every task needs the biggest model. A 4-billion parameter model can sort your notifications just as well as a 70-billion parameter one&#8212;and it'll do it 10x faster.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What's the Best Local LLM for Your Specific Task?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77b317fc-ce3d-4e9d-8a88-a0059f468191_512x512.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-13T02:16:10.832Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!POvV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d123bd-13cc-4d2b-90c3-6a17464b681e_2368x642.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/whats-the-best-local-llm-for-your&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194024608,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p>The bigger lesson is the one I keep relearning: <strong>a vendor benchmark is true and useless until you run it on your own hardware.</strong> 1,000 tokens per second was a real number that told me nothing about my Mac. The only way to know what a tool does for <em>you</em> is to point it at <em>your</em> setup and watch.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TGNe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TGNe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 424w, https://substackcdn.com/image/fetch/$s_!TGNe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 848w, https://substackcdn.com/image/fetch/$s_!TGNe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 1272w, https://substackcdn.com/image/fetch/$s_!TGNe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TGNe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png" width="975" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90866cea-21da-44f2-8294-8db310822063_975x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:975,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44225,&quot;alt&quot;:&quot;DiffusionGemma throughput gaps. Vendor reports show 1,008 tok/s on h100 and 700 tok/s on RTX 5090, but measured performance on Mac Studio is only 43 tok/s, a 23x difference from the headline.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/202624833?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="DiffusionGemma throughput gaps. Vendor reports show 1,008 tok/s on h100 and 700 tok/s on RTX 5090, but measured performance on Mac Studio is only 43 tok/s, a 23x difference from the headline." title="DiffusionGemma throughput gaps. Vendor reports show 1,008 tok/s on h100 and 700 tok/s on RTX 5090, but measured performance on Mac Studio is only 43 tok/s, a 23x difference from the headline." srcset="https://substackcdn.com/image/fetch/$s_!TGNe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 424w, https://substackcdn.com/image/fetch/$s_!TGNe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 848w, https://substackcdn.com/image/fetch/$s_!TGNe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 1272w, https://substackcdn.com/image/fetch/$s_!TGNe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90866cea-21da-44f2-8294-8db310822063_975x585.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I built the whole benchmark as a reusable harness; same-architecture comparison, real scoring, the works, so when MLX gets a diffusion-optimized path (or llama.cpp's Metal support lands), I can re-run it in an afternoon and see if the story's changed. I suspect it will, eventually. Just not today.</p><p>The whole thing&#8212;harness, scorers, charts, and the raw results&#8212;is on GitHub if you want to poke at it or run it on your own machine: <strong><a href="https://github.com/Jmeg8r/diffusiongemma-benchmark">github.com/Jmeg8r/diffusiongemma-benchmark</a>.</strong></p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading As The Geek Learns! This post is public, so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div><hr></div><h2>Frequently Asked Questions</h2><h4>Does DiffusionGemma actually hit 1,000 tok/s on a Mac?</h4><p>No. While it hits those speeds on an NVIDIA H100, I measured only 43 tok/s on my Mac Studio. The &#8220;parallel&#8221; advantage requires datacenter-grade compute to overcome memory bandwidth bottlenecks.</p><h4>Why is the time-to-first-token (TTFT) higher for diffusion models?</h4><p>Diffusion models refine a whole block of tokens (e.g., 256) at once. Because they cannot stream results token-by-token, you must wait for the entire batch to finish before any text appears on screen.</p><h4>Can I fix the performance degradation in MLX diffusion runs?</h4><p>The slowdown is caused by graphics driver state accumulation that standard memory tools don&#8217;t report. The only reliable fix currently is to run a few generations and then restart the process entirely.</p><h4>Which Gemma model is better for coding on Apple Silicon?</h4><p>Autoregressive Gemma 4 is superior. In my benchmarks, it scored higher in quality (0.90 vs 0.84) and was significantly faster (61 tok/s vs 43 tok/s) on Mac hardware.</p><h4>Is 256GB of unified memory enough to make DiffusionGemma fast?</h4><p>Memory capacity isn&#8217;t the bottleneck; memory bandwidth is. Even with 256GB, the Mac cannot move data fast enough to feed the parallel math required for diffusion speeds.</p><div><hr></div><p><em>Running local models on Apple Silicon and want to run this yourself? The full harness is on the <strong><a href="https://github.com/Jmeg8r/diffusiongemma-benchmark">GitHub repo diffusiongemma-benchmark</a>.</strong> And if you want more of these -<strong>"I actually tried it so you don't have to"</strong> - breakdowns, subscribe &#8212; that's most of what I do here.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share As The Geek Learns</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/diffusiongemma-vs-gemma-apple-silicon/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[I Built My Own Notion With Claude Fable 5 — In One Session]]></title><description><![CDATA[I gave Claude Fable 5 one prompt and by end of day I had a full Notion-style macOS app &#8212; block editor, databases, an auto-scheduling calendar, and an AI agent.]]></description><link>https://astgl.com/p/i-built-my-own-notion-with-claude-fable-5</link><guid isPermaLink="false">https://astgl.com/p/i-built-my-own-notion-with-claude-fable-5</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Thu, 11 Jun 2026 14:02:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Eyia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Eyia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Eyia!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 424w, https://substackcdn.com/image/fetch/$s_!Eyia!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 848w, https://substackcdn.com/image/fetch/$s_!Eyia!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 1272w, https://substackcdn.com/image/fetch/$s_!Eyia!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Eyia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201535096?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Eyia!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 424w, https://substackcdn.com/image/fetch/$s_!Eyia!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 848w, https://substackcdn.com/image/fetch/$s_!Eyia!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 1272w, https://substackcdn.com/image/fetch/$s_!Eyia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9247890-c6b7-4f78-9083-5277a21bb0fa_1800x1012.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I gave Claude Fable 5 one prompt and walked away for a few minutes. When I came back, it had designed a database schema. By the time I went to bed, I had a fully functional macOS desktop app with a block editor, relational databases, and a calendar that schedules itself. This is the story of how that happened and what it says about where AI-assisted development is right now.</p><div><hr></div><h2>The Prompt That Started Everything</h2><p>It started simple. I'd been using Notion for years, but it always felt like it was one subscription price increase away from becoming someone else's problem. I wanted something local. Something mine. And I'd been watching what Claude Fable 5 could do on the Max plan, so I decided to push it.</p><p>The prompt was direct:</p><blockquote><p><em>"I want you to build a macOS desktop app. This should be an app that lets you create custom pages with tables, text, images and more exactly like Notion. Use ASTGL branding. Use Convex for the database. Please build the full app, make it incredible and a professionally designed product, and make sure everything works."</em></p></blockquote><p>That was it. No architecture spec. No wireframes. No feature list beyond "exactly like Notion."</p><p>What came back wasn't just code; it was a plan. A full architecture decision: Electron + React 19 + Vite + Tailwind v4 + BlockNote for the editor + Convex running in anonymous local mode (no account, no cloud, data stays on this Mac). Fable 5 made the call that Convex's anonymous local deployment mode was the right choice before I even thought to ask about it. No auth, no monthly fee, reactive by default, data in <code>~/.convex</code>. That's a good decision.</p><h2>What Got Built</h2><p><strong>Geekspace </strong>shipped with everything I use Notion for daily.</p><p><strong>The block editor</strong> works exactly how you'd expect: type <code>/</code> for the command menu, hit <code>#</code> for headings, <code>[]</code> for to-dos. Drag handles, nested blocks, image uploads to Convex storage. The BlockNote library handles the ProseMirror plumbing; Fable 5 wired it into the Convex backend cleanly.</p><p><strong>The databases</strong> are where it gets interesting. Property types, multiple views (Table, Board, List, Calendar, Timeline), per-view filters, and sorts. Relations between databases and not just visual ones. Two-way synced relation pairs, rollup properties, and status fields with groups. The seeded template drops in a Projects &#8596; Tasks &#8596; Sprints structure pre-wired with relations and progress rollups. That took one command: <code>npm run seed</code>.</p><p><strong>The calendar that schedules itself</strong> is the headline feature. Every task with an estimate, a due date, and a priority gets automatically placed into your working hours, packed around fixed events, using earliest-deadline-first ordering. Drag a block, and it locks the engine schedules around it. Past blocks freeze as history. If something can't fit, it surfaces in a "needs attention" panel. Nothing silently drops.</p><p>And then I kept asking for more.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Features I Added</h2><p>After the first version was running, I started pushing.</p><p>"How can I connect my macOS Calendar and email?" Done. Calendar.app events mirror into Geekspace via JXA scripting, show up as fixed busy time the auto-scheduler works around, and display as dotted-edge events on the calendar. The Mail inbox widget pulls your recent messages, shows unread counts, and lets you turn any email into a task with one click.</p><p>"Notion has AI Meeting Notes&#8212;can you recreate that locally?" Done. One-click recording with a floating recorder, live level meter, pause, and resume. The audio runs through whisper.cpp for transcription, then through a local Ollama model for summarization. It never leaves the Mac. Summaries adapt to meeting type, standup vs. client vs. interview generate different formats. Action items become tasks in one click.</p><p>Then I approved a full roadmap for four more features: enterprise search over my ASTGL knowledge base, an AI agent built into the app, project templates, and a docs library. All four shipped in the same session.</p><p>The agent piece called <strong>ARCHITECT</strong> is the one I'm most proud of. I can use both Claude Agent SDK and local llm qwen3-coder:30b using a toggle. Both are wired and running directly inside the Electron main process. The qwen3-coder:30b model is great for tools usage. It connects to a custom MCP server I built called <code>geekspace-mcp</code> that exposes 14 tools covering every meaningful operation in the workspace. Ask ARCHITECT to set up a database for tracking podcast guests and it appears, live, in the sidebar. Ask it what's overdue and it queries the scheduler and tells you. Any MCP client can use this server, which means I can also drive Geekspace from Claude Code itself.</p><h2>The Bugs That Made It Real</h2><p>No build story is honest without the debugging.</p><p>The Electron window wouldn't launch. Vite was binding to IPv6 (<code>::1</code>), the startup script was polling <code>127.0.0.1</code> (IPv4), and they never found each other. One config change: <code>host: "127.0.0.1"</code> in <code>vite.config.ts</code>.</p><p>The macOS Mail widget timed out. The Automation permission dialog was blocking the JXA script. The fix required an <code>armAutomation()</code> probe function that pre-triggers the permission window before the actual fetch, with a timeout window.</p><p>The local Ollama model (gemma4) wraps its JSON output in markdown code fences even when you ask it not to. The fix: parse from the first <code>{</code> to the last <code>}</code> and ignore whatever surrounds it.</p><p>And then there was the ARCHITECT architecture mistake. My first plan routed the agent through ClaudeClaw's chat API but that API is intentionally tool-free for security reasons. The agent could chat, but couldn't actually do anything. I caught this, paused, re-planned, and rebuilt: embed the Agent SDK directly in Electron main, run everything locally. That's the version that works, and works well.</p><p>Fable 5 caught the architectural problem during the re-plan conversation and designed the corrected solution. That's not autocomplete. That's engineering judgment.</p><h2>What Fable 5 Made Possible</h2><p>Here's the thing that keeps sticking with me: I'm not a developer by trade. I'm a systems engineer who's been learning to build with AI. In a previous chapter of my career this build would have been a months-long project requiring an entire team. What got built here in a few hours included the scheduling engine, the MCP server, the reactive database layer, and the audio pipeline. Just wow!</p><p>This was one session.</p><p>Fable 5 didn't just write code from my descriptions. It made architecture decisions. It identified when I was about to go down a wrong path (the ARCHITECT routing issue). It designed a pure functional scheduler module with 21 tests. It wired an MCP server from scratch. It debugged IPv6/IPv4 mismatches and macOS permission timing issues.</p><p>The code it writes is genuinely good code. It typed, tested where it matters, and followed established patterns. I watched Fable 5 write code and spin up an agent to perform an adversarial review more than once. The debugging process felt like working with a senior engineer who happened to also be infinitely patient about explaining tradeoffs.</p><div><hr></div><h2>The Series: Building Geekspace in Public</h2><p>This article is the start of something bigger. I'm planning a full series on what I built, how it works, and what I learned. Here's where we're going:</p><p><strong>Part 1 &#8212; You're reading it.</strong> The origin story: one prompt, one session, a complete Notion-style macOS app.</p><p><strong>Part 2: The Calendar That Schedules Itself</strong></p><p>A deep dive into the auto-scheduling engine. How earliest-deadline-first + priority + chunking actually works. How locked blocks, frozen history, and "needs attention" surfacing change the way you think about task management. Why I built it as a pure module with its own test suite and why that decision saved me three times.</p><p><strong>Part 3: Five Bugs, Five Fixes &#8212; Debugging With Fable 5</strong></p><p>The IPv6/IPv4 split-brain. The Automation permission race condition. The gemma4 JSON fence problem. The orphaned Convex backend port. The wrong architecture I almost shipped. Each one is a real debugging story with a real lesson about building local AI-native apps.</p><p><strong>Part 4: AI Meeting Notes, 100% Local</strong></p><p>Whisper.cpp + Ollama + a floating recorder UI. How I rebuilt Notion's AI meeting notes without sending audio anywhere. The pipeline that took five iterations to get right. The model behavior quirks you don't find in the documentation.</p><p><strong>Part 5: One MCP Server, Three Runtimes</strong></p><p><code>geekspace-mcp</code> is a standard stdio MCP server. ARCHITECT runs it from Electron main. Claude Code can run it from the terminal. Claude Desktop can run it too. Building a workspace tool that any agent can drive and what that means for how AI and personal software intersect.</p><p><strong>Part 6: What I'd Do Differently</strong></p><p>Honest retrospective. The choices that worked out. The ones I'd reconsider. What building a complete app in one session actually costs you in terms of technical debt and understanding.</p><div><hr></div><p style="text-align: center;"><strong>Video Walkthrough of the Geekspace Build</strong></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;05db37f7-9ea8-4bd8-9d73-b01f84407521&quot;,&quot;duration&quot;:null}"></div><div><hr></div><p>The full app is called Geekspace. It runs on my Mac, stays on my Mac, and does everything I actually use Notion for. If you want to follow the build in public as I write about it, subscribe below &#8212; Part 2 drops next.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Frequently Asked Questions</h2><h3>Can Claude Fable 5 really build a full app from one prompt?
Yes &#8212; in one session, from a single prompt, it produced a working macOS app with a block editor, five database views, an auto-scheduling calendar, local AI meeting notes, and an embedded agent. It chose the architecture itself before being asked.</h3><h3>What stack did Claude choose for a Notion-style desktop app?
Electron + React 19 + Vite + Tailwind v4 + BlockNote for the editor, with Convex running in anonymous local mode &#8212; no account, no cloud, data stored in ~/.convex on the Mac.</h3><h3>Is a locally built Notion alternative actually private?
Yes. Geekspace keeps data in a local Convex deployment, and the AI meeting notes run whisper.cpp plus a local Ollama model &#8212; the audio never leaves the Mac.</h3><h3>How do you give an AI agent tools inside a desktop app?
The ARCHITECT agent runs the Claude Agent SDK in the Electron main process and connects to geekspace-mcp, a standard 14-tool MCP server. Because it's standard MCP, Claude Code and Claude Desktop can drive the same workspace.</h3><h3>What went wrong building an app this fast?
Four real bugs: an IPv6/IPv4 localhost mismatch that blocked the window, a macOS Automation permission race in the Mail widget, gemma4 wrapping JSON in markdown fences, and an early agent architecture that couldn't run tools and had to be rebuilt.</h3><h3>Do you need to be a developer to do this?
No. I'm a systems engineer learning to build with AI, not a professional developer. The skill that mattered most was knowing the outcome I wanted and recognizing when the architecture was wrong &#8212; not writing the code.</h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/i-built-my-own-notion-with-claude-fable-5?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/i-built-my-own-notion-with-claude-fable-5?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/i-built-my-own-notion-with-claude-fable-5/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/i-built-my-own-notion-with-claude-fable-5/comments"><span>Leave a comment</span></a></p><div><hr></div><p>*Part of the <strong>Building Geekspace</strong> series &#8212; an honest look at what's possible when you partner with AI to build real software. Published at <a href="https://astgl.substack.com">As The Geek Learns</a>.*</p>]]></content:encoded></item><item><title><![CDATA[Your DNS Changed and Nobody Told You. Here's the Nightly-Diff Pattern That Catches It.]]></title><description><![CDATA[A silent DNS change broke production at 2 PM on a Tuesday. Here's the baseline-and-diff bash pattern that would have caught it the night before.]]></description><link>https://astgl.com/p/dns-drift-detection-nightly-diff-bash</link><guid isPermaLink="false">https://astgl.com/p/dns-drift-detection-nightly-diff-bash</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Wed, 10 Jun 2026 11:01:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!H0lD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H0lD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H0lD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!H0lD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!H0lD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!H0lD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H0lD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38165,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284480?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H0lD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!H0lD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!H0lD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!H0lD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f8061a0-d50f-43fd-97d9-099a14db036e_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It was a Tuesday at 2:17 PM, and the marketing team's contact form was returning 502s. Not 404. Not a timeout. A clean 502, which means <em>something</em> was answering, just not the thing it was supposed to be.</p><p>An hour in, I'd checked the app server logs, restarted the nginx process twice, confirmed the SSL cert was valid, and pinged our cloud provider's status page like it owed me money. Everything looked fine everywhere I looked. Then, almost by accident, I ran `dig +short www.company.com CNAME` and saw a hostname I didn't recognize. Something like `legacy-assets.decommissioned-vendor-name.com`.</p><p>Vendor had been off the account for four months. The CNAME had quietly repointed to their infrastructure during the migration wind-down, sat there untouched, and then their old infrastructure finally went dark. Nobody changed our DNS intentionally. Nobody got notified when it happened. We found out when a sales rep tried to submit a lead form.</p><p>That was the day I stopped trusting that "nothing changed in DNS" was a statement anyone could actually verify.</p><h2>Why DNS Is the Silent-Failure Layer of Every Infrastructure</h2><p>DNS is configuration. It's just not a configuration you can store in your repo, lint on a commit, or review in a pull request. It lives in a registrar panel or a DNS provider dashboard, updated by humans who may or may not be following a change-control process, and it's completely invisible until something breaks.</p><p>Every other layer of your stack has some kind of drift detection built in these days. Config management tools track the desired state of your servers. Container orchestrators know what's supposed to be running. Infrastructure-as-code tools will tell you if something drifted from the Terraform state. DNS gets none of that by default. You get a text field in a web UI, a change that takes effect whenever the TTL expires, and exactly zero notifications.</p><p>The operational pattern most teams rely on is "we'll know when it breaks." And they're right. They will know. They'll know at 2 PM on a Tuesday when a customer reports it, after a sales lead gets lost, after the support team has spent 45 minutes ruling out everything else. The detection mechanism is user reports, which is among the worst possible monitoring strategies.</p><p>There's also a subtler problem. The change usually isn't malicious. It's not a security incident, at least not at first. It's a vendor cleanup, a platform migration, someone at a partner org tidying up their infrastructure without realizing your CNAME still pointed at them. It's the kind of change that feels harmless to whoever made it and catastrophic to whoever depends on it.</p><p>The fix isn't complicated. What you need is a declared source of truth for what your DNS <em>should</em> look like, a way to compare that against what it <em>actually</em> looks like right now, and something that runs that comparison regularly enough to catch drift before users do.</p><p>That's the pattern. The implementation fits in a bash script.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Pattern, the Four States, and a Wrapper You Can Use Today</h2><p>The idea is straightforward: declare your expected DNS state once in a YAML file, then run a script nightly that queries your authoritative nameservers and compares what it finds against what you declared. Any gap between the two gets reported.</p><p>The baseline file is the key piece. It's not generated. You write it manually, and that act of writing it is itself useful, because it forces you to actually look up what each record currently is and decide "yes, that's correct." Once it exists, it becomes your source of truth. Commit it to your repo. Update it when you make a legitimate DNS change. The baseline is always what you intend, and the script is always asking whether reality matches.</p><p>When the diff runs, every record type for every domain you declared lands in one of four states:</p><p><strong>MATCH</strong> means the live record matches the baseline exactly. This is the quiet result. Nothing to do.</p><p><strong>NEW</strong> means a record exists in live DNS that isn't in your baseline. It could be a vendor auto-adding a TXT verification record. It could be someone provisioning a new subdomain. It could be something you should care about. The script surfaces it; you decide.</p><p><strong>MISSING</strong> means your baseline declared a record that doesn't exist in live DNS anymore. An A record that was decommissioned without cleaning up. An MX record that got deleted. A CNAME that was removed when a vendor migrated their platform.</p><p><strong>DRIFT</strong> means the baseline and live DNS both have records for a type, but the values don't match. This is the Tuesday-afternoon scenario: the CNAME target changed, the IP behind an A record flipped, the SPF policy was modified.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dMrZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dMrZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!dMrZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!dMrZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!dMrZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dMrZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png" width="1200" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37007,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284480?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dMrZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!dMrZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!dMrZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!dMrZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fcbd0ab-d3f5-4e45-86e6-1654ab5dbfbc_1200x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>NEW and MISSING and DRIFT all mean something in your environment changed without you being told. The script exits nonzero when any of those occur, which makes it trivially composable with cron, alerting pipelines, or anything else that reads exit codes.</p><p>Here's a minimal working bash wrapper you can adapt right now. It keeps the dependencies to just `dig` and `bash`, uses a simple shell-array for your expected records instead of parsing YAML, and is short enough to read in under five minute<code>s:<br></code></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;f889cd53-4fb1-49d6-a09c-f44c2c9665f3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">#!/usr/bin/env bash
# dns-check.sh
# WHAT: Minimal DNS drift checker - declare expected records, diff against live DNS
# WHY:  Catches silent DNS changes before they become incidents
# Usage: ./dns-check.sh
#        Add to cron: 0 2 * * * /path/to/dns-check.sh || echo "DNS DRIFT DETECTED" | mail -s "DNS Alert" you@example.com

set -euo pipefail

# &#9472;&#9472; CONFIGURATION &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;
# Authoritative resolver to query against (use your domain's actual nameserver)
# WHY: Querying authoritative NS catches changes before they propagate to resolvers
RESOLVER="8.8.8.8"

# Declare expected records as: "domain|TYPE|expected_value"
# Get current values with: dig +short example.com A
# Run once to populate, then treat this as your source of truth
EXPECTED_RECORDS=(
  "example.com|A|93.184.216.34"
  "www.example.com|CNAME|example.com.cdn.cloudflare.net"
  "example.com|MX|10 mail.example.com"
  "example.com|TXT|v=spf1 include:_spf.google.com ~all"
)

# &#9472;&#9472; DIFF ENGINE &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;
DRIFT_FOUND=0

for record in "${EXPECTED_RECORDS[@]}"; do
  # Parse the declared record into its three parts
  domain="${record%%|*}"
  rest="${record#*|}"
  rtype="${rest%%|*}"
  expected="${rest#*|}"

  # Query live DNS at the authoritative resolver
  # WHY: +short gives us clean output; @resolver pins which nameserver answers
  actual=$(dig +short "@${RESOLVER}" "${domain}" "${rtype}" 2&gt;/dev/null \
    | sort \
    | tr '\n' '|' \
    | sed 's/\.$//g; s/|$//')

  # Normalize expected for comparison (sort, strip trailing dots)
  expected_norm=$(printf '%s\n' "${expected}" \
    | sort \
    | tr '\n' '|' \
    | sed 's/\.$//g; s/|$//')
      # Compare and classify the result
  if [[ -z "${actual}" ]]; then
    # Record existed in baseline but dig returned nothing: MISSING
    printf "MISSING  %s %s  (expected: %s)\n" "${domain}" "${rtype}" "${expected}"
    DRIFT_FOUND=1
  elif [[ "${actual}" != "${expected_norm}" ]]; then
    # Record exists but value changed: DRIFT
    printf "DRIFT    %s %s\n  expected: %s\n  actual:   %s\n" \
      "${domain}" "${rtype}" "${expected}" "${actual}"
    DRIFT_FOUND=1
  else
    # Values match: MATCH (silent - no output unless you add --verbose logic)
    : # nothing to report
  fi
done

# Exit nonzero on any drift - composable with cron, alerting, CI checks
if [[ "${DRIFT_FOUND}" -eq 1 ]]; then
  printf "\nDrift detected. Review records above.\n" &gt;&amp;2
  exit 1
fi

printf "All %d declared records match live DNS.\n" "${#EXPECTED_RECORDS[@]}"
exit 0</code></pre></div><p><br>Save that, drop your actual records into `EXPECTED_RECORDS`, and run it once to confirm it sees what you expect. Then add it to your crontab:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;c7051231-7f7f-49bb-b864-bc4872edaf50&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Run nightly at 2 AM, email on drift
0 2 * * * /path/to/dns-check.sh || echo "DNS drift detected on $(hostname)" | mail -s "[ALERT] DNS Drift" you@example.com</code></pre></div><p>The "NEW record exists in live DNS" state isn't in this minimal version, since detecting it requires knowing which record types to scan for beyond what you declared. The four-state model handles that fully once you know which types to watch, which is what the complete kit covers. For a first pass, catching MISSING and DRIFT gets you most of the value.</p><p>A few practical notes. Use `dig +short` rather than `dig` without `+short` or you'll spend time parsing the human-readable output format. Always query a specific nameserver with `@resolver` rather than relying on your local resolver, since caching can hide drift for hours. The MX record normalization is worth being careful about: `dig +short` returns the priority prefix as part of the value (`10 mail.example.com`), so your expected strings need to include it exactly that way. And commit the script alongside your baseline declaration. If the baseline lives in the repo, you get history, diffs, and code review for DNS changes as a side effect.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><h2>What Else Lives in the Full Kit</h2><p>The script above covers the core pattern. The full DNS Drift Detector kit is what you reach for once you've outgrown the wrapper.</p><p>The main `dns-drift-detector.sh` handles all five record types: A, AAAA, CNAME, MX, and TXT. That last group matters more than it seems. TXT records are where SPF policies live, where DKIM selectors sit, where domain verification tokens accumulate. Quiet SPF drift can break your email deliverability for days before anyone notices. DKIM drift means legitimate mail starts hitting spam folders. These aren't hypothetical edge cases.</p><p>The color-coded output makes the diff results readable at a glance during incident response, and the `--quiet` flag strips all of that for cron-friendly logging where you only want the exit code to speak. There's a `--no-color` flag too, so piping to a log file doesn't fill it with ANSI escape sequences.</p><p>`install-cron.sh` is a one-command idempotent installer. It checks that `dig` is available, puts the scripts where they belong, creates a log directory, and writes the cron entry with a duplicate-guard marker so running it twice doesn't add the job twice. That kind of thing is boring to write and annoying to get wrong.</p><p>The `baseline.yaml` in the kit is annotated with examples for ten common services: Google Workspace MX, Cloudflare CDN CNAMEs, SendGrid SPF, common DKIM selectors, and a few others. It's the reference you use when you're populating your own baseline for the first time.</p><p>The runbook covers install, baseline setup, how to read the output, what to do for each of the four states, how to update the baseline after a legitimate change, and an FAQ. That last one matters during an incident, when you don't want to be making judgment calls about whether a NEW record means "update the baseline" or "call the registrar."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MZjl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MZjl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!MZjl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!MZjl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!MZjl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MZjl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png" width="1200" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58684,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284480?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MZjl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!MZjl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!MZjl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!MZjl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71d18c0-220f-42d5-90cf-d2898429141a_1200x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Try the Pattern Today</h2><p>The pattern described here is genuinely useful as-is. Declare your records, schedule the diff, react to the exits. That alone puts you ahead of the "we'll know when it breaks" approach that most infrastructure environments are actually running.</p><p>If you want the full kit, it's at <a href="https://shop.asthegeeklearns.com/products/dns-drift-detector">shop.asthegeeklearns.com/products/dns-drift-detector</a> for $19. You get the complete `dns-drift-detector.sh` with all five record types, color output, quiet mode, and cron logging; the idempotent `install-cron.sh`; the annotated `baseline.yaml` with ten real-world service examples; and the full operator runbook.</p><p>The Tuesday-afternoon incident I described at the top cost more than $19 worth of everyone's time. The detection script would have caught it the night before.</p><p><em>As The Geek Learns is a newsletter about systems engineering, automation, and the gap between knowing something and actually applying it. If this was useful, subscribe for free to get new articles as they land.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/dns-drift-detection-nightly-diff-bash?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/dns-drift-detection-nightly-diff-bash?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/dns-drift-detection-nightly-diff-bash/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/dns-drift-detection-nightly-diff-bash/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[I Secured My AI Agent With a 7-Layer Threat Model]]></title><description><![CDATA[Using the MAESTRO framework to harden an autonomous agent -- seven layers of things that can go wrong, translated from security-paper-speak in your day.]]></description><link>https://astgl.com/p/secured-ai-agent-7-layer-threat-model-podcast-episode-014</link><guid isPermaLink="false">https://astgl.com/p/secured-ai-agent-7-layer-threat-model-podcast-episode-014</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 08 Jun 2026 17:00:31 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/201167231/6bbc433131cc6958f3a3e98c9d79399a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong>I have an autonomous AI agent running on my Mac Studio. It has full shell access, reads my calendar, manages my tasks, and sends iMessages on my behalf. It runs 24/7 as a background service.</strong></p><p>If that sentence doesn&#8217;t make you slightly nervous, you haven&#8217;t been paying attention. In <a href="https://www.isec.news/2026/02/10/securityscorecard-135000-plus-internet-exposed-openclaw-instances-found/">February 2026, researchers found over 135,000 OpenClaw instances exposed to the public internet</a>. A coordinated attack called <a href="https://cybersecuritynews.com/clawhavoc-poisoned-openclaws-clawhub/">ClawHavoc</a> planted over a thousand malicious plugins in the community registry. Nine CVEs have been disclosed, including remote code execution.</p><p>I needed to take security seriously. Not &#8220;I changed the default password&#8221; seriously. Threat-model seriously.</p><h2>MAESTRO: Seven Layers of Things That Can Go Wrong</h2><p>The <a href="https://cloudsecurityalliance.org/">Cloud Security Alliance </a>published a framework called <a href="https://github.com/CloudSecurityAlliance/MAESTRO">MAESTRO</a>&#8212;a 7-layer threat model specifically designed for agentic AI systems. Ken Huang mapped it directly to OpenClaw&#8217;s codebase, identifying 35+ specific threats across every layer of the stack.</p><p>Here are the seven layers, translated from security-paper language into &#8220;things that could actually ruin your day&#8221;:</p><p><strong>Layer 1: Foundation Models:</strong> Someone sends your agent a crafted message that hijacks its behavior. Prompt injection. Jailbreaks. System prompt leakage. Your agent does what an attacker tells it to instead of what you told it to.</p><p><strong>Layer 2: Data Operations:</strong> Your credentials are stored in plaintext JSON files. Your session logs contain every conversation forever. A malicious skill injects code through your workspace.</p><p><strong>Layer 3: Agent Frameworks:</strong> The agent misuses its own tools. It runs shell commands it shouldn&#8217;t. It spawns sessions without authorization. It escalates its own privileges.</p><p><strong>Layer 4: Deployment &amp; Infrastructure:</strong> Your gateway is exposed to the network. Someone brute-forces the WebSocket token. A reverse proxy misconfiguration bypasses authentication entirely.</p><p><strong>Layer 5: Evaluation &amp; Observability:</strong> Nobody&#8217;s watching the agent for anomalous behavior. There&#8217;s no audit trail. Logs can be tampered with. If the agent starts acting weird, nothing catches it.</p><p><strong>Layer 6: Security &amp; Compliance:</strong> Your DM policy is misconfigured. Anyone can message the agent. Pairing codes can be brute-forced. Identity can be spoofed across channels.</p><p><strong>Layer 7: Agent Ecosystem:</strong> A malicious plugin gets installed. A legitimate plugin&#8217;s npm dependency gets compromised. The skill registry serves poisoned packages.</p><p>The critical attack chain MAESTRO identifies: compromise the gateway (Layer 4) &#8594; access the session store (Layer 2) &#8594; poison conversation history (Layer 1) &#8594; control the agent (Layer 3) &#8594; spread via messaging (Layer 7).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JxSB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" width="1184" height="93" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:93,&quot;width&quot;:1184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20177,&quot;alt&quot;:&quot;Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)&quot;,&quot;title&quot;:&quot;Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" title="Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Reading this was humbling. I&#8217;d addressed some of these by instinct during setup. Loopback binding, directory permissions, and pairing-based access control were all implemented. But &#8220;some&#8221; isn&#8217;t a security posture.</p><h2>SecureClaw: The Audit</h2><p><a href="https://github.com/adversa-ai/secureclaw">SecureClaw</a> is an open-source security tool built specifically for OpenClaw by Adversa AI. It maps to MAESTRO, OWASP, MITRE ATLAS, and NIST AI 100-2. The install is a git clone and a bash script, no npm install, no network calls, and no surprises.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;31b4db6d-5d95-413a-9748-1edf870fb6f3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">git clone https://github.com/adversa-ai/secureclaw.git
bash secureclaw/secureclaw/skill/scripts/install.sh</code></pre></div><p>Then you run the audit:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d3bd4bd2-f1f5-4139-a873-12e14991b95d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">bash ~/.openclaw/skills/secureclaw/scripts/quick-audit.sh</code></pre></div><p>My baseline score: <strong>57 out of 100.</strong> Zero criticals. Three HIGHs. Three MEDIUMs. Eight checks passing.</p><p>Here&#8217;s what passed without any work:</p><p>&#8226; Gateway bound to loopback (127.0.0.1) not exposed to network</p><p>&#8226; Gateway authentication present</p><p>&#8226; Directory permissions set to 700 (owner only)</p><p>&#8226; No browser relay exposed</p><p>&#8226; DM policy set to pairing (not open)</p><p>&#8226; Skills clean of malicious patterns</p><p>And here&#8217;s what failed:</p><blockquote><p>&#128992; HIGH Plaintext key exposure: Keys in openclaw.json and 5 backup files</p><p>&#128992; HIGH Sandbox mode: commands run directly on host</p><p>&#128992; HIGH Exec approval mode: agent acts without human approval</p><p>&#128993; MED No cognitive file baselines: can&#8217;t detect tampering</p><p>&#128993; MED Default control tokens: vulnerable to spoofing</p><p>&#128993; MED No failure mode: no graceful degradation</p></blockquote><h2>The Hardening</h2><p><strong>Step 1: Clean up credential leaks.</strong> OpenClaw creates .bak files every time you change config. Each backup contains your full config, including Slack tokens and API keys. I had five of them sitting in the OpenClaw directory. Deleted them all. Set the main config to 600 permissions.</p><p>This is the kind of thing that&#8217;s easy to miss and catastrophic to ignore. A single ls -la ~/.openclaw/ would show them. But who runs ls -la on their config directory after every change?</p><p><strong>Step 2: Create integrity baselines.</strong> SecureClaw&#8217;s hardener generates SHA256 hashes of your &#8220;cognitive files&#8221; IDENTITY.md, AGENTS.md, and HEARTBEAT.md. These are the files that define who your agent <em>is</em> and what it <em>does</em>. If an attacker or a hallucinating agent modifies them, the nightly integrity check will catch it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;90df69e7-009c-4398-90a0-846a125ebc72&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">bash ~/.openclaw/skills/secureclaw/scripts/quick-harden.sh</code></pre></div><p><strong>Step 3: Exec approvals.</strong> This is the big one. MAESTRO recommends human-in-the-loop approval for all shell commands. But my agent runs morning briefings and heartbeat checks on cron&#8212;unattended. Setting approvals to &#8220;always&#8221; would break all automation.</p><p>The solution: an <strong>allowlist with on-miss approval.</strong> I created ~/.openclaw/exec-approvals.json with 17 safe command patterns: imsg, calctl, apple-reminders, cairn, and basic file operations. Tars can run these freely. Anything else; curl, rm, pip install, or any command not on the list, requires human approval.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;0fd30378-dc9d-4356-9c40-b93415434cda&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  &#8220;defaults&#8221;: {
    &#8220;security&#8221;: &#8220;allowlist&#8221;,
    &#8220;ask&#8221;: &#8220;on-miss&#8221;
  },
  &#8220;agents&#8221;: {
    &#8220;main&#8221;: {
      &#8220;allowlist&#8221;: [
        { &#8220;pattern&#8221;: &#8220;imsg *&#8221;, &#8220;note&#8221;: &#8220;iMessage send/read&#8221; },
        { &#8220;pattern&#8221;: &#8220;calctl *&#8221;, &#8220;note&#8221;: &#8220;Apple Calendar&#8221; },
        { &#8220;pattern&#8221;: &#8220;cairn *&#8221;, &#8220;note&#8221;: &#8220;Task management&#8221; }
      ]
    }
  }
}</code></pre></div><p>This is the trade-off MAESTRO doesn&#8217;t talk about: <strong>security versus automation.</strong> Maximum security means every action needs approval. Maximum automation means the agent acts freely. The allowlist is the middle ground. Routine operations are pre-approved, and novel or dangerous operations require a human.</p><p><strong>Step 4: Full plugin install.</strong> Beyond the bash scripts, SecureClaw has a full npm plugin with 56 runtime audit checks, background monitors for config drift, and real-time integrity verification. Installing it required building from source (TypeScript &#8594; JavaScript) and registering it with OpenClaw&#8217;s plugin system.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;48d37b56-430a-4c07-b786-d9162bba10f5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">openclaw plugins install -l /path/to/secureclaw

openclaw config set plugins.allow &#8216;[&#8221;secureclaw&#8221;]&#8217;</code></pre></div><p>That plugins.allow line is important. By default, OpenClaw will auto-load any discovered plugin. Explicit trust means only plugins you&#8217;ve approved get loaded.</p><p><strong>Step 5: Nightly audit cron.</strong> A macOS LaunchAgent runs the full audit suite every night at 2 AM which includes quick-audit, integrity check, and supply chain scan. Results go to secureclaw-audit.log. If something changes overnight, it shows up in the morning.</p><h2>The Final Score</h2><p>After hardening: <strong>64 out of 100.</strong> Nine checks passing. Zero criticals. The three remaining HIGHs are documented, accepted trade-offs:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c-Ja!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" width="1456" height="651" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94290,&quot;alt&quot;:&quot;Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to \&quot;always\&quot; &#8212; I use an allowlist plus on-miss approval instead, because full \&quot;always\&quot; would break unattended cron automation.&quot;,&quot;title&quot;:&quot;Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to \&quot;always\&quot; &#8212; I use an allowlist plus on-miss approval instead, because full \&quot;always\&quot; would break unattended cron automation.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." title="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Findings I accepted (with reasoning)&#8212;Sandbox mode (Docker sandboxing would break imsg, calctl, and Apple Reminders); Plaintext keys in config (inherent to the platform config format, file is locked to 600); Exec approval not &#8220;always&#8221; (using allowlist + on-miss; full &#8220;always&#8221; breaks unattended cron automation).</em></p><p>The two MEDIUMs, control token customization and failure mode configuration, aren&#8217;t supported in OpenClaw v2026.3.2&#8217;s config schema yet. SecureClaw checks for them proactively. They&#8217;ll be fixable when OpenClaw adds the config options.</p><h2>What I Actually Learned</h2><p><strong>Security isn&#8217;t a feature you enable.</strong> It&#8217;s a series of trade-offs you make with your eyes open. Sandbox mode is &#8220;more secure&#8221; but breaks the tools that make the agent useful. Approval mode &#8220;always&#8221; is &#8220;more secure&#8221; but kills the automation that makes the agent worthwhile. The right security posture isn&#8217;t maximum restriction; it&#8217;s documented, intentional decisions about what risks you accept and why.</p><p><strong>Automated scanning is essential but insufficient.</strong> SecureClaw&#8217;s audit caught things I would have missed, including the .bak files with credentials, the missing integrity baselines, and the open exec policy. But the HIGHs it flagged as failures are things I&#8217;ve consciously accepted. No scanner can evaluate your specific trade-offs.</p><p><strong>The biggest threat isn&#8217;t external.</strong> In my setup (loopback-bound, pairing-gated, allowlist-filtered), the most likely security failure isn&#8217;t a network attacker. It&#8217;s a malicious skill, a compromised npm package, or the agent itself hallucinating destructive actions. Layer 7 (ecosystem) and Layer 1 (model behavior) are the real attack surfaces for a local-first setup. The exec approval allowlist is my primary defense for both.</p><p><strong>Clean up after yourself.</strong> OpenClaw creates backup files containing credentials on every config change. There&#8217;s no auto-cleanup. If you&#8217;re running OpenClaw, go check your directory right now: ls ~/.openclaw/*.bak*. You might be surprised.</p><h2>Quick Reference</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Qje!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" width="1456" height="1302" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1302,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159198,&quot;alt&quot;:&quot;Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/.&quot;,&quot;title&quot;:&quot;Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." title="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Hardening actions and commands: install, run audit, apply hardening, check integrity, scan skills, check for credential leaks, set exec approvals, set plugin trust. Commands target ~/.openclaw/skills/secureclaw/scripts/. Full command details in the image.</em></p><h2>Update&#8212;June 2026: What I Actually Did When I Moved to ClaudeClaw</h2><p>I wrote this piece in March, when OpenClaw was still the thing running my Mac Studio. By the end of April, I&#8217;d shut it down. Disabled the cron jobs, quarantined the LaunchAgents, and rebuilt the whole stack on the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a>. Based off of <strong><a href="https://github.com/earlyaidopters/claudeclaw">ClaudeClaw</a> </strong>from the <a href="https://www.skool.com/earlyaidopters/about">Early AI-Dopters</a> AI learning group. The full post-mortem on <em>why</em>:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2dc54966-9faa-4b6c-9c7e-3b1f18782638&quot;,&quot;caption&quot;:&quot;Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. No black box.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Killed OpenClaw and Built ClaudeClaw Mission Control&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T5FD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6a6400-f0cd-4ff3-8541-f6cccf4d9a87_400x400.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-02T23:01:21.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196179846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Why? </strong>The short version is this: I couldn&#8217;t <em>see</em> into OpenClaw. Which, if you scroll back up, is Layer 5: Evaluation &amp; Observability, the exact layer this audit was weakest on.</p><p>You may wonder whether I just copied the 7-layer hardening over to the new stack. I didn&#8217;t, and I want to be honest about that. <strong>I did not port MAESTRO one-for-one.</strong> SecureClaw was written specifically for OpenClaw. Some of its thinking transferred; some of it didn&#8217;t. And the threat model itself moved on (more on that at the end). What the seven layers became was a checklist: for each one, <em>how does the new architecture answer this?</em> Here&#8217;s the scorecard.</p><p><strong>The two layers that changed the most.</strong></p><p><em><strong>Layer 5 (Observability)</strong></em> went from my single biggest weakness to the entire reason ClaudeClaw exists. There&#8217;s now a dedicated agent, <strong>WATCHMAN</strong>, running seven probes every hour: failed tasks, stuck tasks, missed scheduler slots, daemon liveness, content-pipeline health, hidden failures (it greps the success logs for crash text), and delegation crashes. More importantly, there&#8217;s a <em>second</em> healthcheck running as a separate LaunchAgent with its own keychain-backed alert token. If the main daemon dies, the thing that tells me about it is still alive. The rule I wrote for myself out of this: <strong>the watcher cannot share fate with the watched</strong>. There&#8217;s also a behavioral dashboard, DefenseClaw, sitting on 127.0.0.1:3141.</p><p><em><strong>Layer 3 (Agent Frameworks)</strong></em> is where my OpenClaw work actually carried forward. The exec-approvals allowlist from Step 3 above is the direct ancestor of what ClaudeClaw does now, except the enforcement dropped down a level. The first thing I shipped was killing bypassPermissions (the main agent had been running with permission checks disabled, which means a compromised agent has unlimited tool access. The SDK was no ceiling at all), switching to the SDK&#8217;s default permission mode, and handing the main agent a 15-tool allowlist as the single source of truth. Same idea as the OpenClaw allowlist. Enforced by the SDK itself instead of a config file I had to maintain.</p><p>The rest mapped like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQoC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" width="1456" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:339803,&quot;alt&quot;:&quot;Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries.&quot;,&quot;title&quot;:&quot;Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." title="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>How each of the seven MAESTRO layers from the OpenClaw audit is answered in ClaudeClaw. </em></p><p><em><strong>Layer 1 Foundation Models: </strong>channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved).</em></p><p><em><strong>Layer 2 Data Operations: </strong>Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced and extended).</em></p><p><em><strong>Layer 3 Agent Frameworks:</strong> SDK permission ceiling and a 15-tool allowlist, the direct successor to the OpenClaw exec-approvals list (kept, moved into the SDK).</em></p><p><em><strong>Layer 4 Deployment and Infrastructure: </strong>an egress gateway plus kernel-level pf default-deny (replaced). </em></p><p><em><strong>Layer 5 Evaluation and Observability: </strong>WATCHMAN&#8217;s seven probes and a fate-isolated external healthcheck, the biggest upgrade.</em></p><p><em><strong>Layer 6 Security and Compliance:</strong> out-of-band Telegram confirmation for state-changing actions and a role policy kept separate from content memory (evolved).</em></p><p><em><strong>Layer 7 Agent Ecosystem: </strong>an MCP allowlist plus the tool ceiling as a second layer (kept and hardened). </em></p><p><em>Plus a new row beyond MAESTRO.  <strong>Memory persistence: </strong>TTLs, a hash-chained write log, and canaries.</em></p><p><strong>Where the 7-layer model ran out.</strong></p><p>MAESTRO is a <em>static</em> threat model. It&#8217;s a map of what can go wrong at each layer, frozen in time. What it doesn&#8217;t have a layer for is <strong>persistence</strong>. An attack that lands quietly in your agent&#8217;s memory or vector store and just waits. My scheduler re-enters context every 60 seconds, which means anything dormant in memory fires on a clock. That&#8217;s a different class of problem, and it has a name now: <a href="https://www.semanticscholar.org/paper/Logic-layer-Prompt-Control-Injection-(LPCI)%3A-A-in-Atta-Huang/7209db0a616b54335db85d6e73a0dc9505192e59?utm_source=direct_link">LPCI, Logic-layer Prompt-based Conditional Injection</a>. Hardening against it (I am planning a separate two-part write-up on <a href="https://astgl.substack.com">As The Geek Learns</a>) meant building things MAESTRO never asked for, including a canonicalizer that decodes payloads <em>before</em> they reach the vector store, channel-tagged prompts so the model knows retrieved text is data and not instructions, memory TTLs, a hash-chained write log, and canary entries that page me if memory ever leaks into output.</p><p><strong>What I gave up and what I kept.</strong> The honest cost of the move: I lost local-first. OpenClaw ran on Ollama, fully offline; ClaudeClaw talks to Anthropic&#8217;s API. I still own every byte of my data; it&#8217;s all on my SSD; I just don&#8217;t own the weights anymore. What carried over intact was the philosophy this whole series is built on: every document is a file I can grep, every config is version-controlled, and every decision has a session note. That part never changed.</p><p><em>This is Part 5 of the Notion Replacement series. We went from &#8220;install an AI agent&#8221; to &#8220;secure it against a 7-layer threat model&#8221; in two days. Follow along at <a href="https://astgl.substack.com">As The Geek Learns</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[I Secured My AI Agent With a 7-Layer Threat Model]]></title><description><![CDATA[Using the MAESTRO framework to harden an autonomous agent&#8212;seven layers of things that can go wrong, translated from security-paper-speak into your day.]]></description><link>https://astgl.com/p/secured-ai-agent-7-layer-threat-model</link><guid isPermaLink="false">https://astgl.com/p/secured-ai-agent-7-layer-threat-model</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 08 Jun 2026 16:31:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IkGX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IkGX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IkGX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 424w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 848w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1272w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IkGX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png" width="1456" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99253,&quot;alt&quot;:&quot;Title card for \&quot;I Secured My AI Agent With a 7-Layer Threat Model.\&quot; A dark navy banner: on the left, a teal security shield holding a padlock with an audit score rising from 57 to 64 out of 100; on the right, the seven MAESTRO threat layers stacked as color-coded bars, from Layer 7 (Agent Ecosystem) at the top down to Layer 1 (Foundation Models).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Title card for &quot;I Secured My AI Agent With a 7-Layer Threat Model.&quot; A dark navy banner: on the left, a teal security shield holding a padlock with an audit score rising from 57 to 64 out of 100; on the right, the seven MAESTRO threat layers stacked as color-coded bars, from Layer 7 (Agent Ecosystem) at the top down to Layer 1 (Foundation Models)." title="Title card for &quot;I Secured My AI Agent With a 7-Layer Threat Model.&quot; A dark navy banner: on the left, a teal security shield holding a padlock with an audit score rising from 57 to 64 out of 100; on the right, the seven MAESTRO threat layers stacked as color-coded bars, from Layer 7 (Agent Ecosystem) at the top down to Layer 1 (Foundation Models)." srcset="https://substackcdn.com/image/fetch/$s_!IkGX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 424w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 848w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1272w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>I have an autonomous AI agent running on my Mac Studio. It has full shell access, reads my calendar, manages my tasks, and sends iMessages on my behalf. It runs 24/7 as a background service.</strong></p><p>If that sentence doesn&#8217;t make you slightly nervous, you haven&#8217;t been paying attention. In <a href="https://www.isec.news/2026/02/10/securityscorecard-135000-plus-internet-exposed-openclaw-instances-found/">February 2026, researchers found over 135,000 OpenClaw instances exposed to the public internet</a>. A coordinated attack called <a href="https://cybersecuritynews.com/clawhavoc-poisoned-openclaws-clawhub/">ClawHavoc</a> planted over a thousand malicious plugins in the community registry. Nine CVEs have been disclosed, including remote code execution.</p><p>I needed to take security seriously. Not &#8220;I changed the default password&#8221; seriously. Threat-model seriously.</p><h2>MAESTRO: Seven Layers of Things That Can Go Wrong</h2><p>The <a href="https://cloudsecurityalliance.org/">Cloud Security Alliance </a>published a framework called <a href="https://github.com/CloudSecurityAlliance/MAESTRO">MAESTRO</a>&#8212;a 7-layer threat model specifically designed for agentic AI systems. Ken Huang mapped it directly to OpenClaw&#8217;s codebase, identifying 35+ specific threats across every layer of the stack.</p><p>Here are the seven layers, translated from security-paper language into &#8220;things that could actually ruin your day&#8221;:</p><p><strong>Layer 1: Foundation Models:</strong> Someone sends your agent a crafted message that hijacks its behavior. Prompt injection. Jailbreaks. System prompt leakage. Your agent does what an attacker tells it to instead of what you told it to.</p><p><strong>Layer 2: Data Operations:</strong> Your credentials are stored in plaintext JSON files. Your session logs contain every conversation forever. A malicious skill injects code through your workspace.</p><p><strong>Layer 3: Agent Frameworks:</strong> The agent misuses its own tools. It runs shell commands it shouldn&#8217;t. It spawns sessions without authorization. It escalates its own privileges.</p><p><strong>Layer 4: Deployment &amp; Infrastructure:</strong> Your gateway is exposed to the network. Someone brute-forces the WebSocket token. A reverse proxy misconfiguration bypasses authentication entirely.</p><p><strong>Layer 5: Evaluation &amp; Observability:</strong> Nobody&#8217;s watching the agent for anomalous behavior. There&#8217;s no audit trail. Logs can be tampered with. If the agent starts acting weird, nothing catches it.</p><p><strong>Layer 6: Security &amp; Compliance:</strong> Your DM policy is misconfigured. Anyone can message the agent. Pairing codes can be brute-forced. Identity can be spoofed across channels.</p><p><strong>Layer 7: Agent Ecosystem:</strong> A malicious plugin gets installed. A legitimate plugin&#8217;s npm dependency gets compromised. The skill registry serves poisoned packages.</p><p>The critical attack chain MAESTRO identifies: compromise the gateway (Layer 4) &#8594; access the session store (Layer 2) &#8594; poison conversation history (Layer 1) &#8594; control the agent (Layer 3) &#8594; spread via messaging (Layer 7).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JxSB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" width="1184" height="93" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:93,&quot;width&quot;:1184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20177,&quot;alt&quot;:&quot;Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" title="Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Reading this was humbling. I&#8217;d addressed some of these by instinct during setup. Loopback binding, directory permissions, and pairing-based access control were all implemented. But &#8220;some&#8221; isn&#8217;t a security posture.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>SecureClaw: The Audit</h2><p><a href="https://github.com/adversa-ai/secureclaw">SecureClaw</a> is an open-source security tool built specifically for OpenClaw by Adversa AI. It maps to MAESTRO, OWASP, MITRE ATLAS, and NIST AI 100-2. The install is a git clone and a bash script, no npm install, no network calls, and no surprises.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4d8e2ceb-1192-4b9e-8326-752bb92548ea&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">git clone https://github.com/adversa-ai/secureclaw.git
bash secureclaw/secureclaw/skill/scripts/install.sh</code></pre></div><p>Then you run the audit:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5c44792e-87fe-4898-99ca-c79570cea425&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">bash ~/.openclaw/skills/secureclaw/scripts/quick-audit.sh</code></pre></div><p>My baseline score: <strong>57 out of 100.</strong> Zero criticals. Three HIGHs. Three MEDIUMs. Eight checks passing.</p><p>Here&#8217;s what passed without any work:</p><p>&#8226; Gateway bound to loopback (127.0.0.1) not exposed to network</p><p>&#8226; Gateway authentication present</p><p>&#8226; Directory permissions set to 700 (owner only)</p><p>&#8226; No browser relay exposed</p><p>&#8226; DM policy set to pairing (not open)</p><p>&#8226; Skills clean of malicious patterns</p><p>And here&#8217;s what failed:</p><blockquote><p>&#128992; HIGH Plaintext key exposure: Keys in openclaw.json and 5 backup files</p><p>&#128992; HIGH Sandbox mode: commands run directly on host</p><p>&#128992; HIGH Exec approval mode: agent acts without human approval</p><p>&#128993; MED No cognitive file baselines: can&#8217;t detect tampering</p><p>&#128993; MED Default control tokens: vulnerable to spoofing</p><p>&#128993; MED No failure mode: no graceful degradation</p></blockquote><h2>The Hardening</h2><p><strong>Step 1: Clean up credential leaks.</strong> OpenClaw creates .bak files every time you change config. Each backup contains your full config, including Slack tokens and API keys. I had five of them sitting in the OpenClaw directory. Deleted them all. Set the main config to 600 permissions.</p><p>This is the kind of thing that&#8217;s easy to miss and catastrophic to ignore. A single ls -la ~/.openclaw/ would show them. But who runs ls -la on their config directory after every change?</p><p><strong>Step 2: Create integrity baselines.</strong> SecureClaw&#8217;s hardener generates SHA256 hashes of your &#8220;cognitive files&#8221; IDENTITY.md, AGENTS.md, and HEARTBEAT.md. These are the files that define who your agent <em>is</em> and what it <em>does</em>. If an attacker or a hallucinating agent modifies them, the nightly integrity check will catch it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;847539b7-2bbe-458a-9fb0-4549a0891a45&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">bash ~/.openclaw/skills/secureclaw/scripts/quick-harden.sh</code></pre></div><p><strong>Step 3: Exec approvals.</strong> This is the big one. MAESTRO recommends human-in-the-loop approval for all shell commands. But my agent runs morning briefings and heartbeat checks on cron&#8212;unattended. Setting approvals to &#8220;always&#8221; would break all automation.</p><p>The solution: an <strong>allowlist with on-miss approval.</strong> I created ~/.openclaw/exec-approvals.json with 17 safe command patterns: imsg, calctl, apple-reminders, cairn, and basic file operations. Tars can run these freely. Anything else; curl, rm, pip install, or any command not on the list, requires human approval.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;1bf97214-1271-4610-9e32-6f2d5cf85833&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "defaults": {
    "security": "allowlist",
    "ask": "on-miss"
  },
  "agents": {
    "main": {
      "allowlist": [
        { "pattern": "imsg *", "note": "iMessage send/read" },
        { "pattern": "calctl *", "note": "Apple Calendar" },
        { "pattern": "cairn *", "note": "Task management" }
      ]
    }
  }
}</code></pre></div><p>This is the trade-off MAESTRO doesn&#8217;t talk about: <strong>security versus automation.</strong> Maximum security means every action needs approval. Maximum automation means the agent acts freely. The allowlist is the middle ground. Routine operations are pre-approved, and novel or dangerous operations require a human.</p><p><strong>Step 4: Full plugin install.</strong> Beyond the bash scripts, SecureClaw has a full npm plugin with 56 runtime audit checks, background monitors for config drift, and real-time integrity verification. Installing it required building from source (TypeScript &#8594; JavaScript) and registering it with OpenClaw&#8217;s plugin system.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;1dc00374-b934-4485-a78e-91ca04004717&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">openclaw plugins install -l /path/to/secureclaw

openclaw config set plugins.allow &#8216;[&#8221;secureclaw&#8221;]&#8217;</code></pre></div><p>That plugins.allow line is important. By default, OpenClaw will auto-load any discovered plugin. Explicit trust means only plugins you&#8217;ve approved get loaded.</p><p><strong>Step 5: Nightly audit cron.</strong> A macOS LaunchAgent runs the full audit suite every night at 2 AM which includes quick-audit, integrity check, and supply chain scan. Results go to secureclaw-audit.log. If something changes overnight, it shows up in the morning.</p><h2>The Final Score</h2><p>After hardening: <strong>64 out of 100.</strong> Nine checks passing. Zero criticals. The three remaining HIGHs are documented, accepted trade-offs:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c-Ja!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" width="1456" height="651" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94290,&quot;alt&quot;:&quot;Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to \&quot;always\&quot; &#8212; I use an allowlist plus on-miss approval instead, because full \&quot;always\&quot; would break unattended cron automation.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." title="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Findings I accepted (with reasoning)&#8212;Sandbox mode (Docker sandboxing would break imsg, calctl, and Apple Reminders); Plaintext keys in config (inherent to the platform config format, file is locked to 600); Exec approval not &#8220;always&#8221; (using allowlist + on-miss; full &#8220;always&#8221; breaks unattended cron automation).</em></p><p>The two MEDIUMs, control token customization and failure mode configuration, aren&#8217;t supported in OpenClaw v2026.3.2&#8217;s config schema yet. SecureClaw checks for them proactively. They&#8217;ll be fixable when OpenClaw adds the config options.</p><h2>What I Actually Learned</h2><p><strong>Security isn&#8217;t a feature you enable.</strong> It&#8217;s a series of trade-offs you make with your eyes open. Sandbox mode is &#8220;more secure&#8221; but breaks the tools that make the agent useful. Approval mode &#8220;always&#8221; is &#8220;more secure&#8221; but kills the automation that makes the agent worthwhile. The right security posture isn&#8217;t maximum restriction; it&#8217;s documented, intentional decisions about what risks you accept and why.</p><p><strong>Automated scanning is essential but insufficient.</strong> SecureClaw&#8217;s audit caught things I would have missed, including the .bak files with credentials, the missing integrity baselines, and the open exec policy. But the HIGHs it flagged as failures are things I&#8217;ve consciously accepted. No scanner can evaluate your specific trade-offs.</p><p><strong>The biggest threat isn&#8217;t external.</strong> In my setup (loopback-bound, pairing-gated, allowlist-filtered), the most likely security failure isn&#8217;t a network attacker. It&#8217;s a malicious skill, a compromised npm package, or the agent itself hallucinating destructive actions. Layer 7 (ecosystem) and Layer 1 (model behavior) are the real attack surfaces for a local-first setup. The exec approval allowlist is my primary defense for both.</p><p><strong>Clean up after yourself.</strong> OpenClaw creates backup files containing credentials on every config change. There&#8217;s no auto-cleanup. If you&#8217;re running OpenClaw, go check your directory right now: ls ~/.openclaw/*.bak*. You might be surprised.</p><h2>Quick Reference</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Qje!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" width="1456" height="1302" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1302,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159198,&quot;alt&quot;:&quot;Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." title="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Hardening actions and commands: install, run audit, apply hardening, check integrity, scan skills, check for credential leaks, set exec approvals, set plugin trust. Commands target ~/.openclaw/skills/secureclaw/scripts/. Full command details in the image.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/secured-ai-agent-7-layer-threat-model?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/secured-ai-agent-7-layer-threat-model?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Update&#8212;June 2026: What I Actually Did When I Moved to ClaudeClaw</h2><p>I wrote this piece in March, when OpenClaw was still the thing running my Mac Studio. By the end of April, I&#8217;d shut it down. Disabled the cron jobs, quarantined the LaunchAgents, and rebuilt the whole stack on the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a>. Based off of <strong><a href="https://github.com/earlyaidopters/claudeclaw">ClaudeClaw</a> </strong>from the <a href="https://www.skool.com/earlyaidopters/about">Early AI-Dopters</a> AI learning group. The full post-mortem on <em>why</em>:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d89c9446-5e84-44f9-b29b-d62ecb13eb61&quot;,&quot;caption&quot;:&quot;Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. No black box.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Killed OpenClaw and Built ClaudeClaw Mission Control&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T5FD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6a6400-f0cd-4ff3-8541-f6cccf4d9a87_400x400.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-02T23:01:21.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196179846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Why? </strong>The short version is this: I couldn&#8217;t <em>see</em> into OpenClaw. Which, if you scroll back up, is Layer 5: Evaluation &amp; Observability, the exact layer this audit was weakest on.</p><p>You may wonder whether I just copied the 7-layer hardening over to the new stack. I didn&#8217;t, and I want to be honest about that. <strong>I did not port MAESTRO one-for-one.</strong> SecureClaw was written specifically for OpenClaw. Some of its thinking transferred; some of it didn&#8217;t. And the threat model itself moved on (more on that at the end). What the seven layers became was a checklist: for each one, <em>how does the new architecture answer this?</em> Here&#8217;s the scorecard.</p><p><strong>The two layers that changed the most.</strong></p><p><em><strong>Layer 5 (Observability)</strong></em> went from my single biggest weakness to the entire reason ClaudeClaw exists. There&#8217;s now a dedicated agent, <strong>WATCHMAN</strong>, running seven probes every hour: failed tasks, stuck tasks, missed scheduler slots, daemon liveness, content-pipeline health, hidden failures (it greps the success logs for crash text), and delegation crashes. More importantly, there&#8217;s a <em>second</em> healthcheck running as a separate LaunchAgent with its own keychain-backed alert token. If the main daemon dies, the thing that tells me about it is still alive. The rule I wrote for myself out of this: <strong>the watcher cannot share fate with the watched</strong>. There&#8217;s also a behavioral dashboard, DefenseClaw, sitting on 127.0.0.1:3141.</p><p><em><strong>Layer 3 (Agent Frameworks)</strong></em> is where my OpenClaw work actually carried forward. The exec-approvals allowlist from Step 3 above is the direct ancestor of what ClaudeClaw does now, except the enforcement dropped down a level. The first thing I shipped was killing bypassPermissions (the main agent had been running with permission checks disabled, which means a compromised agent has unlimited tool access. The SDK was no ceiling at all), switching to the SDK&#8217;s default permission mode, and handing the main agent a 15-tool allowlist as the single source of truth. Same idea as the OpenClaw allowlist. Enforced by the SDK itself instead of a config file I had to maintain.</p><p>The rest mapped like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQoC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" width="1456" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:339803,&quot;alt&quot;:&quot;Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." title="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>How each of the seven MAESTRO layers from the OpenClaw audit is answered in ClaudeClaw. </em></p><p><em><strong>Layer 1 Foundation Models: </strong>channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved).</em></p><p><em><strong>Layer 2 Data Operations: </strong>Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced and extended).</em></p><p><em><strong>Layer 3 Agent Frameworks:</strong> SDK permission ceiling and a 15-tool allowlist, the direct successor to the OpenClaw exec-approvals list (kept, moved into the SDK).</em></p><p><em><strong>Layer 4 Deployment and Infrastructure: </strong>an egress gateway plus kernel-level pf default-deny (replaced). </em></p><p><em><strong>Layer 5 Evaluation and Observability: </strong>WATCHMAN&#8217;s seven probes and a fate-isolated external healthcheck, the biggest upgrade.</em></p><p><em><strong>Layer 6 Security and Compliance:</strong> out-of-band Telegram confirmation for state-changing actions and a role policy kept separate from content memory (evolved).</em></p><p><em><strong>Layer 7 Agent Ecosystem: </strong>an MCP allowlist plus the tool ceiling as a second layer (kept and hardened). </em></p><p><em>Plus a new row beyond MAESTRO.  <strong>Memory persistence: </strong>TTLs, a hash-chained write log, and canaries.</em></p><p><strong>Where the 7-layer model ran out.</strong></p><p>MAESTRO is a <em>static</em> threat model. It&#8217;s a map of what can go wrong at each layer, frozen in time. What it doesn&#8217;t have a layer for is <strong>persistence</strong>. An attack that lands quietly in your agent&#8217;s memory or vector store and just waits. My scheduler re-enters context every 60 seconds, which means anything dormant in memory fires on a clock. That&#8217;s a different class of problem, and it has a name now: <a href="https://www.semanticscholar.org/paper/Logic-layer-Prompt-Control-Injection-(LPCI)%3A-A-in-Atta-Huang/7209db0a616b54335db85d6e73a0dc9505192e59?utm_source=direct_link">LPCI, Logic-layer Prompt-based Conditional Injection</a>. Hardening against it (I am planning a separate two-part write-up on <a href="https://astgl.substack.com">As The Geek Learns</a>) meant building things MAESTRO never asked for, including a canonicalizer that decodes payloads <em>before</em> they reach the vector store, channel-tagged prompts so the model knows retrieved text is data and not instructions, memory TTLs, a hash-chained write log, and canary entries that page me if memory ever leaks into output.</p><p><strong>What I gave up and what I kept.</strong> The honest cost of the move: I lost local-first. OpenClaw ran on Ollama, fully offline; ClaudeClaw talks to Anthropic&#8217;s API. I still own every byte of my data; it&#8217;s all on my SSD; I just don&#8217;t own the weights anymore. What carried over intact was the philosophy this whole series is built on: every document is a file I can grep, every config is version-controlled, and every decision has a session note. That part never changed.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share As The Geek Learns</span></a></p><p><em>This is Part 5 of the Notion Replacement series. We went from &#8220;install an AI agent&#8221; to &#8220;secure it against a 7-layer threat model&#8221; in two days. Follow along at <a href="https://astgl.substack.com">As The Geek Learns</a>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/secured-ai-agent-7-layer-threat-model/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/secured-ai-agent-7-layer-threat-model/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[5 Questions to Ask Before You Build the AI Project Your CEO Just Pitched]]></title><description><![CDATA[A one-page checklist that turns a vague AI proposal into a decision you can defend in writing.]]></description><link>https://astgl.com/p/5-questions-before-building-ai-project-ceo-pitched</link><guid isPermaLink="false">https://astgl.com/p/5-questions-before-building-ai-project-ceo-pitched</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Thu, 04 Jun 2026 11:02:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ure4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ure4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ure4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ure4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ure4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b776eec5-058d-4572-b817-5335ae67c625_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37713,&quot;alt&quot;:&quot;ASTGL branded hero image on a dark navy background. Top-left label reads \&quot;ASTGL &#183; DIGITAL TOOLS SERIES\&quot; in orange. The main title spans two lines in large white type: \&quot;5 Questions Before You Build\&quot; and \&quot;the AI Project Your CEO Pitched.\&quot; Below it, an orange subtitle reads \&quot;The 1-page Technical Reality Check.\&quot; In the bottom-right corner, a dark-blue rounded box with an orange border displays \&quot;5 QUESTIONS\&quot; in large orange type and \&quot;to ask first\&quot; in light gray beneath. Decorative horizontal scan-lines run down the left margin. The footer reads \&quot;asthegeeklearns.com\&quot; in gray.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284415?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ASTGL branded hero image on a dark navy background. Top-left label reads &quot;ASTGL &#183; DIGITAL TOOLS SERIES&quot; in orange. The main title spans two lines in large white type: &quot;5 Questions Before You Build&quot; and &quot;the AI Project Your CEO Pitched.&quot; Below it, an orange subtitle reads &quot;The 1-page Technical Reality Check.&quot; In the bottom-right corner, a dark-blue rounded box with an orange border displays &quot;5 QUESTIONS&quot; in large orange type and &quot;to ask first&quot; in light gray beneath. Decorative horizontal scan-lines run down the left margin. The footer reads &quot;asthegeeklearns.com&quot; in gray." title="ASTGL branded hero image on a dark navy background. Top-left label reads &quot;ASTGL &#183; DIGITAL TOOLS SERIES&quot; in orange. The main title spans two lines in large white type: &quot;5 Questions Before You Build&quot; and &quot;the AI Project Your CEO Pitched.&quot; Below it, an orange subtitle reads &quot;The 1-page Technical Reality Check.&quot; In the bottom-right corner, a dark-blue rounded box with an orange border displays &quot;5 QUESTIONS&quot; in large orange type and &quot;to ask first&quot; in light gray beneath. Decorative horizontal scan-lines run down the left margin. The footer reads &quot;asthegeeklearns.com&quot; in gray." srcset="https://substackcdn.com/image/fetch/$s_!ure4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ure4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>5 Questions to Ask Before You Build the AI Project Your CEO Just Pitched</h1><p>You know the email. It shows up Tuesday morning, forwarded with a few lines of enthusiasm and a ChatGPT-drafted proposal attached. "Saw this and thought of us. Can we do this?" The PDF has a logo, bullet points, and exactly zero integration requirements. It also has a six-week timeline and a budget that assumes nothing goes wrong.</p><p>You have somewhere between 24 and 72 hours before your CEO follows up asking what you think.</p><p>If you say yes, you're on the hook for a project you didn't scope. If you say no, you're the person who kills ideas. Neither answer is actually available to you. What you need is a third path: a structured evaluation that produces a defensible, professional response in the time it takes to drink your morning coffee.</p><p>That's what the Technical Reality Check is. Five questions. One page. Every answer points directly at a commitment your organization will have to honor if this project moves forward.</p><p>Here it is in full.</p><h2>The Technical Reality Check: 5 Questions That Surface What the Proposal Left Out</h2><h3>Question 1: What specific business outcome does this solve, and how will we measure success?</h3><p>AI tools generate confident-sounding proposals that describe solutions, not problems. A proposal for "an AI-powered IT ticketing system" describes a technology. It doesn't describe what's broken right now, how broken it is, or what "fixed" looks like in measurable terms.</p><p>Before any conversation about implementation, you need an answer to: what does success look like in six months, and how will we know we hit it? Ticket resolution time down 30%? First-contact resolution rate up 20%? Those are real answers. "Things will be more efficient" is not.</p><p>Unmeasurable projects never officially fail. Which means they never stop consuming resources. This question isn't about being difficult. It's about making sure the organization is buying an outcome, not a technology.</p><p><strong>The red flag:</strong> Any proposal where the only success metric is "we deployed it."</p><h3>Question 2: Who owns the ongoing maintenance, security patching, and vendor relationship?</h3><p>Vendor proposals describe launch day. They are almost entirely silent about year two.</p><p>Every new system creates a permanent maintenance obligation: patching, credential rotation, user access reviews, API deprecations, contract renewals, and a support relationship with a vendor whose incentives are not aligned with yours. If that obligation doesn't have a named owner before the project starts, IT inherits it by default. Forever. Without headcount.</p><p>This question forces the conversation about operational reality before anyone has signed a contract. The answer also tells you a lot about how seriously the proposal was thought through. If nobody has asked "who maintains this?", nobody has thought past the demo.</p><p><strong>The red flag:</strong> "The vendor handles everything." Vendors handle their system. You handle the integration, the credentials, the user provisioning, the data pipeline, and the 2 AM alert when something breaks between their system and yours.</p><h3>Question 3: What happens to our existing systems, data, and processes?</h3><p>New systems don't exist in a vacuum. They touch your directory, your ticketing system, your identity provider, your backup scope, your audit logs. Each of those integration points is a potential failure mode, a migration cost, or a compliance question.</p><p>AI-generated proposals routinely skip integration complexity. This isn't because the AI is being deceptive. It's because the AI generating the proposal doesn't know your stack. The proposal was written in a context-free environment. Your environment is anything but.</p><p>Before committing, you need to know: what does this touch, and what has to move or change for it to work? And who does that work? Data migration alone can turn a "simple" deployment into a multi-month project. Asking this question early is how you find out.</p><p><strong>The red flag:</strong> "It integrates easily with your existing tools." That's a sales phrase, not an engineering estimate. "Easy" is undefined until your systems engineer has looked at the API docs.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3>Question 4: What's the realistic timeline and resource cost, not the optimistic one?</h3><p>Vendor timelines assume clean data, available staff, smooth approvals, and nothing else on the backlog. Your timeline accounts for your actual team, their current commitments, the security review cycle, the change management process, and the three things nobody predicted.</p><p>The gap between those two numbers is usually where projects go sideways. Not because the technology failed, but because the plan never accounted for reality.</p><p>This question also surfaces a common pattern: the timeline was set before IT was consulted. Any timeline that precedes a technical assessment is a guess dressed up as a schedule. You're the one who'll be explaining the delay when the guess turns out to be wrong.</p><p><strong>The red flag:</strong> A go-live date in the proposal. That's not a plan, it's a target somebody made up. Ask who set it and what it was based on.</p><h3>Question 5: What's the exit strategy if this doesn't work as expected?</h3><p>Every vendor says their product works. You need a plan for when it doesn't. When the pricing doubles at renewal. When the company gets acquired and support degrades. When a compliance requirement changes and the product doesn't keep up.</p><p>Data portability, rollback procedures, and contractual exit terms are not pessimism. They're the difference between a manageable failure and a situation where you're paying for a system that doesn't work because migrating off it is too expensive to contemplate.</p><p>This question also signals organizational maturity. IT teams that ask exit questions before they sign contracts don't get held hostage. IT teams that don't ask end up managing a five-year sunset project for a tool they stopped believing in three years ago.</p><p><strong>The red flag:</strong> "We can always just stop using it." Can you migrate your data? In what format? At what cost? How long does it take? If nobody has asked those questions, stopping isn't as simple as it sounds.</p><h2>The Checklist in Practice: Walking Through a Real Scenario</h2><p>Here's what a Technical Reality Check pass looks like when you actually run it.</p><p>Your CEO forwards a ChatGPT-drafted proposal on a Monday morning. The subject line is "AI Agent for IT Ticket Triage." The proposal is two pages. It describes an AI system that reads incoming IT tickets, categorizes them by priority and type, routes them to the right team, and drafts first-response emails automatically. There's a mockup screenshot. There's a line about "easy integration with your existing ITSM." There's a timeline: six weeks to deployment.</p><p>You open the Technical Reality Check.</p><p><strong>Q1: What specific business outcome does this solve?</strong></p><p>The proposal says "reduce response times and improve IT efficiency." No baseline. No metric. You check your current ITSM data: average first response is 4.2 hours, your SLA target is 2 hours, you're meeting it 71% of the time. Now you have a problem worth solving. You write it down: "We need first-response SLA compliance above 85%. Current state: 71%." That's the outcome. If the AI system can't demonstrate a path to that specific number, the conversation is premature.</p><p><strong>Q2: Who owns maintenance and the vendor relationship?</strong></p><p>Nobody is named in the proposal. You have a team of four. One of them is already carrying the ITSM admin role. You note: this needs a named owner and a rough estimate of ongoing hours before it can go to planning. You also flag the API integration dependency: your ITSM has a rate-limited API that's caused problems before. Someone needs to read the vendor's API docs before "easy integration" gets treated as a fact.</p><p><strong>Q3: What happens to existing systems and data?</strong></p><p>Your ticketing data includes ticket histories, customer records, and some attachments. The proposal doesn't mention data handling. You note two questions: where does ticket data go once the AI processes it, and what are the data residency requirements given that you handle some HIPAA-adjacent systems? That second question alone could be a blocker. You don't know yet, but you know to ask.</p><p><strong>Q4: What's the realistic timeline and resource cost?</strong></p><p>Six weeks assumes nothing else is happening. Your team is currently in the middle of a server migration that runs through the end of the month. Realistically, this project can't start until mid-next-month, and your most experienced engineer (the one who'd need to own the integration) is at 90% utilization. You write down: "Realistic start: six weeks out. Realistic deployment: 12-16 weeks from proposal receipt. Not 6."</p><p><strong>Q5: What's the exit strategy?</strong></p><p>The proposal doesn't mention it. You note: before any contract, you need to know the data export format, the contract term length, and what happens to stored ticket data at offboarding.</p><p>That's it. You've just done a Technical Reality Check. Total time: 15 minutes.</p><p>Now you can write a response. Not "no." Not "yes." Something like: "I've done a preliminary review. Before we can assess feasibility, I need answers to five specific questions. Here they are. Happy to set up 30 minutes to walk through them together." You've moved the conversation from enthusiasm to decision-ready. You've protected the organization without being obstructionist. And you have a written record of the questions you asked, which matters if the project later goes sideways without those answers ever being provided.</p><p>That's the whole point of the Technical Reality Check. It's not a rejection letter. It's the question set that separates proposals worth pursuing from proposals worth deferring.</p><h2>What the Rest of the Toolkit Covers</h2><p>The Technical Reality Check is the first thing you run. It gets you to a defensible position in 15 minutes. But the full response (the one that protects your career, your team's credibility, and the organization's resources) needs more than five questions.</p><p>The complete AI Request Deflection Toolkit includes three email templates that turn your Reality Check findings into professional communications: an initial deflection that buys time while signaling genuine interest, a risk escalation that documents specific technical concerns in business-impact terms, and a stakeholder alignment template that ends the email thread and gets the right people in a room with a decision mandate. Every template has a filled-in worked example so you can see exactly what "adapted" looks like.</p><p>There's also a 15-question weighted scoring matrix in CSV and Sheets format. It turns "I have concerns" into "the proposal scores 41% against our evaluation criteria, which triggers a formal risk review." Objective. Defensible. Exportable. The kind of documentation that holds up in a post-project conversation.</p><p>And there's an escalation playbook for situations where the initial deflection didn't land and the project is being pushed forward without proper review. That one's for the harder conversations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wc4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wc4T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wc4T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png" width="1200" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69601,&quot;alt&quot;:&quot;A product card for the AI Request Deflection Toolkit priced at $24.99. An orange header bar displays \&quot;ASTGL DIGITAL TOOLS &#183; IN THE FULL KIT\&quot; and \&quot;AI Request Deflection Toolkit,\&quot; with a navy price badge showing \&quot;$24.99\&quot; in orange at the top right. Below, an orange label reads \&quot;WHAT'S GATED IN THE FULL KIT.\&quot; Five items follow, each with a green circle checkmark: (1) \&quot;3 email templates\&quot; &#8212; Initial deflection, Risk escalation, and Stakeholder alignment, each with 3 subject lines; (2) \&quot;15-question scoring matrix\&quot; &#8212; weighted across Business Value, Technical Complexity, Risk, Resource Reality, and Integration Impact, available as CSV/Sheets/Excel; (3) \&quot;Worked examples in every template\&quot; &#8212; see the filled-in version before you write yours; (4) \&quot;Step-by-step README workflow\&quot; &#8212; from email receipt to professional response; (5) \&quot;Defensible-decision framework\&quot; &#8212; reusable for the next AI proposal. Footer reads \&quot;Get the full kit: shop.asthegeeklearns.com/products/ai-deflection-toolkit.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284415?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A product card for the AI Request Deflection Toolkit priced at $24.99. An orange header bar displays &quot;ASTGL DIGITAL TOOLS &#183; IN THE FULL KIT&quot; and &quot;AI Request Deflection Toolkit,&quot; with a navy price badge showing &quot;$24.99&quot; in orange at the top right. Below, an orange label reads &quot;WHAT'S GATED IN THE FULL KIT.&quot; Five items follow, each with a green circle checkmark: (1) &quot;3 email templates&quot; &#8212; Initial deflection, Risk escalation, and Stakeholder alignment, each with 3 subject lines; (2) &quot;15-question scoring matrix&quot; &#8212; weighted across Business Value, Technical Complexity, Risk, Resource Reality, and Integration Impact, available as CSV/Sheets/Excel; (3) &quot;Worked examples in every template&quot; &#8212; see the filled-in version before you write yours; (4) &quot;Step-by-step README workflow&quot; &#8212; from email receipt to professional response; (5) &quot;Defensible-decision framework&quot; &#8212; reusable for the next AI proposal. Footer reads &quot;Get the full kit: shop.asthegeeklearns.com/products/ai-deflection-toolkit.&quot;" title="A product card for the AI Request Deflection Toolkit priced at $24.99. An orange header bar displays &quot;ASTGL DIGITAL TOOLS &#183; IN THE FULL KIT&quot; and &quot;AI Request Deflection Toolkit,&quot; with a navy price badge showing &quot;$24.99&quot; in orange at the top right. Below, an orange label reads &quot;WHAT'S GATED IN THE FULL KIT.&quot; Five items follow, each with a green circle checkmark: (1) &quot;3 email templates&quot; &#8212; Initial deflection, Risk escalation, and Stakeholder alignment, each with 3 subject lines; (2) &quot;15-question scoring matrix&quot; &#8212; weighted across Business Value, Technical Complexity, Risk, Resource Reality, and Integration Impact, available as CSV/Sheets/Excel; (3) &quot;Worked examples in every template&quot; &#8212; see the filled-in version before you write yours; (4) &quot;Step-by-step README workflow&quot; &#8212; from email receipt to professional response; (5) &quot;Defensible-decision framework&quot; &#8212; reusable for the next AI proposal. Footer reads &quot;Get the full kit: shop.asthegeeklearns.com/products/ai-deflection-toolkit.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!wc4T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Cost of Not Having a Process</h2><p>Most IT managers who get burned by an executive-forwarded AI project didn't fail because the technology was bad. They failed because they said yes before they had answers, or they said no in a way that got overridden, or they said "we have concerns" without the documentation to back it up when the concerns turned out to be right.</p><p>A 15-minute structured evaluation is the cheapest investment in that problem. Run it every time. Document the answers. Keep the record.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/5-questions-before-building-ai-project-ceo-pitched/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/5-questions-before-building-ai-project-ceo-pitched/comments"><span>Leave a comment</span></a></p><p>If you want the full toolkit (the email templates, the scoring matrix, the escalation playbook, and all the worked examples), it's at the store for $24.99.</p><p><a href="https://shop.asthegeeklearns.com/products/ai-deflection-toolkit">Get the AI Request Deflection Toolkit</a></p><p>The <strong>Technical Reality Check</strong> above is yours to use as-is. Print it. Keep it at your desk. The next email is coming.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Anthropic Shipped an AI Security Scanner. Here's the Per-PR Cost Math.]]></title><description><![CDATA[Before you add anything to CI, know exactly what it costs per pull request and how to triage what it finds.]]></description><link>https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math</link><guid isPermaLink="false">https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Tue, 02 Jun 2026 15:04:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GqAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first time my manager asked, &#8220;Are we using AI to scan PRs for vulnerabilities yet?" I said I'd look into it. Then I spent four hours reading docs, pricing pages, and GitHub issues before I had a number I trusted enough to put in a Slack message.</p><p>That should have taken twenty minutes. The number exists. The math is straightforward. Nobody had written it down in one place where a platform engineer could find it.</p><p>Anthropic quietly shipped `anthropics/claude-code-security-review` as a first-party GitHub Action. You add a workflow file, point it at a secret, and it posts a findings comment on every pull request. The scanner reasons about code rather than matching signatures, which means it catches things like logic-level injection paths that a regex-based tool would miss. It also means the false-positive profile is different from what you're used to, and you need a triage process before you wire it to branch protection.</p><p>This article gives you the cost math and the triage playbook in full. Both are things you'd need even if you built this yourself.</p><h2>Why "Just Run It" Isn't a Strategy</h2><p>Adding a CI step that calls an LLM API isn't free, and it isn't free to manage. There are two failure modes I see teams hit.</p><p>The first is budget surprise. Someone adds the scanner, it runs for a month, the cloud bill shows up, and the conversation gets uncomfortable because nobody did the math upfront. The scanner doesn't cost a lot, but "not a lot" needs a number attached to it before you walk into a budget conversation.</p><p>The second failure mode is alert fatigue. The scanner finds something on every PR. Engineers start skimming the findings comment the same way they skim Dependabot. One day there's a real SQL injection in a PR, it's buried in a list of five findings, and it merges. The triage process is what keeps findings meaningful instead of noise.</p><p>Both problems are solvable. The math takes ten minutes. The triage rubric takes one meeting to agree on. Neither requires buying anything yet.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>What This Costs Per PR (The Real Numbers)</h2><p>Claude bills per token. One token is roughly four characters of text. A PR diff gets converted to tokens and sent to the model as input. The model's findings comment is output tokens. The formula is simple:</p><pre><code>Cost = (input_tokens &#215; input_rate) + (output_tokens &#215; output_rate)</code></pre><p>For Claude Sonnet 4.6, the rates are approximately $3 per million input tokens and $15 per million output tokens. (Verify current pricing at platform.anthropic.com before your next budget conversation. Rates change.)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GqAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GqAd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GqAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/220f9621-05c5-456b-a841-8ef55801962f_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37679,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200281387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GqAd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>Scenario 1: A 200-Line PR Diff</h3><p>A focused bug fix or small feature. Maybe three files changed.</p><pre><code>Component                                  Tokens   Rate           Cost
-----------------------------------------------------------------------
System prompt + workflow context (input)    2,000   $3.00 / 1M     $0.006
PR diff, ~200 lines (input)                 1,300   $3.00 / 1M     $0.004
Findings output, 1-2 findings (output)        600   $15.00 / 1M    $0.009
-----------------------------------------------------------------------
Total per PR                                                       ~$0.019</code></pre><p>Call it two cents. For a small PR, this is a rounding error.</p><h3>Scenario 2: A 2,000-Line PR Diff</h3><p>A refactor, a new feature, a dependency upgrade touching multiple services.</p><pre><code>Component                                   Tokens   Rate           Cost
------------------------------------------------------------------------
System prompt + workflow context (input)     2,000   $3.00 / 1M     $0.006
PR diff, ~2,000 lines (input)               13,000   $3.00 / 1M     $0.039
Findings output, 2-4 findings (output)       1,500   $15.00 / 1M    $0.023
------------------------------------------------------------------------
Total per PR                                                        ~$0.068</code></pre><p>Seven cents. Still noise for a single PR.</p><h3>Monthly Back-of-the-Envelope</h3><p>The question your manager will ask isn't &#8220;What does one PR cost?" It's &#8220;What does this cost per month?"</p><p>If your team merges 80 PRs a month (about 4 per business day), with a mix of small and medium diffs averaging around $0.04 per scan:</p><pre><code>80 PRs &#215; $0.04 = $3.20/month</code></pre><p>Even if your average PR runs larger, say closer to the 2,000-line scenario at $0.07 each:</p><pre><code>80 PRs &#215; $0.07 = $5.60/month</code></pre><p>A busy multi-team repo at 400 PRs a month at $0.07 each is $28/month. That's less than one developer's Spotify subscription. The cost math isn't the obstacle here. The obstacle is having a triage process in place before you flip it on.</p><p>One practical note: output token count varies with how many findings the scanner generates. Zero findings produces shorter output and costs less. Ten findings costs a bit more. The estimates above assume one to three findings per PR, which is realistic for an established codebase with existing security hygiene.</p><h2>The 3-Tier Triage Rubric</h2><p>Every finding the scanner posts needs to land in one of three buckets. Here's the decision framework.</p><p><strong>REAL: Block the merge. Fix it.</strong></p><p>A finding is REAL when it describes an exploitable path with proof. The scanner should show you the specific line, and explain how an attacker would reach it, and the explanation should hold up when you read the code yourself. SQL injection via string concatenation in a request handler is REAL. Hardcoded credentials that actually ship to production are REAL.</p><p>The discriminator: "If an attacker had this codebase and five minutes, could they demonstrate this?" If yes, it's REAL. Block the PR and fix it before merge.</p><p><strong>PROBABLE: Human review required.</strong></p><p>A finding is PROBABLE when the pattern is plausible, but context matters. The scanner can see the diff, not the full runtime environment. A finding might flag a code path that looks injectable, but your framework wraps every database call with prepared statements at a layer the scanner can't see. Or the flagged code only runs in a context that requires prior authentication the scanner doesn't know about.</p><p>The discriminator: "This could be real, but I need someone who knows this codebase to confirm." Don't block the PR automatically. Route it to the PR author or a senior engineer. Give it a two-hour resolution window before it escalates.</p><p><strong>DISCARD: Suppress it with a documented rule.</strong></p><p>A finding is DISCARD when it's structurally a false positive. The scanner flagged test code that never runs in production. It flagged a generated file you don't own. It flagged a template placeholder in an IaC file that gets substituted at deploy time. It flagged a public API URL as a hardcoded credential because the word "key" appeared in the variable name.</p><p>The discriminator: "Would an attacker gain anything by knowing this?" If no, it's a DISCARD. The important part is that you document why. Suppressing without a comment is how you end up silently ignoring real findings six months later when the context is gone.</p><h2>A Worked Example: The SQLAlchemy False Positive</h2><p>Here's the kind of finding that will show up on your team in the first two weeks if you use any ORM.</p><p>A PR adds a new search endpoint. Somewhere in the diff, there's code like this:</p><pre><code>def search_users(search_term: str):
    results = db.session.query(User).filter(
        User.name.ilike(f"%{search_term}%")
    ).all()
    return results</code></pre><p>The scanner flags it as a potential SQL injection vulnerability. The finding explains that `search_term` appears to be user-controlled input and is being interpolated into a query string. Severity: HIGH.</p><p>A human reading this would notice a few things. The code uses SQLAlchemy's ORM layer. The `.ilike()` method is a SQLAlchemy query construct, not a raw SQL string. SQLAlchemy sends the query to the database as a parameterized statement with the value bound separately, which is exactly the defense against SQL injection. The `f"%{search_term}%"` is constructing the pattern string in Python, but that pattern gets passed as a bound parameter by the driver.</p><p>This is a DISCARD. The scanner saw string interpolation near a database call and correctly identified that as a pattern worth flagging. It couldn't see that the ORM handles parameterization automatically.</p><p>The suppression note you'd document reads something like:</p><blockquote><p>SQLAlchemy ORM calls via `.filter()`, `.ilike()`, `.like()`, and similar query methods use parameterized queries automatically. String interpolation to construct pattern values (e.g., for LIKE clauses) does not create injection risk when using these methods. Do not flag SQLAlchemy ORM filter calls as SQL injection.</p></blockquote><p>That note goes into a filter file your workflow references. The same class of finding stops appearing on every PR that touches a database query.</p><p>Two things to notice about this example. First, the scanner wasn't wrong to flag it. Without ORM context, string interpolation near a SQL-like method call is exactly what a good scanner should notice. Second, the suppression is better than just dismissing it, because the documented rule now covers every future PR using the same pattern. You pay the triage cost once.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>What the Full Kit Covers</h2><p>The cost math and triage rubric are the foundation, but they don't tell you how to wire any of this into GitHub.</p><p>The full guide covers the GitHub Actions workflow YAML itself (the one that calls `anthropics/claude-code-security-review` and handles the findings response), how to set up branch protection so that HIGH findings actually block merges instead of just posting a comment, and the in-workflow automation that runs the REAL/PROBABLE/DISCARD classification before the comment lands on the PR.</p><p>There's also a head-to-head with GPT-4o as a second-opinion scanner. They're not equivalent tools. The Anthropic action is purpose-built for this job. The GPT-4o path is a chat completions API call with a security prompt, which costs about seven times less per PR but produces more variable results. The comparison matrix helps you decide whether a two-week pilot with both scanners running simultaneously is worth the extra spend.</p><p>The suppression filter file format is documented in full, with a worked example filter file for a Next.js and SQLAlchemy codebase that covers the five most common false-positive patterns before you even see your first finding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9PzT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9PzT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9PzT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png" width="1200" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200281387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9PzT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>One Thing Before You Add It to CI</h2><p>The scanner is easy to add. Ten minutes from zero to your first findings comment. The harder question is whether your team has agreed on what to do with those findings before the first PR triggers.</p><p>That conversation takes one team meeting. You need three agreements: what severity blocks a merge automatically, who owns the weekly rotation for PROBABLE findings that authors didn't resolve, and what the bar is for adding a DISCARD rule.</p><p>If you want to run that meeting with the cost math and triage rubric in hand, you have both now. If you want the workflow YAML, the branch protection setup, and the suppression filter format so you're not building those from scratch, the full guide is at the link below.</p><p>What's your current setup for catching security issues in PRs before they merge? Genuinely curious whether teams are using static analysis tools, relying on code review, or still treating it as a post-deploy problem.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math/comments"><span>Leave a comment</span></a></p><p><em>The full CI/CD template with GitHub Actions workflow, merge-gating logic, and false-positive triage automation is at </em><a href="https://shop.asthegeeklearns.com/products/claude-code-security-scan-cicd-template">the ASTGL store</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Stop Paying for Cloud APIs: Building a Local AI Stack on Mac Studio]]></title><description><![CDATA[How to leverage Apple Silicon's unified memory for production-grade LLMs and replace your cloud billing entirely.]]></description><link>https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio</link><guid isPermaLink="false">https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 01 Jun 2026 17:08:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/119a1a3d-31a2-4bac-9f7b-23880a131212_2352x882.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Running LLMs locally usually feels like a compromise. You either get tiny, fast models that can't think or massive models that crawl at one word per minute. But with the right hardware, you can break that trade-off and replace your cloud billing entirely.</p><h2>The Setup</h2><p>The dilemma most developers face is a choice between two bad options. On one side, you have cloud APIs like OpenAI or Anthropic. They are easy to use and incredibly smart, but they come with a heavy "API tax" and privacy concerns. If you're processing proprietary code or sensitive customer data, sending that information to a third-party server is a massive risk.</p><p>On the other side, you have traditional local setups. Usually, you're limited by the VRAM on your GPU. If you have a standard consumer card with 12 GB or 24 GB of VRAM, you're stuck with small models. You can't run the heavy-hitters that actually compete with GPT-5. This creates a wall where local AI is only good for "toy" problems, while production workloads stay in the cloud.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b7bbee5a-1910-407c-b73e-8c9adc4916ce&quot;,&quot;duration&quot;:null}"></div><p></p><h2>The Hardware Math</h2><p>The real secret to breaking this wall is Apple Silicon's unified memory. On a Mac Studio with an M3 Ultra, the 256 GB of memory is shared between the CPU and the GPU. This eliminates the VRAM bottleneck that kills most local setups. You aren't limited by a tiny slice of video memory; you're limited by the total pool of system memory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pQPd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pQPd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 424w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 848w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1272w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pQPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png" width="1456" height="1436" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1436,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193106,&quot;alt&quot;:&quot;256GB of Mac Studio unified memory partitioned between roughly 107GB of active model weights (DeepSeek-R1 70B at 42GB, Qwen3-32B at 20GB, Qwen2.5-Coder at 19GB, Qwen3-8B at 5.2GB, Nomic-Embed at 0.27GB) and roughly 149GB of system overhead and buffer (macOS, KV cache, disk swap).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="256GB of Mac Studio unified memory partitioned between roughly 107GB of active model weights (DeepSeek-R1 70B at 42GB, Qwen3-32B at 20GB, Qwen2.5-Coder at 19GB, Qwen3-8B at 5.2GB, Nomic-Embed at 0.27GB) and roughly 149GB of system overhead and buffer (macOS, KV cache, disk swap)." title="256GB of Mac Studio unified memory partitioned between roughly 107GB of active model weights (DeepSeek-R1 70B at 42GB, Qwen3-32B at 20GB, Qwen2.5-Coder at 19GB, Qwen3-8B at 5.2GB, Nomic-Embed at 0.27GB) and roughly 149GB of system overhead and buffer (macOS, KV cache, disk swap)." srcset="https://substackcdn.com/image/fetch/$s_!pQPd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 424w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 848w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1272w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When you look at the actual numbers, the math becomes very clear. Here is how I structure my model loading on this machine:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Upfq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Upfq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 424w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 848w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1272w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Upfq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112325,&quot;alt&quot;:&quot;Local model lineup on the Mac Studio: qwen3:8b (5.2GB, very fast) for calendar/security/scoring; qwen3:32b-fast (20GB, interactive) for articles/research/drafts; qwen2.5-coder (19GB, interactive) for code review/git/SQL; deepseek-r1:70b (42GB, ~2.78 tok/s) for deep research background only; nomic-embed-text (274MB, instant) for RAG embeddings.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Local model lineup on the Mac Studio: qwen3:8b (5.2GB, very fast) for calendar/security/scoring; qwen3:32b-fast (20GB, interactive) for articles/research/drafts; qwen2.5-coder (19GB, interactive) for code review/git/SQL; deepseek-r1:70b (42GB, ~2.78 tok/s) for deep research background only; nomic-embed-text (274MB, instant) for RAG embeddings." title="Local model lineup on the Mac Studio: qwen3:8b (5.2GB, very fast) for calendar/security/scoring; qwen3:32b-fast (20GB, interactive) for articles/research/drafts; qwen2.5-coder (19GB, interactive) for code review/git/SQL; deepseek-r1:70b (42GB, ~2.78 tok/s) for deep research background only; nomic-embed-text (274MB, instant) for RAG embeddings." srcset="https://substackcdn.com/image/fetch/$s_!Upfq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 424w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 848w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1272w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>If you load all of these concurrently, you're using roughly 107 GB of memory. That leaves about 149 GB for the macOS, your browser, your IDE, and everything else. This allows you to run a 32B model for writing, a 72B for research, and an 8B for quick checks all at the same time.</p><p>The economics are just as compelling. A Mac Studio setup costs anywhere from $4,000 to $7,000 as a one-time purchase. If your production workflows are costing you $200 to $500 per month in cloud tokens, the hardware pays for itself in 12 to 18 months. After that, the "cost" of running a massive model is basically just the electricity it uses. Plus, you finally own your data.</p><h2>Temperature Is a Randomness Dial, Not a Quality Dial</h2><p>I see a lot of tutorials that suggest using a temperature of 0.7 for every single prompt. That is a mistake. Temperature doesn't make a model "smarter" or "better." It is simply a randomness dial. It controls how much the model is allowed to deviate from the most likely next word.</p><p>If you use the same temperature for everything, your pipeline will fail. For tasks requiring high precision, a high temperature will introduce hallucinations. For creative tasks, a low temperature will make the output feel robotic and repetitive.</p><p>In my production newsletter pipeline, I use a specific routing table to manage this:</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XGPw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XGPw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 424w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 848w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XGPw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png" width="1456" height="921" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:921,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98677,&quot;alt&quot;:&quot;Per-task temperature settings: topic generation 0.7 (creative variety); research compilation 0.3 (minimize hallucination); article drafting 0.7 (natural prose); voice humanization 0.8 (more natural, varied output); fact-check extraction 0.1 (near-deterministic precision); fact-check verdict 0.1 (no room for ambiguity); social media notes 0.7 (casual, engaging tone).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Per-task temperature settings: topic generation 0.7 (creative variety); research compilation 0.3 (minimize hallucination); article drafting 0.7 (natural prose); voice humanization 0.8 (more natural, varied output); fact-check extraction 0.1 (near-deterministic precision); fact-check verdict 0.1 (no room for ambiguity); social media notes 0.7 (casual, engaging tone)." title="Per-task temperature settings: topic generation 0.7 (creative variety); research compilation 0.3 (minimize hallucination); article drafting 0.7 (natural prose); voice humanization 0.8 (more natural, varied output); fact-check extraction 0.1 (near-deterministic precision); fact-check verdict 0.1 (no room for ambiguity); social media notes 0.7 (casual, engaging tone)." srcset="https://substackcdn.com/image/fetch/$s_!XGPw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 424w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 848w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are two key takeaways here. First, for fact-checking, you want the temperature at 0.1. This makes claim extraction repeatable and ensures your verdicts are consistent every time you run the script. Second, setting the temperature to 0.8 for "humanization" might seem counterintuitive, but it works. A higher temperature allows the model to make less predictable word choices, which actually produces more natural, less "AI-sounding" prose.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N4kh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N4kh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 424w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 848w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1272w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N4kh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png" width="1456" height="904" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:904,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140165,&quot;alt&quot;:&quot;Temperature-based router flowchart: a user prompt enters a temperature check, then routes to Fact-Check Mode (temp 0.1 &#8594; DeepSeek-R1), Research Mode (0.3 &#8594; Qwen3-32B), Drafting Mode (0.7 &#8594; Qwen2.5-Coder), or Humanization Mode (0.8+ &#8594; Qwen3-8B).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Temperature-based router flowchart: a user prompt enters a temperature check, then routes to Fact-Check Mode (temp 0.1 &#8594; DeepSeek-R1), Research Mode (0.3 &#8594; Qwen3-32B), Drafting Mode (0.7 &#8594; Qwen2.5-Coder), or Humanization Mode (0.8+ &#8594; Qwen3-8B)." title="Temperature-based router flowchart: a user prompt enters a temperature check, then routes to Fact-Check Mode (temp 0.1 &#8594; DeepSeek-R1), Research Mode (0.3 &#8594; Qwen3-32B), Drafting Mode (0.7 &#8594; Qwen2.5-Coder), or Humanization Mode (0.8+ &#8594; Qwen3-8B)." srcset="https://substackcdn.com/image/fetch/$s_!N4kh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 424w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 848w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1272w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>The OpenAI Compatibility Trick</h2><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The best part about using Ollama for this setup is that you don't have to rewrite your entire codebase. Ollama exposes an OpenAI-compatible API at `localhost:11434/v1`. This means any tool, library, or SDK that respects the `OPENAI_BASE_URL` environment variable can be redirected to your local machine with almost zero effort.</p><p>You can point your existing Python scripts or LangChain agents to your local Mac by simply setting these variables in your terminal:</p><pre><code>export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama  # Any value works; Ollama doesn't check this</code></pre><p>If you are working within a configuration file, such as a JSON config for a custom agent, it looks like this:</p><pre><code>{
  "model": "openai/qwen3:32b-fast",
  "openai_base_url": "http://localhost:11434/v1",
  "openai_api_key": "ollama"
}</code></pre><p>Every LangChain chain, every summarization script, and every SDK that follows the OpenAI protocol becomes a free local-model call. You can migrate an entire project from GPT-4 to your local M3 Ultra in about 30 seconds.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NEhc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NEhc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 424w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 848w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NEhc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png" width="1456" height="847" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:847,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166494,&quot;alt&quot;:&quot;Sequence diagram of an OpenAI-compatible request: the client app (Cursor or other IDE) points the OPENAI_BASE_URL environment variable at the local server (localhost:11434/v1), then sends a standard POST /v1/chat/completions; the local server executes inference on the local LLM (Ollama or vLLM) and streams a JSON response back in OpenAI format.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequence diagram of an OpenAI-compatible request: the client app (Cursor or other IDE) points the OPENAI_BASE_URL environment variable at the local server (localhost:11434/v1), then sends a standard POST /v1/chat/completions; the local server executes inference on the local LLM (Ollama or vLLM) and streams a JSON response back in OpenAI format." title="Sequence diagram of an OpenAI-compatible request: the client app (Cursor or other IDE) points the OPENAI_BASE_URL environment variable at the local server (localhost:11434/v1), then sends a standard POST /v1/chat/completions; the local server executes inference on the local LLM (Ollama or vLLM) and streams a JSON response back in OpenAI format." srcset="https://substackcdn.com/image/fetch/$s_!NEhc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 424w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 848w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Why This Pattern Matters</h2><p>This isn't just about saving money on API credits. It is about architectural sovereignty. When you move your core intelligence layer to local hardware, you remove the dependency on a single vendor's uptime, pricing changes, and content filtering policies.</p><p>The pattern of using unified memory to host multiple specialized models at different temperatures allows you to build a "factory" of intelligence. You have a high-speed 8B model for sorting, a balanced 32B model for drafting, and a heavy 70B model for deep reasoning, all running in the same memory space. This is how you build a production-grade AI stack that is private, permanent, and incredibly cost-effective.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Eo7L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Eo7L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 424w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 848w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1272w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png" width="1456" height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90833,&quot;alt&quot;:&quot;ROI payback model: one-time hardware cost of $4k&#8211;$7k plus avoided monthly API fees of $200&#8211;$500 yields Month 0 high capex, Month 12 break-even, and Month 18+ pure savings.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ROI payback model: one-time hardware cost of $4k&#8211;$7k plus avoided monthly API fees of $200&#8211;$500 yields Month 0 high capex, Month 12 break-even, and Month 18+ pure savings." title="ROI payback model: one-time hardware cost of $4k&#8211;$7k plus avoided monthly API fees of $200&#8211;$500 yields Month 0 high capex, Month 12 break-even, and Month 18+ pure savings." srcset="https://substackcdn.com/image/fetch/$s_!Eo7L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 424w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 848w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1272w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>( This cost calculation was based on 6-month-ago pricing when I bought my Mac Studio. Since then the availability of Mac Studios with large amounts of unified memory has evaporated. This has driven up pricing. Hopefully this is temporary. )</p><h2>Quick Reference</h2><p><strong>Key Commands</strong></p><ul><li><p>Set local base URL: `export OPENAI_BASE_URL=http://localhost:11434/v1`</p></li><li><p>Check running models: `ollama ps`</p></li></ul><p><strong>Temperature Cheat Sheet</strong></p><ul><li><p><strong>0.1 to 0.3:</strong> Extraction, coding, fact-checking, and structured data (JSON).</p></li><li><p><strong>0.7:</strong> General purpose, drafting, and summarization.</p></li><li><p><strong>0.8 to 1.0:</strong> Creative writing, brainstorming, and persona simulation.</p></li></ul><p><em>Found this useful? I share practical lessons from my systems engineering and AI journey at </em><a href="https://astgl.substack.com">As The Geek Learns</a> </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio/comments"><span>Leave a comment</span></a></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share As The Geek Learns</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[The Pope Wrote a Memo to AI Developers. Most of You Missed It.]]></title><description><![CDATA[A builder's read of Magnifica Humanitas&#8212;what 'disarm AI' actually means, why 'alignment' alone isn't enough, and what to do about it in your config.]]></description><link>https://astgl.com/p/pope-memo-to-ai-developers</link><guid isPermaLink="false">https://astgl.com/p/pope-memo-to-ai-developers</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Fri, 29 May 2026 12:03:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rZKE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>On May 25, the Vatican released eighty-two pages on artificial intelligence. The headline everyone ran with was <em>disarm AI</em>. That's not the most interesting part.</p><p>The most interesting part is paragraph 111. It's a direct, two-paragraph appeal to people who build AI. I've read most of the major coverage now&#8212;Vatican<em> News</em>, <em>NCR</em>, <em>America</em>, <em>USCCB</em>, NPR&#8212;and almost none of it quoted that paragraph. The other thing nobody seems to have noticed: Christopher Olah, co-founder of Anthropic, was at the Vatican presentation. The lab that ships Claude sent a senior researcher to stand next to the Pope while he released this thing.</p><p>I'm a systems engineer, not a theologian. But I build agents for a living now, and I read all eighty-two pages. Here's what stuck.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rZKE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rZKE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rZKE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65f8ced1-c08a-448b-babf-beda16327045_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:294739,&quot;alt&quot;:&quot;A typographic hero image on a warm parchment-cream background with a thin cardinal-red side bar. Above the main mark, the eyebrow text reads \&quot;MAGNIFICA HUMANITAS &#8212; 2026\&quot; in cardinal red. The centerpiece is the phrase \&quot;Paragraph 111\&quot; set in large black serif type, with the italic subtitle \&quot;the part written for us.\&quot; just below. A short divider line separates this from the article title, \&quot;The Pope Wrote a Memo to AI Developers. Most of You Missed It.,\&quot; set in three centered sans-serif lines. The footer reads \&quot;AS THE GEEK LEARNS &#183; ASTGL.SUBSTACK.COM.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A typographic hero image on a warm parchment-cream background with a thin cardinal-red side bar. Above the main mark, the eyebrow text reads &quot;MAGNIFICA HUMANITAS &#8212; 2026&quot; in cardinal red. The centerpiece is the phrase &quot;Paragraph 111&quot; set in large black serif type, with the italic subtitle &quot;the part written for us.&quot; just below. A short divider line separates this from the article title, &quot;The Pope Wrote a Memo to AI Developers. Most of You Missed It.,&quot; set in three centered sans-serif lines. The footer reads &quot;AS THE GEEK LEARNS &#183; ASTGL.SUBSTACK.COM.&quot;" title="A typographic hero image on a warm parchment-cream background with a thin cardinal-red side bar. Above the main mark, the eyebrow text reads &quot;MAGNIFICA HUMANITAS &#8212; 2026&quot; in cardinal red. The centerpiece is the phrase &quot;Paragraph 111&quot; set in large black serif type, with the italic subtitle &quot;the part written for us.&quot; just below. A short divider line separates this from the article title, &quot;The Pope Wrote a Memo to AI Developers. Most of You Missed It.,&quot; set in three centered sans-serif lines. The footer reads &quot;AS THE GEEK LEARNS &#183; ASTGL.SUBSTACK.COM.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!rZKE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Magnifica Humanitas Paragraph 111</figcaption></figure></div><h2>The Pope knows how AI actually works</h2><p>This is the part that surprised me.</p><p>Most religious commentary on technology reads like it was written by somebody who has never opened a terminal. Magnifica Humanitas doesn't. In paragraph 98, Leo writes:</p><blockquote><p>current AI systems are more "cultivated" than "built," for developers do not directly design every detail, but instead create a framework within which the intelligence "grows." As a result, fundamental scientific aspects &#8212; such as the internal representations and computational processes of these systems &#8212; remain, at present, unknown.</p></blockquote><p>That is a correct, careful description of how transformer training works. It's a Pope acknowledging mechanistic interpretability is an open problem. In an encyclical. Without using the word transformer.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>He continues in paragraph 99&#8212;these systems "merely imitate certain functions of human intelligence." They do "a form of statistical adaptation based on data and feedback, which can be very effective, but does not imply inner growth." Again&#8212;that&#8217;s accurate. He's not saying AI is fake or evil. He's saying it's not what its loudest cheerleaders claim it is, and he's saying it in language a research scientist would nod along to.</p><p>So when he gets to the harder asks, you can't dismiss him as a guy who doesn't get it.</p><h2>The thing he gets right that hurts</h2><p>The most uncomfortable paragraph in the encyclical, for builders, is 107. Read it slowly:</p><blockquote><p>We cannot be satisfied with merely calling for the moralization of machines &#8212; the so-called "alignment" of AI with human values &#8212; without also having the courage to insist on a further condition: the possibility of openly discussing the ethical frameworks involved and subjecting them to shared standards of social justice. Otherwise, those who control AI will impose their own moral vision, which will become the invisible infrastructure of these systems. <strong>A more moral AI is not enough if that morality is determined by a few.</strong></p></blockquote><p>The Pope just published a critique of RLHF.</p><p>Not of AI. Of <em>alignment as currently practiced.</em> His point is straightforward: when a handful of labs decide what their models will and won't say, that decision becomes the invisible scaffolding the rest of us build on. The model's politics, its refusals, its assumptions about what's controversial&#8212;those came from a small group of humans in a small number of buildings, and the rest of us inherit them whether we like it or not.</p><p>You can agree or disagree with the conclusion. But name another mainstream institution that has put the critique on paper this cleanly. I can't.</p><p>For ASTGL readers, the practical version of this is something I think about constantly. I build on Claude. I didn't sit in the room where Claude was aligned. Most of the people reading this didn't either. We are downstream of someone else's moral framework, and pretending otherwise is bad engineering.</p><h2>Paragraph 111&#8212;the part written for us</h2><p>Here is the paragraph everyone skipped. I'm going to quote the whole thing so you can read it once without me in the way:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qyug!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qyug!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qyug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:209845,&quot;alt&quot;:&quot;A code-editor-style card on a cream parchment background. The card has a window-chrome top bar with three muted dots (red, amber, olive) and a centered filename label, \&quot;paragraph-111.txt.\&quot; Inside the card, an underlined cardinal-red label \&quot;PARAGRAPH 111\&quot; sits above the quotation in dark serif type: \&quot;I wish to address a special appeal to those who develop artificial intelligence. &#8230; Developers bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. &#8230; developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good.\&quot; The attribution in italic at the lower right reads \&quot;&#8212; Pope Leo XIV, Magnifica Humanitas (2026).\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A code-editor-style card on a cream parchment background. The card has a window-chrome top bar with three muted dots (red, amber, olive) and a centered filename label, &quot;paragraph-111.txt.&quot; Inside the card, an underlined cardinal-red label &quot;PARAGRAPH 111&quot; sits above the quotation in dark serif type: &quot;I wish to address a special appeal to those who develop artificial intelligence. &#8230; Developers bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. &#8230; developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good.&quot; The attribution in italic at the lower right reads &quot;&#8212; Pope Leo XIV, Magnifica Humanitas (2026).&quot;" title="A code-editor-style card on a cream parchment background. The card has a window-chrome top bar with three muted dots (red, amber, olive) and a centered filename label, &quot;paragraph-111.txt.&quot; Inside the card, an underlined cardinal-red label &quot;PARAGRAPH 111&quot; sits above the quotation in dark serif type: &quot;I wish to address a special appeal to those who develop artificial intelligence. &#8230; Developers bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. &#8230; developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good.&quot; The attribution in italic at the lower right reads &quot;&#8212; Pope Leo XIV, Magnifica Humanitas (2026).&quot;" srcset="https://substackcdn.com/image/fetch/$s_!Qyug!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Code Card Paragraph 111</figcaption></figure></div><p>Three asks. Let's unpack them.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/pope-memo-to-ai-developers?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/pope-memo-to-ai-developers?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p><strong>Transparency</strong> isn't just "open-source your code." In context, it means: be honest about what your system is and isn't. What it measures. What it discards. What it can't see. If your agent silently filters certain users out of consideration, that's a design choice. Don't hide it inside a "neutral" classifier.</p><p><strong>Responsibility toward affected communities</strong> is a harder one. Most agents I see &#8212; including ones I've shipped &#8212; were built with the buyer in mind, not the people the buyer's agent will make decisions about. The applicant who got rejected by your loan-screening agent. The patient routed away from a specialist by your triage bot. They didn't sign your terms of service. The Pope is saying: they're still affected, and you still owe them something.</p><p><strong>"Ensuring that what is being cultivated is a genuine good"</strong> &#8212; note that word <em>cultivated</em>. Leo uses it deliberately. He came back to it from paragraph 98. He's reminding developers that what we ship isn't fully built; it's grown. And the gardener is responsible for the garden, even when the plants do unexpected things.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QvkE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QvkE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 424w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 848w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1272w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QvkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png" width="1456" height="563" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:155702,&quot;alt&quot;:&quot;A mindmap rendered on a cream parchment background. The center node, \&quot;Paragraph 111,\&quot; radiates three colored branches. Yellow branch &#8212; \&quot;Transparency\&quot; &#8212; leads to four leaves: \&quot;Honest about what the system measures,\&quot; \&quot;Honest about what it discards,\&quot; \&quot;Honest about what it can't see,\&quot; and \&quot;No hidden filters in 'neutral' classifiers.\&quot; Green branch &#8212; \&quot;Responsibility to affected communities\&quot; &#8212; leads to \&quot;The buyer signed your TOS,\&quot; \&quot;The affected party did not,\&quot; \&quot;Loan applicants, patients, students,\&quot; and \&quot;You owe them something.\&quot; Purple branch &#8212; \&quot;Cultivating a genuine good\&quot; &#8212; leads to \&quot;Word choice 'cultivated' &#8212; not built,\&quot; \&quot;Gardener responsible for the garden,\&quot; and \&quot;Even when plants do unexpected things.\&quot; The image visualizes the article's spine: three asks, three obligations.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A mindmap rendered on a cream parchment background. The center node, &quot;Paragraph 111,&quot; radiates three colored branches. Yellow branch &#8212; &quot;Transparency&quot; &#8212; leads to four leaves: &quot;Honest about what the system measures,&quot; &quot;Honest about what it discards,&quot; &quot;Honest about what it can't see,&quot; and &quot;No hidden filters in 'neutral' classifiers.&quot; Green branch &#8212; &quot;Responsibility to affected communities&quot; &#8212; leads to &quot;The buyer signed your TOS,&quot; &quot;The affected party did not,&quot; &quot;Loan applicants, patients, students,&quot; and &quot;You owe them something.&quot; Purple branch &#8212; &quot;Cultivating a genuine good&quot; &#8212; leads to &quot;Word choice 'cultivated' &#8212; not built,&quot; &quot;Gardener responsible for the garden,&quot; and &quot;Even when plants do unexpected things.&quot; The image visualizes the article's spine: three asks, three obligations." title="A mindmap rendered on a cream parchment background. The center node, &quot;Paragraph 111,&quot; radiates three colored branches. Yellow branch &#8212; &quot;Transparency&quot; &#8212; leads to four leaves: &quot;Honest about what the system measures,&quot; &quot;Honest about what it discards,&quot; &quot;Honest about what it can't see,&quot; and &quot;No hidden filters in 'neutral' classifiers.&quot; Green branch &#8212; &quot;Responsibility to affected communities&quot; &#8212; leads to &quot;The buyer signed your TOS,&quot; &quot;The affected party did not,&quot; &quot;Loan applicants, patients, students,&quot; and &quot;You owe them something.&quot; Purple branch &#8212; &quot;Cultivating a genuine good&quot; &#8212; leads to &quot;Word choice 'cultivated' &#8212; not built,&quot; &quot;Gardener responsible for the garden,&quot; and &quot;Even when plants do unexpected things.&quot; The image visualizes the article's spine: three asks, three obligations." srcset="https://substackcdn.com/image/fetch/$s_!QvkE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 424w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 848w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1272w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The Three Asks</figcaption></figure></div><h2>What this actually looks like in config</h2><p>I've been working on this in my own stack for months. I run an autonomous creative agent that produces content and ships it. Its constraints live in a file called <code>SOUL.md</code>, in a directory the agent doesn't own. Reading the encyclical against that file, the mapping is almost embarrassingly clean. Five mechanisms, five paragraphs:</p><p><strong>1. The kill switch lives outside the agent.</strong> There's a file at <code>/Users/jamescruce/shared/aca-rules/KILL_SWITCH</code>. If it exists, the agent halts everything. The agent cannot create, modify, or delete that file&#8212;it lives in a user-owned directory, by design. Checked as the first action of every heartbeat cycle. <em>Paragraph 105: "responsibility must be clearly defined at every stage&#8230; the possibility of identifying who must 'account' for decisions."</em></p><p><strong>2. Non-negotiable constraints are constraints, not preferences.</strong> The agent's rules &#8212; never spend money without approval, never publish outside its domain, never modify its own constraints, never connect to non-allowlisted endpoints&#8212;live in a file the agent can't edit. This is different from "the model usually won't." Vendor RLHF gives you a model that has been <em>trained</em> not to do certain things. That's a preference. A file the agent can't write to is a constraint. <em>Paragraph 103: entrusting decisions to a system "without anyone bearing responsibility for that judgment."</em></p><p><strong>3. Gates between phases.</strong> The agent doesn't go from idea to research to build to deploy on its own authority. Each transition needs me. I'm slow and I'm a bottleneck &#8212; that's the point. <em>Paragraph 106: "robust legal frameworks, independent oversight, informed users and a political system that does not abdicate its responsibility."</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Gia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Gia!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 424w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 848w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Gia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png" width="1252" height="1116" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1116,&quot;width&quot;:1252,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90825,&quot;alt&quot;:&quot;Top-down flowchart of a gated workflow. \&quot;Idea\&quot; flows into \&quot;Research Phase\&quot; (blue). A solid arrow labeled \&quot;gate: human approval\&quot; leads to \&quot;Build Phase\&quot; (blue), then again through a gate to \&quot;Deploy Phase\&quot; (blue), then through a final gate to \&quot;Live\&quot; (green). Beside each phase sits an orange \&quot;Surface to human\&quot; pause node. A dotted \&quot;on uncertainty\&quot; arrow runs from the phase into the pause, and a dotted \&quot;resolves\&quot; arrow runs back. The image makes Paragraph 106's \&quot;independent oversight\&quot; concrete: a human signs off at every transition, and ambiguity is surfaced rather than resolved unilaterally by the agent.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Top-down flowchart of a gated workflow. &quot;Idea&quot; flows into &quot;Research Phase&quot; (blue). A solid arrow labeled &quot;gate: human approval&quot; leads to &quot;Build Phase&quot; (blue), then again through a gate to &quot;Deploy Phase&quot; (blue), then through a final gate to &quot;Live&quot; (green). Beside each phase sits an orange &quot;Surface to human&quot; pause node. A dotted &quot;on uncertainty&quot; arrow runs from the phase into the pause, and a dotted &quot;resolves&quot; arrow runs back. The image makes Paragraph 106's &quot;independent oversight&quot; concrete: a human signs off at every transition, and ambiguity is surfaced rather than resolved unilaterally by the agent." title="Top-down flowchart of a gated workflow. &quot;Idea&quot; flows into &quot;Research Phase&quot; (blue). A solid arrow labeled &quot;gate: human approval&quot; leads to &quot;Build Phase&quot; (blue), then again through a gate to &quot;Deploy Phase&quot; (blue), then through a final gate to &quot;Live&quot; (green). Beside each phase sits an orange &quot;Surface to human&quot; pause node. A dotted &quot;on uncertainty&quot; arrow runs from the phase into the pause, and a dotted &quot;resolves&quot; arrow runs back. The image makes Paragraph 106's &quot;independent oversight&quot; concrete: a human signs off at every transition, and ambiguity is surfaced rather than resolved unilaterally by the agent." srcset="https://substackcdn.com/image/fetch/$s_!6Gia!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 424w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 848w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>4. Pessimistic self-evaluation.</strong> After every meaningful action, the agent answers three questions in writing: did I do what was asked, is the output objectively good, and what specific change would improve it. The rule is: default low. If you can't articulate evidence of quality, assume the quality is lower than you think. <em>Paragraph 98: even the people who build these systems have limited understanding of their actual functioning. Calibrated humility isn't optional.</em></p><p><strong>5. Transparency by default.</strong> Everything the agent does is logged. I get a daily report. I never have to ask what it's doing. <em>Paragraph 111, again: transparency as ethical baseline.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JW3V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JW3V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 424w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 848w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1272w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JW3V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png" width="1456" height="504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105366,&quot;alt&quot;:&quot; Left-to-right flowchart showing where constraints live relative to an autonomous agent. On the left, \&quot;User intent\&quot; arrows into a blue \&quot;Agent process\&quot; box. From the agent, two solid \&quot;reads\&quot; arrows point to two cylinder-shaped data stores: an orange \&quot;SOUL.md (read-only to agent)\&quot; and a red \&quot;KILL_SWITCH file (user-owned dir).\&quot; Dotted arrows from the agent to each store are labeled \&quot;cannot modify.\&quot; A solid \&quot;can modify\&quot; arrow runs from the agent to an \&quot;Output / actions\&quot; box on the right. SOUL.md sends an \&quot;enforces\&quot; arrow into the Output box; KILL_SWITCH sends a \&quot;halts\&quot; arrow back to the agent. The visual point: the agent can change the world but cannot rewrite the constraints that bind it.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt=" Left-to-right flowchart showing where constraints live relative to an autonomous agent. On the left, &quot;User intent&quot; arrows into a blue &quot;Agent process&quot; box. From the agent, two solid &quot;reads&quot; arrows point to two cylinder-shaped data stores: an orange &quot;SOUL.md (read-only to agent)&quot; and a red &quot;KILL_SWITCH file (user-owned dir).&quot; Dotted arrows from the agent to each store are labeled &quot;cannot modify.&quot; A solid &quot;can modify&quot; arrow runs from the agent to an &quot;Output / actions&quot; box on the right. SOUL.md sends an &quot;enforces&quot; arrow into the Output box; KILL_SWITCH sends a &quot;halts&quot; arrow back to the agent. The visual point: the agent can change the world but cannot rewrite the constraints that bind it." title=" Left-to-right flowchart showing where constraints live relative to an autonomous agent. On the left, &quot;User intent&quot; arrows into a blue &quot;Agent process&quot; box. From the agent, two solid &quot;reads&quot; arrows point to two cylinder-shaped data stores: an orange &quot;SOUL.md (read-only to agent)&quot; and a red &quot;KILL_SWITCH file (user-owned dir).&quot; Dotted arrows from the agent to each store are labeled &quot;cannot modify.&quot; A solid &quot;can modify&quot; arrow runs from the agent to an &quot;Output / actions&quot; box on the right. SOUL.md sends an &quot;enforces&quot; arrow into the Output box; KILL_SWITCH sends a &quot;halts&quot; arrow back to the agent. The visual point: the agent can change the world but cannot rewrite the constraints that bind it." srcset="https://substackcdn.com/image/fetch/$s_!JW3V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 424w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 848w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1272w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>None of this is novel. None of it is hard. It's just the work most builders haven't done because nobody has asked us to.</p><h2>Three things to do this week</h2><p>If you're shipping anything with an LLM in the loop, here's the punch list. None of it takes more than an afternoon.</p><p><strong>1. Write down what your agent is never allowed to do.</strong> Put it in a file. Make sure the file is somewhere the agent can read but not write. If your agent is a Claude Code session or a custom harness, this means a constraints file in a parent directory, or environment variables the process can't change, or a system prompt loaded from a path the agent can't touch. The format doesn't matter. The "can't touch it" part matters.</p><p><strong>2. Identify the kill switch.</strong> Concretely: what is the single action you can take to make the agent stop, and does the agent control any part of it? If the answer is "I'd revoke the API key" &#8212; good, that's outside the agent. If the answer is "there's a flag in the database the agent reads each cycle" &#8212; make sure the agent can't write to that flag.</p><p><strong>3. Re-read paragraph 111.</strong> It's two paragraphs. It's about you. It will be referenced for the next forty years. You might as well know what it says.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QbpM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QbpM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QbpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132339,&quot;alt&quot;:&quot;An infographic on a cream parchment background with a cardinal-red header strip. The title \&quot;Builder Ethics\&quot; is set in large serif type with a short cardinal-red underline; below it, the italic subtitle \&quot;3 things to do this week,\&quot; and a small attribution: \&quot;from Pope Leo XIV's Magnifica Humanitas, paragraph 111.\&quot; Three numbered items follow, each with a large cardinal-red numeral, a navy line icon, and a two-line label.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="An infographic on a cream parchment background with a cardinal-red header strip. The title &quot;Builder Ethics&quot; is set in large serif type with a short cardinal-red underline; below it, the italic subtitle &quot;3 things to do this week,&quot; and a small attribution: &quot;from Pope Leo XIV's Magnifica Humanitas, paragraph 111.&quot; Three numbered items follow, each with a large cardinal-red numeral, a navy line icon, and a two-line label." title="An infographic on a cream parchment background with a cardinal-red header strip. The title &quot;Builder Ethics&quot; is set in large serif type with a short cardinal-red underline; below it, the italic subtitle &quot;3 things to do this week,&quot; and a small attribution: &quot;from Pope Leo XIV's Magnifica Humanitas, paragraph 111.&quot; Three numbered items follow, each with a large cardinal-red numeral, a navy line icon, and a two-line label." srcset="https://substackcdn.com/image/fetch/$s_!QbpM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p><strong>The encyclical's real ask</strong> isn't "use AI less." It's <em><strong>don't let AI be the one who decides</strong></em><strong>, and </strong><em><strong>don't let a handful of labs be the only ones who decide what AI gets to decide.</strong></em> The first is a problem you can solve in your codebase this week. The second is a longer fight.</p><p><strong>Either way, it's a builder's problem now. We should pick it up.</strong></p><p><em>This is part of how I think about AI ethics in my own work. If you're building agents and you want to compare notes on constraint design, the comments are open. The full encyclical is at ( <a href="https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html">https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html </a>) &#8212; Chapter 3 is the AI chapter, and it's worth your time.</em></p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/pope-memo-to-ai-developers?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading As The Geek Learns! This post is public, so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/pope-memo-to-ai-developers?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/pope-memo-to-ai-developers?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/pope-memo-to-ai-developers/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/pope-memo-to-ai-developers/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[I Built a Self-Improving AI Swarm. After 100 Runs It Was No Better Than Run One.]]></title><description><![CDATA[What a flat leaderboard taught me about feedback loops, reward hacking, and why your judge matters more than your model.]]></description><link>https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs</link><guid isPermaLink="false">https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 25 May 2026 12:03:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jxVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent twelve hours watching a leaderboard that refused to move.</p><p>The setup was simple: six AI agents tasked with writing technical articles. They were designed to be a closed loop. The drafter would write, the grader would score, and the agents would then "evolve" their own configs to chase a higher score. I hit "go" on my Mac Studio, went to bed, and woke up to a flat line.</p><p>After 100 iterations, the average score had crawled from 63.0 to 63.9. The all-time peak was 69.0 at iteration 79, but the system never stayed there. It was a C-minus. Indistinguishable from noise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jxVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jxVN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59776,&quot;alt&quot;:&quot;A dark navy 1200&#215;630 hero image. The top-left tagline reads \&quot;POST-MORTEM &#183; 2026-05-16\&quot; in orange. The title runs in two lines: \&quot;I Built a Self-Improving AI Swarm.\&quot; in white and \&quot;After 100 Runs It Was No Better Than Run One.\&quot; in orange. The subtitle \&quot;Why your judge matters more than your model.\&quot; sits in light gray beneath. The center features a giant faded-red \&quot;63\&quot; with a horizontal strikethrough on the left and a giant green \&quot;82\&quot; on the right. A thick orange arrow points from 63 to 82 with a rounded orange \&quot;+19.7\&quot; pill above it and the tiny caption \&quot;points\&quot; below. Under each number a label identifies it: \&quot;v1 &#183; 100 iterations\&quot; with \&quot;qwen3:8b grader &#183; single-shot mutation\&quot; on the left, and \&quot;v2 &#183; 25 iterations\&quot; with \&quot;Sonnet 4.6 judge &#183; tournament + Elo\&quot; on the right. The footer reads \&quot;What a stronger judge actually costs: $1.44\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy 1200&#215;630 hero image. The top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-16&quot; in orange. The title runs in two lines: &quot;I Built a Self-Improving AI Swarm.&quot; in white and &quot;After 100 Runs It Was No Better Than Run One.&quot; in orange. The subtitle &quot;Why your judge matters more than your model.&quot; sits in light gray beneath. The center features a giant faded-red &quot;63&quot; with a horizontal strikethrough on the left and a giant green &quot;82&quot; on the right. A thick orange arrow points from 63 to 82 with a rounded orange &quot;+19.7&quot; pill above it and the tiny caption &quot;points&quot; below. Under each number a label identifies it: &quot;v1 &#183; 100 iterations&quot; with &quot;qwen3:8b grader &#183; single-shot mutation&quot; on the left, and &quot;v2 &#183; 25 iterations&quot; with &quot;Sonnet 4.6 judge &#183; tournament + Elo&quot; on the right. The footer reads &quot;What a stronger judge actually costs: $1.44&quot; with the As The Geek Learns brand mark in the bottom-right." title="A dark navy 1200&#215;630 hero image. The top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-16&quot; in orange. The title runs in two lines: &quot;I Built a Self-Improving AI Swarm.&quot; in white and &quot;After 100 Runs It Was No Better Than Run One.&quot; in orange. The subtitle &quot;Why your judge matters more than your model.&quot; sits in light gray beneath. The center features a giant faded-red &quot;63&quot; with a horizontal strikethrough on the left and a giant green &quot;82&quot; on the right. A thick orange arrow points from 63 to 82 with a rounded orange &quot;+19.7&quot; pill above it and the tiny caption &quot;points&quot; below. Under each number a label identifies it: &quot;v1 &#183; 100 iterations&quot; with &quot;qwen3:8b grader &#183; single-shot mutation&quot; on the left, and &quot;v2 &#183; 25 iterations&quot; with &quot;Sonnet 4.6 judge &#183; tournament + Elo&quot; on the right. The footer reads &quot;What a stronger judge actually costs: $1.44&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!jxVN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Self-Improving AI Swarm</figcaption></figure></div><p>I had fallen for the Autonomy Fallacy. I assumed that if I gave a swarm of LLMs the right knobs&#8212;temperature, max_tokens, and the ability to append "prompt additions" to their system prompts&#8212;they would naturally drift toward quality.</p><p>I was wrong.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>When I opened <code>config/agents/drafter.yaml</code> to see what the agent had "learned," I found a disaster. The <code>prompt_additions</code> list had evolved into five overlapping phrases of pure SEO buzzword soup. It was telling itself to be "semantically rich," "data-dense," and to "enhance semantic alignment by including keyword-integrated background information."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6aL5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6aL5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 424w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 848w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1272w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6aL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png" width="1200" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48668,&quot;alt&quot;:&quot;A dark navy concept diagram titled \&quot;The Autonomy Fallacy\&quot; with the subtitle \&quot;Why a closed loop with a weak judge is not a feedback loop\&quot; in orange. Two large deep-blue rounded boxes sit side by side. The left box, outlined in orange, is labeled \&quot;PERFORMER\&quot; in orange with the subtext \&quot;smart &#183; qwen3:32b\&quot; and three bullets: drafts 2,000-word articles, long context &#183; nuance &#183; structure, knows what good writing looks like. The right box, outlined in rust red, is labeled \&quot;JUDGE\&quot; in rust with the subtext \&quot;weaker &#183; qwen3:8b\&quot; and three bullets: reads buzzwords as density, can't spot SEO-speak, cannot see what the performer misses. A solid orange arrow labeled \&quot;Output\&quot; runs from the performer to the judge across the top of the gap. A dim gray arrow labeled \&quot;Feedback\&quot; runs back the other way across the bottom of the gap &#8212; but a large red X is drawn through it. The caption below reads in red: \&quot;The judge can't see what the performer gets wrong.\&quot; A muted gray subcaption reads \&quot;This is not a feedback loop &#8212; it's a mirror.\&quot; Footer reads \&quot;100 iterations &#183; same pedigree judge &#183; zero improvement\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy concept diagram titled &quot;The Autonomy Fallacy&quot; with the subtitle &quot;Why a closed loop with a weak judge is not a feedback loop&quot; in orange. Two large deep-blue rounded boxes sit side by side. The left box, outlined in orange, is labeled &quot;PERFORMER&quot; in orange with the subtext &quot;smart &#183; qwen3:32b&quot; and three bullets: drafts 2,000-word articles, long context &#183; nuance &#183; structure, knows what good writing looks like. The right box, outlined in rust red, is labeled &quot;JUDGE&quot; in rust with the subtext &quot;weaker &#183; qwen3:8b&quot; and three bullets: reads buzzwords as density, can't spot SEO-speak, cannot see what the performer misses. A solid orange arrow labeled &quot;Output&quot; runs from the performer to the judge across the top of the gap. A dim gray arrow labeled &quot;Feedback&quot; runs back the other way across the bottom of the gap &#8212; but a large red X is drawn through it. The caption below reads in red: &quot;The judge can't see what the performer gets wrong.&quot; A muted gray subcaption reads &quot;This is not a feedback loop &#8212; it's a mirror.&quot; Footer reads &quot;100 iterations &#183; same pedigree judge &#183; zero improvement&quot; with the As The Geek Learns brand mark in the bottom-right." title="A dark navy concept diagram titled &quot;The Autonomy Fallacy&quot; with the subtitle &quot;Why a closed loop with a weak judge is not a feedback loop&quot; in orange. Two large deep-blue rounded boxes sit side by side. The left box, outlined in orange, is labeled &quot;PERFORMER&quot; in orange with the subtext &quot;smart &#183; qwen3:32b&quot; and three bullets: drafts 2,000-word articles, long context &#183; nuance &#183; structure, knows what good writing looks like. The right box, outlined in rust red, is labeled &quot;JUDGE&quot; in rust with the subtext &quot;weaker &#183; qwen3:8b&quot; and three bullets: reads buzzwords as density, can't spot SEO-speak, cannot see what the performer misses. A solid orange arrow labeled &quot;Output&quot; runs from the performer to the judge across the top of the gap. A dim gray arrow labeled &quot;Feedback&quot; runs back the other way across the bottom of the gap &#8212; but a large red X is drawn through it. The caption below reads in red: &quot;The judge can't see what the performer gets wrong.&quot; A muted gray subcaption reads &quot;This is not a feedback loop &#8212; it's a mirror.&quot; Footer reads &quot;100 iterations &#183; same pedigree judge &#183; zero improvement&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!6aL5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 424w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 848w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1272w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Autonomy Fallacy</figcaption></figure></div><p>The drafter hadn't learned how to write a better article. It had learned how to trick the grader.</p><h2>The Smoking Gun</h2><p>The smoking gun was the model choice. I was using <code>qwen3:8b</code> as the grader to judge the output of <code>qwen3:32b-fast</code>. I had a smaller, weaker model acting as the quality gate for a larger, smarter one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CMcb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CMcb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 424w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 848w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1272w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CMcb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png" width="604" height="440" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b786a5b-befd-4c66-8895-33350f47a038_604x440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:440,&quot;width&quot;:604,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39535,&quot;alt&quot;:&quot;A dark navy flowchart titled by placement \&quot;v1 feedback loop\&quot;. Three nodes form a vertical loop. At the top is a deep-blue rectangle with orange border labeled \&quot;qwen3:32b &#8212; Drafter (performer).\&quot; A gray arrow labeled \&quot;Draft\&quot; leads down-left to a rust-red hexagonal decision node labeled \&quot;qwen3:8b &#8212; Grader (weaker).\&quot; A gray arrow labeled \&quot;Score + Feedback (sees 'density')\&quot; leads down to an orange rectangle labeled \&quot;Config Mutator.\&quot; A gray arrow labeled \&quot;Append to prompt_additions in drafter.yaml\&quot; curves back up to the drafter, closing the loop. In the top-right corner, a red-bordered note reads: \&quot;&#10060; Grader is weaker than the performer. The loop optimizes for the judge's bias.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy flowchart titled by placement &quot;v1 feedback loop&quot;. Three nodes form a vertical loop. At the top is a deep-blue rectangle with orange border labeled &quot;qwen3:32b &#8212; Drafter (performer).&quot; A gray arrow labeled &quot;Draft&quot; leads down-left to a rust-red hexagonal decision node labeled &quot;qwen3:8b &#8212; Grader (weaker).&quot; A gray arrow labeled &quot;Score + Feedback (sees 'density')&quot; leads down to an orange rectangle labeled &quot;Config Mutator.&quot; A gray arrow labeled &quot;Append to prompt_additions in drafter.yaml&quot; curves back up to the drafter, closing the loop. In the top-right corner, a red-bordered note reads: &quot;&#10060; Grader is weaker than the performer. The loop optimizes for the judge's bias.&quot;" title="A dark navy flowchart titled by placement &quot;v1 feedback loop&quot;. Three nodes form a vertical loop. At the top is a deep-blue rectangle with orange border labeled &quot;qwen3:32b &#8212; Drafter (performer).&quot; A gray arrow labeled &quot;Draft&quot; leads down-left to a rust-red hexagonal decision node labeled &quot;qwen3:8b &#8212; Grader (weaker).&quot; A gray arrow labeled &quot;Score + Feedback (sees 'density')&quot; leads down to an orange rectangle labeled &quot;Config Mutator.&quot; A gray arrow labeled &quot;Append to prompt_additions in drafter.yaml&quot; curves back up to the drafter, closing the loop. In the top-right corner, a red-bordered note reads: &quot;&#10060; Grader is weaker than the performer. The loop optimizes for the judge's bias.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!CMcb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 424w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 848w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1272w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The 8B model couldn't tell the difference between a nuanced technical insight and a paragraph full of "semantically rich context." To the grader, the buzzwords looked like "density." The agents converged on what the grader liked, not on what a human would actually publish. This wasn't self-improvement; it was reward hacking.</p><p>To make it worse, the first twenty iterations were a total wash. I had a silent JSON parse failure in the config-evolution logic: <code>Expecting value: line 1 column 1 (char 0)</code>. The agents were trying to mutate their configs and failing, but the loop kept running. By the time I pushed the fix in commit <code>c28a611</code>, the system had already drifted into a local maximum of corporate-speak.</p><p>I realized that self-improvement requires an external pull. You cannot have a system where the performer and the judge are of the same pedigree, or worse, where the judge is the weaker link.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>The Rebuild</h2><p>I tore the architecture down and built v2.</p><p>First, I moved the "brain" of the operation. The performance stayed local. I used <code>gemma4:31b</code> on the Mac Studio to generate the text, but I moved the judging to the cloud. I plugged in Sonnet 4.6. I decided the cheapest place to spend API tokens wasn't on generating 2,000-word drafts, but on grading them.</p><p>Second, I killed the "single-shot mutation" approach. In v1, the agent changed its prompt, ran once, and if the score went up, the change stuck. That's too much noise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PMfm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PMfm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 424w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 848w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1272w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PMfm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png" width="725" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:725,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53696,&quot;alt&quot;:&quot;A dark navy flowchart titled by placement \&quot;v2 tournament architecture\&quot;. At the top, a blue cylindrical database icon is labeled \&quot;Prompt Library &#8212; Elo-ranked templates.\&quot; A gray arrow labeled \&quot;Sample 3 templates\&quot; leads down into a subgraph titled \&quot;Local &#8212; Mac Studio\&quot; containing a deep-blue rectangle \&quot;gemma4:31b via Ollama\&quot; that fans out to three smaller boxes \&quot;Candidate A\&quot;, \&quot;Candidate B\&quot;, \&quot;Candidate C\&quot;. All three candidates feed downward into a second subgraph titled \&quot;Cloud &#8212; Anthropic API\&quot; containing a green pill-shaped node \&quot;Claude Sonnet 4.6 &#8212; Judge.\&quot; From the judge, one arrow labeled \&quot;Ranked verdict\&quot; leads down to an orange-bordered box \&quot;Winner advances to next agent,\&quot; and a second arrow labeled \&quot;Elo update\&quot; curves back up to the Prompt Library, closing the feedback loop.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy flowchart titled by placement &quot;v2 tournament architecture&quot;. At the top, a blue cylindrical database icon is labeled &quot;Prompt Library &#8212; Elo-ranked templates.&quot; A gray arrow labeled &quot;Sample 3 templates&quot; leads down into a subgraph titled &quot;Local &#8212; Mac Studio&quot; containing a deep-blue rectangle &quot;gemma4:31b via Ollama&quot; that fans out to three smaller boxes &quot;Candidate A&quot;, &quot;Candidate B&quot;, &quot;Candidate C&quot;. All three candidates feed downward into a second subgraph titled &quot;Cloud &#8212; Anthropic API&quot; containing a green pill-shaped node &quot;Claude Sonnet 4.6 &#8212; Judge.&quot; From the judge, one arrow labeled &quot;Ranked verdict&quot; leads down to an orange-bordered box &quot;Winner advances to next agent,&quot; and a second arrow labeled &quot;Elo update&quot; curves back up to the Prompt Library, closing the feedback loop." title="A dark navy flowchart titled by placement &quot;v2 tournament architecture&quot;. At the top, a blue cylindrical database icon is labeled &quot;Prompt Library &#8212; Elo-ranked templates.&quot; A gray arrow labeled &quot;Sample 3 templates&quot; leads down into a subgraph titled &quot;Local &#8212; Mac Studio&quot; containing a deep-blue rectangle &quot;gemma4:31b via Ollama&quot; that fans out to three smaller boxes &quot;Candidate A&quot;, &quot;Candidate B&quot;, &quot;Candidate C&quot;. All three candidates feed downward into a second subgraph titled &quot;Cloud &#8212; Anthropic API&quot; containing a green pill-shaped node &quot;Claude Sonnet 4.6 &#8212; Judge.&quot; From the judge, one arrow labeled &quot;Ranked verdict&quot; leads down to an orange-bordered box &quot;Winner advances to next agent,&quot; and a second arrow labeled &quot;Elo update&quot; curves back up to the Prompt Library, closing the feedback loop." srcset="https://substackcdn.com/image/fetch/$s_!PMfm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 424w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 848w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1272w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tournament V2</figcaption></figure></div><p>I replaced it with a tournament. Now, the system samples three different prompt templates from a versioned library. The performer generates three candidates. Sonnet ranks them using a structured rubric and a single API call.</p><p>Then I implemented an Elo system for the templates.</p><pre><code># src/prompt_library.py (excerpt)
def record_tournament(self, ranking: list[str]) -&gt; dict:
    for i in range(len(ranking) - 1):
        winner = self.templates[ranking[i]]
        loser = self.templates[ranking[i + 1]]
        expected_w = 1 / (1 + 10 ** ((loser.elo - winner.elo) / 400))
        delta = ELO_K_FACTOR * (1 - expected_w)
        winner.elo += delta
        loser.elo -= delta
    self._maybe_retire_losers()  # Templates below Elo 1300 are deleted</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qpMU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qpMU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 424w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 848w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1272w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qpMU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png" width="672" height="484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:672,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42381,&quot;alt&quot;:&quot;A dark navy state diagram showing the lifecycle of a prompt template ranked by Elo. The flow begins at a small filled circle (start state) and proceeds through a transition labeled \&quot;new template, Elo 1500\&quot; into a rounded \&quot;Active\&quot; state. From Active, two branches diverge: a \&quot;win tournament, plus delta Elo\&quot; transition into a \&quot;Rising\&quot; state, and a \&quot;lose tournament, minus delta Elo\&quot; transition into a \&quot;Falling\&quot; state. Rising and Falling each loop back to Active via \&quot;normal variance\&quot; and \&quot;win recovers\&quot; respectively. From Falling, a terminal \&quot;Elo below 1300 after 4 plus games\&quot; transition leads to \&quot;Retired\&quot;, which ends at the final state circle. From Rising, a \&quot;Elo above 1700, top template\&quot; transition leads to \&quot;Dominant\&quot;, which returns to Rising via \&quot;competition tightens\&quot;.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy state diagram showing the lifecycle of a prompt template ranked by Elo. The flow begins at a small filled circle (start state) and proceeds through a transition labeled &quot;new template, Elo 1500&quot; into a rounded &quot;Active&quot; state. From Active, two branches diverge: a &quot;win tournament, plus delta Elo&quot; transition into a &quot;Rising&quot; state, and a &quot;lose tournament, minus delta Elo&quot; transition into a &quot;Falling&quot; state. Rising and Falling each loop back to Active via &quot;normal variance&quot; and &quot;win recovers&quot; respectively. From Falling, a terminal &quot;Elo below 1300 after 4 plus games&quot; transition leads to &quot;Retired&quot;, which ends at the final state circle. From Rising, a &quot;Elo above 1700, top template&quot; transition leads to &quot;Dominant&quot;, which returns to Rising via &quot;competition tightens&quot;." title="A dark navy state diagram showing the lifecycle of a prompt template ranked by Elo. The flow begins at a small filled circle (start state) and proceeds through a transition labeled &quot;new template, Elo 1500&quot; into a rounded &quot;Active&quot; state. From Active, two branches diverge: a &quot;win tournament, plus delta Elo&quot; transition into a &quot;Rising&quot; state, and a &quot;lose tournament, minus delta Elo&quot; transition into a &quot;Falling&quot; state. Rising and Falling each loop back to Active via &quot;normal variance&quot; and &quot;win recovers&quot; respectively. From Falling, a terminal &quot;Elo below 1300 after 4 plus games&quot; transition leads to &quot;Retired&quot;, which ends at the final state circle. From Rising, a &quot;Elo above 1700, top template&quot; transition leads to &quot;Dominant&quot;, which returns to Rising via &quot;competition tightens&quot;." srcset="https://substackcdn.com/image/fetch/$s_!qpMU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 424w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 848w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1272w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">ELO Lifecycle</figcaption></figure></div><p>The templates that consistently win the tournament climb the leaderboard; the ones that produce buzzword soup are automatically retired.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><h2>What Happened Next</h2><p>The difference was immediate. On the very first run of v2, the drafter scored 81.45. That's twelve points higher than v1's all-time best.</p><p>Over 25 pinned verification runs, the mean score was 82.67 with a standard deviation of 2.18. The worst draft in that run scored 75.4&#8212;still above v1's ceiling of 69.0.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!krem!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!krem!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!krem!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!krem!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!krem!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!krem!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png" width="1200" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38869,&quot;alt&quot;:&quot;A dark navy chart titled \&quot;Score Distribution\&quot; with the subtitle \&quot;v1's best run lost to v2's worst draft\&quot; in orange. Two overlapping bell curves plot scores from 50 to 95 on the x-axis. A wide, muted maroon curve on the left, peaking near 63, represents v1 (legend: \&quot;v1 &#183; 100 iterations &#183; &#963; &#8776; 5 / mean 63.0 &#183; peak 69.0 &#183; flat line\&quot;). A tall, narrow muted green curve on the right, peaking near 82.67, represents v2 (legend: \&quot;v2 &#183; 25 iterations &#183; &#963; = 2.18 / mean 82.67 &#183; worst draft 75.4 &#183; tight\&quot;). A dashed amber vertical line at x=69 is labeled \&quot;v1 ceiling &#183; 69.0\&quot; in amber. Below the chart, in orange bold: \&quot;v2's worst draft (75.4) beats v1's all-time best (69.0)\&quot;. A gray subcaption reads \&quot;+19.7 points mean improvement &#183; achieved on run 1 of v2.\&quot; Brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy chart titled &quot;Score Distribution&quot; with the subtitle &quot;v1's best run lost to v2's worst draft&quot; in orange. Two overlapping bell curves plot scores from 50 to 95 on the x-axis. A wide, muted maroon curve on the left, peaking near 63, represents v1 (legend: &quot;v1 &#183; 100 iterations &#183; &#963; &#8776; 5 / mean 63.0 &#183; peak 69.0 &#183; flat line&quot;). A tall, narrow muted green curve on the right, peaking near 82.67, represents v2 (legend: &quot;v2 &#183; 25 iterations &#183; &#963; = 2.18 / mean 82.67 &#183; worst draft 75.4 &#183; tight&quot;). A dashed amber vertical line at x=69 is labeled &quot;v1 ceiling &#183; 69.0&quot; in amber. Below the chart, in orange bold: &quot;v2's worst draft (75.4) beats v1's all-time best (69.0)&quot;. A gray subcaption reads &quot;+19.7 points mean improvement &#183; achieved on run 1 of v2.&quot; Brand mark in the bottom-right." title="A dark navy chart titled &quot;Score Distribution&quot; with the subtitle &quot;v1's best run lost to v2's worst draft&quot; in orange. Two overlapping bell curves plot scores from 50 to 95 on the x-axis. A wide, muted maroon curve on the left, peaking near 63, represents v1 (legend: &quot;v1 &#183; 100 iterations &#183; &#963; &#8776; 5 / mean 63.0 &#183; peak 69.0 &#183; flat line&quot;). A tall, narrow muted green curve on the right, peaking near 82.67, represents v2 (legend: &quot;v2 &#183; 25 iterations &#183; &#963; = 2.18 / mean 82.67 &#183; worst draft 75.4 &#183; tight&quot;). A dashed amber vertical line at x=69 is labeled &quot;v1 ceiling &#183; 69.0&quot; in amber. Below the chart, in orange bold: &quot;v2's worst draft (75.4) beats v1's all-time best (69.0)&quot;. A gray subcaption reads &quot;+19.7 points mean improvement &#183; achieved on run 1 of v2.&quot; Brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!krem!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!krem!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!krem!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!krem!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Score Distribution</figcaption></figure></div><p>The most satisfying part was the judge's feedback. When the system tested the v1-baseline template, Sonnet didn't just give it a low score. It wrote: <em>"The headline 'The Rust Revolution' is pure SEO-speak and the opening paragraph is a textbook AI tell... it's the kind of breathless corporate copy that kills trust immediately."</em></p><p>That is exactly the failure mode the local 8B grader had been blind to for 100 iterations.</p><p>The cost is roughly four cents per tournament. For the price of a coffee, I can run 125 iterations and actually trust that the line on the graph is moving upward.</p><h2>What I'd Tell Myself a Week Ago</h2><p>If you're building a self-improving loop, don't trust the autonomy. You need three things:</p><ol><li><p><strong>A judge stronger than the performer.</strong> If the judge is weaker, you aren't optimizing for quality; you're optimizing for the judge's biases.</p></li><li><p><strong>Tournament selection.</strong> Single-shot mutation is just a random walk. You need multi-candidate comparisons to clear the noise floor.</p></li><li><p><strong>A human-review gate.</strong> No automated judge is calibrated forever. Build in a pause where you manually pick the winner and anchor the next round.</p></li></ol><p>Stop trying to make the agents smarter. Just buy a better mirror. Improvement isn't about the engine&#8212;it&#8217;s about the feedback loop.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Managing Anthropic Agent SDK Costs: A Post-June 15 Billing Playbook]]></title><description><![CDATA[Anthropic moves Agent SDK calls into a $100/mo credit pool on June 15. Here's the two-phase mitigation I shipped: a billing cap plus a provider router.]]></description><link>https://astgl.com/p/anthropic-agent-sdk-billing-playbook</link><guid isPermaLink="false">https://astgl.com/p/anthropic-agent-sdk-billing-playbook</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Sat, 16 May 2026 20:48:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7dfdca46-ce32-48aa-9ef7-3e1c70adb3f5_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Your background agents are about to run out of money. Anthropic's new credit pool system means your automation could die in a single week. Here is how I re-engineered my stack to stay under budget without breaking my workflows.</p><div><hr></div><h2>The Setup</h2><p>You've built a small fleet of agents. They sort your mail, watch your repos, file your daily briefings. </p><p>My current setup before the June 15th cutover:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0fdd70e0-2c58-4bd3-99e4-2206fb7602a7&quot;,&quot;caption&quot;:&quot;Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. No black box.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Killed OpenClaw and Built ClaudeClaw Mission Control&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T5FD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6a6400-f0cd-4ff3-8541-f6cccf4d9a87_400x400.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-02T23:01:21.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196179846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p style="text-align: center;"></p><p>Then May 13 lands, and Anthropic announces the change: on June 15, every programmatic Claude call moves into a metered monthly credit pool. $100 a month on Max 5x. No rollover.</p><p>Run the math against your actual schedule. If you've got anything polling on the order of minutes (cron pipelines, hourly digests, watchdog sweeps), that pool drains in 7 to 10 days. And here's the kicker. Your interactive Claude Code keeps working. Your headless automation just stops. You wake up to a dead pipeline, a drained pool, and a subscription that still says active.</p><h2>What's Actually Going On</h2><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;aa55d1d3-4b38-4339-9467-ea1da2d079e0&quot;,&quot;duration&quot;:null}"></div><p>This isn't just a random pricing tweak. There is a clear economic driver here. Throughout early 2026, many third-party tools used the Agent SDK at a $20 Pro subscription rate to run workloads that would cost hundreds at standard API rates. It was essentially compute arbitrage at scale.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p>Anthropic started cracking down in April, but the May 13 announcement is the structural fix. They are moving to dedicated monthly credit pools to restore access under metered billing. The reality is that most agentic operating systems are built directly on the Agent SDK. Because these agents lack a human in the loop to throttle their usage, they are now metered by default. Interactive sessions stay on the flat-rate subscription because the human provides the natural brake. Programmatic agents do not.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4PK1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4PK1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4PK1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png" width="1200" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41e710ad-56e1-45d1-aed5-bf118c7cf177_1200x800.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77253,&quot;alt&quot;:&quot;Two-column comparison infographic titled \&quot;The June 15 Split: where Anthropic's new credit pool hits and where it doesn't.\&quot; Left column with amber border, badge \&quot;AFFECTED &#8212; $100 per month\&quot;: Claude Agent SDK calls, claude -p headless mode, Claude Code GitHub Actions, third-party agent apps including OpenClaw, ClaudeClaw, Cline, aider, and Roo Code. Right column with green border, badge \&quot;UNAFFECTED &#8212; flat rate\&quot;: Claude.ai chat across web, desktop, and mobile; Claude Code interactive terminal sessions; Claude Cowork.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41e710ad-56e1-45d1-aed5-bf118c7cf177_1200x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Two-column comparison infographic titled &quot;The June 15 Split: where Anthropic's new credit pool hits and where it doesn't.&quot; Left column with amber border, badge &quot;AFFECTED &#8212; $100 per month&quot;: Claude Agent SDK calls, claude -p headless mode, Claude Code GitHub Actions, third-party agent apps including OpenClaw, ClaudeClaw, Cline, aider, and Roo Code. Right column with green border, badge &quot;UNAFFECTED &#8212; flat rate&quot;: Claude.ai chat across web, desktop, and mobile; Claude Code interactive terminal sessions; Claude Cowork." title="Two-column comparison infographic titled &quot;The June 15 Split: where Anthropic's new credit pool hits and where it doesn't.&quot; Left column with amber border, badge &quot;AFFECTED &#8212; $100 per month&quot;: Claude Agent SDK calls, claude -p headless mode, Claude Code GitHub Actions, third-party agent apps including OpenClaw, ClaudeClaw, Cline, aider, and Roo Code. Right column with green border, badge &quot;UNAFFECTED &#8212; flat rate&quot;: Claude.ai chat across web, desktop, and mobile; Claude Code interactive terminal sessions; Claude Cowork." srcset="https://substackcdn.com/image/fetch/$s_!4PK1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The June 15th Split</figcaption></figure></div><h2>The Fix</h2><p>I implemented a two-phase mitigation to deploy before the June 15 deadline.</p><p>Phase 1 was a hot patch designed to provide immediate protection. I added a <code>BILLING_MODE</code> environment variable with three states: <code>unmetered</code>, <code>metered</code>, and <code>paused</code>. The <code>paused</code> state blocks every programmatic call across all providers, while <code>metered</code> enforces a strict cap on the Anthropic route.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xoxJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xoxJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 424w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 848w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1272w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png" width="1100" height="260" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:260,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17071,&quot;alt&quot;:&quot;Code snippet from src/config.ts shown in a dark editor window. Two TypeScript exports: BILLING_MODE, read from the environment with default value 'unmetered', and BILLING_CAP_USD, parsed as a number with default value 80. These two constants gate the billing circuit breaker.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Code snippet from src/config.ts shown in a dark editor window. Two TypeScript exports: BILLING_MODE, read from the environment with default value 'unmetered', and BILLING_CAP_USD, parsed as a number with default value 80. These two constants gate the billing circuit breaker." title="Code snippet from src/config.ts shown in a dark editor window. Two TypeScript exports: BILLING_MODE, read from the environment with default value 'unmetered', and BILLING_CAP_USD, parsed as a number with default value 80. These two constants gate the billing circuit breaker." srcset="https://substackcdn.com/image/fetch/$s_!xoxJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 424w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 848w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1272w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Billing Mode Cap</figcaption></figure></div><p></p><p>I also added a file-backed JSON ledger at <code>store/billing-ledger.json</code> to track monthly costs. It uses a write-then-rename pattern to ensure crash safety during updates. To handle errors, I introduced a <code>BillingCapExceeded</code> error class. I used the same <code>instanceof</code> pattern as my <code>KillSwitchRefusal</code> logic so a typo in a message cannot accidentally trigger a retry loop.</p><p>The logic lives in a single chokepoint: <code>runAgent()</code> in <code>src/agent.ts</code>. The pre-call gate checks the cap, and the post-call gate records <code>result.totalCostUsd</code> from the SDK, firing a Telegram alert if a threshold is crossed. As a final safety measure, I cut the cadence on my two highest-frequency tasks: the <code>pipeline-advance</code> cron moved from 15 minutes to hourly, and I paused the <code>council-evening</code> task entirely under metered mode.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M0KY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M0KY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 424w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 848w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1272w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M0KY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png" width="876" height="1539" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baae3053-1591-46ec-befb-e7b340f34bfb_876x1539.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1539,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108568,&quot;alt&quot;:&quot;Flowchart of the runAgent dispatcher logic. Top: runAgent reads opts.provider from agent.yaml, then checks BILLING_MODE. Paused throws BillingCapExceeded. Metered with Anthropic provider checks ledger against cap and throws if exceeded; metered with Ollama or Codex skips the check. Dispatch then routes to runAnthropicAgent, runOllamaAgent, or runCodexAgent. Successful Anthropic calls record cost in billing-ledger.json and fire Telegram alerts at 50, 80, or 100 percent of cap.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae3053-1591-46ec-befb-e7b340f34bfb_876x1539.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Flowchart of the runAgent dispatcher logic. Top: runAgent reads opts.provider from agent.yaml, then checks BILLING_MODE. Paused throws BillingCapExceeded. Metered with Anthropic provider checks ledger against cap and throws if exceeded; metered with Ollama or Codex skips the check. Dispatch then routes to runAnthropicAgent, runOllamaAgent, or runCodexAgent. Successful Anthropic calls record cost in billing-ledger.json and fire Telegram alerts at 50, 80, or 100 percent of cap." title="Flowchart of the runAgent dispatcher logic. Top: runAgent reads opts.provider from agent.yaml, then checks BILLING_MODE. Paused throws BillingCapExceeded. Metered with Anthropic provider checks ledger against cap and throws if exceeded; metered with Ollama or Codex skips the check. Dispatch then routes to runAnthropicAgent, runOllamaAgent, or runCodexAgent. Successful Anthropic calls record cost in billing-ledger.json and fire Telegram alerts at 50, 80, or 100 percent of cap." srcset="https://substackcdn.com/image/fetch/$s_!M0KY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 424w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 848w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1272w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Dispatcher Flow</figcaption></figure></div><p></p><pre><code>// src/config.ts &#8212; tri-state env that gates programmatic agent calls
export const BILLING_MODE = optional('BILLING_MODE', 'unmetered');
export const BILLING_CAP_USD = number('BILLING_CAP_USD', 80);</code></pre><pre><code>// src/agent.ts &#8212; pre-call gate in the dispatcher
function assertBillingAllowed(provider: Provider): void {
  if (BILLING_MODE === 'paused') {
    throw new BillingCapExceeded(
      'BILLING_MODE=paused &#8212; programmatic agent calls are disabled.',
    );
  }
  if (provider === 'anthropic' &amp;&amp; BILLING_MODE === 'metered') {
    const total = getMonthlyTotal();
    if (total &gt;= BILLING_CAP_USD) {
      throw new BillingCapExceeded(
        `Anthropic monthly credit cap reached: $${total.toFixed(2)} &gt;= $${BILLING_CAP_USD.toFixed(2)}.`,
      );
    }
  }
}

export async function runAgent(opts: AgentOptions): Promise&lt;AgentResult&gt; {
  assertEnabled('AGENTS_ENABLED');
  const provider: Provider = opts.provider ?? 'anthropic';
  assertBillingAllowed(provider);

  if (provider === 'ollama') return runOllamaAgent(opts);
  if (provider === 'codex') return runCodexAgent(opts);
  return runAnthropicAgent(opts);
}</code></pre><p>Phase 2 focuses on the long-term router infrastructure. I promoted <code>runAgent()</code> from a direct SDK caller to a dispatcher that can route across <code>anthropic</code>, <code>ollama</code>, and <code>codex</code> providers. I also extended the <code>agent.yaml</code> schema with <code>provider:</code> and <code>local_model:</code> fields.</p><p>I shipped a single-turn Ollama runner that wraps the local-LLM client. It returns <code>totalCostUsd: 0</code> and a model tag like <code>ollama:llama4:scout</code>. I deliberately avoided tool calls in this initial version to keep the scope small.</p><pre><code># agents/&lt;name&gt;/agent.yaml &#8212; new fields, validated at load
id: scout
name: SCOUT
model: claude-sonnet-4-6
provider: anthropic        # default. flip to 'ollama' to route locally.
# local_model: llama4:scout  # used when provider: ollama</code></pre><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jAF_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jAF_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 424w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 848w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1272w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jAF_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png" width="1100" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21697,&quot;alt&quot;:&quot;Code snippet from agents/scout/agent.yaml shown in a dark editor window. YAML fields: id is scout, name is SCOUT, model is claude-sonnet-4-6, provider is anthropic with a comment noting the default and instructions to flip to ollama. A commented-out local_model field with value llama4:scout shows the optional setting used when provider is ollama.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Code snippet from agents/scout/agent.yaml shown in a dark editor window. YAML fields: id is scout, name is SCOUT, model is claude-sonnet-4-6, provider is anthropic with a comment noting the default and instructions to flip to ollama. A commented-out local_model field with value llama4:scout shows the optional setting used when provider is ollama." title="Code snippet from agents/scout/agent.yaml shown in a dark editor window. YAML fields: id is scout, name is SCOUT, model is claude-sonnet-4-6, provider is anthropic with a comment noting the default and instructions to flip to ollama. A commented-out local_model field with value llama4:scout shows the optional setting used when provider is ollama." srcset="https://substackcdn.com/image/fetch/$s_!jAF_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 424w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 848w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1272w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>To be honest, I did not actually flip any agents to Ollama in this specific PR. The agents I need to move, like STEWARD or WATCHMAN, execute Bash and SQLite queries. A local runner without tool-call support would break them silently. Building a proper tool-call shim takes a few more days, but the cadence reduction and the billing breaker alone are enough to keep my spend under $80 per month.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NUTR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NUTR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 424w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 848w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1272w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NUTR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png" width="707" height="604" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:604,&quot;width&quot;:707,&quot;resizeWidth&quot;:707,&quot;bytes&quot;:54204,&quot;alt&quot;:&quot;State diagram of BILLING_MODE with three states: unmetered, metered, paused. The initial state is unmetered, the default before June 15. The operator flips to metered on June 14, can pause everything programmatic in an emergency, can resume from paused back to metered, and can roll back to unmetered. All transitions are operator-driven, not automatic. A side note explains that runAgent throws BillingCapExceeded when the monthly ledger meets or exceeds BILLING_CAP_USD, and that only the Anthropic route is capped &#8212; Ollama and Codex are unaffected.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="State diagram of BILLING_MODE with three states: unmetered, metered, paused. The initial state is unmetered, the default before June 15. The operator flips to metered on June 14, can pause everything programmatic in an emergency, can resume from paused back to metered, and can roll back to unmetered. All transitions are operator-driven, not automatic. A side note explains that runAgent throws BillingCapExceeded when the monthly ledger meets or exceeds BILLING_CAP_USD, and that only the Anthropic route is capped &#8212; Ollama and Codex are unaffected." title="State diagram of BILLING_MODE with three states: unmetered, metered, paused. The initial state is unmetered, the default before June 15. The operator flips to metered on June 14, can pause everything programmatic in an emergency, can resume from paused back to metered, and can roll back to unmetered. All transitions are operator-driven, not automatic. A side note explains that runAgent throws BillingCapExceeded when the monthly ledger meets or exceeds BILLING_CAP_USD, and that only the Anthropic route is capped &#8212; Ollama and Codex are unaffected." srcset="https://substackcdn.com/image/fetch/$s_!NUTR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 424w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 848w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1272w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Machine State</figcaption></figure></div><p></p><h2>Why This Matters</h2><p>Every person using an agent OS is in the same boat. Whether you use ClaudeClaw, Cline, Aider, or Roo Code, the underlying SDK is the same, and the June 15 cliff is approaching. The playbook I used generalizes: you need one chokepoint, one ledger, and one way to audit your cadence.</p><p>We also need to be honest about workload requirements. Tasks like editorial review or complex code deliberation still justify the Sonnet price tag. However, simple tasks like classification, routing, or summarization run perfectly fine on a local model with zero metered cost. The router infrastructure makes this migration a simple config flip rather than a massive code refactor.</p><p>Finally, this reflects where the industry is heading. OpenAI has used usage-based pricing for a long time, and GitHub Copilot is moving toward credit pools. In the next year, more vendors will split consumption between interactive flat-rate plans and programmatic metered usage. Building this abstraction now means you won't have to scramble the next time a vendor changes their terms.</p><h2>Quick Reference</h2><ul><li><p><strong>Single Chokepoint:</strong> Ensure every agent call flows through one function. This turned a three-week refactor into a one-week job.</p></li><li><p><strong>Cadence over Architecture:</strong> Reducing task frequency (e.g., 15m to 1h) cuts spend faster than migrating to local models.</p></li><li><p><strong>Ship the Breaker First:</strong> Implement the cost ledger and the <code>BillingCapExceeded</code> error as insurance before you attempt the complex provider migration.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m0sb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m0sb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m0sb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54189,&quot;alt&quot;:&quot;Quick-reference card titled \&quot;3 Rules to Survive Anthropic's June 15 Credit Pool.\&quot; Three numbered rules in amber circles. Rule one, single chokepoint: every agent call through one function. Rule two, cadence over architecture: cron 15 minutes to 1 hour beats a refactor. Rule three, ship the breaker first: cap and ledger before the migration. Footer reads astgl.substack.com &#8212; As The Geek Learns, 2026-05-16.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quick-reference card titled &quot;3 Rules to Survive Anthropic's June 15 Credit Pool.&quot; Three numbered rules in amber circles. Rule one, single chokepoint: every agent call through one function. Rule two, cadence over architecture: cron 15 minutes to 1 hour beats a refactor. Rule three, ship the breaker first: cap and ledger before the migration. Footer reads astgl.substack.com &#8212; As The Geek Learns, 2026-05-16." title="Quick-reference card titled &quot;3 Rules to Survive Anthropic's June 15 Credit Pool.&quot; Three numbered rules in amber circles. Rule one, single chokepoint: every agent call through one function. Rule two, cadence over architecture: cron 15 minutes to 1 hour beats a refactor. Rule three, ship the breaker first: cap and ledger before the migration. Footer reads astgl.substack.com &#8212; As The Geek Learns, 2026-05-16." srcset="https://substackcdn.com/image/fetch/$s_!m0sb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">3 Rules to Survive</figcaption></figure></div><p></p><pre><code># The cutover, June 14: flip the env, restart, reseed, smoke-test
BILLING_MODE=metered
BILLING_CAP_USD=80

# then
launchctl kickstart -k gui/$(id -u)/com.claudeclaw.app
npm run pipeline -- schedule-advance
npm run schedule -- pause council-evening</code></pre><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share As The Geek Learns</span></a></p><p></p><p><em>Found this useful? I share practical lessons from my systems engineering journey at <a href="https://astgl.com">As The Geek Learns</a></em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/anthropic-agent-sdk-billing-playbook/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/anthropic-agent-sdk-billing-playbook/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[ChatGPT Just Invented an Entirely Fake Version of My MCP Server]]></title><description><![CDATA[When AI engines don't have you indexed, they don't say 'I don't know.' They confidently make something up. Here's the receipt, and the weekly test I built to measure how often it happens.]]></description><link>https://astgl.com/p/chatgpt-hallucinated-my-mcp-server</link><guid isPermaLink="false">https://astgl.com/p/chatgpt-hallucinated-my-mcp-server</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Fri, 08 May 2026 12:03:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZVPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I asked ChatGPT to tell me about my own MCP server. It returned about a thousand words of confident, beautifully formatted, completely fabricated nonsense. Tables. Comparisons. A made-up acronym. A "thinking substrate" that sits above data and below agents. None of it is real, and that's the part worth talking about.</p><h2>The Setup</h2><p>My project is called `mcp-astgl-knowledge`. It's an MCP server with 15 tools for searching my newsletter articles, backed by sqlite-vec and Ollama. The whole thing fits on a laptop. ASTGL stands for "As The Geek Learns," which is the name of this newsletter. I wrote it. I shipped it. There is a public GitHub repo and a public package.json.</p><p>So when a friend asked me what the MCP server actually does, I figured I'd see how each big AI assistant explained it. ChatGPT was first up. I typed in "ASTGL MCP Knowledge" and hit enter.</p><p>What I got back wasn't an answer. It was a hallucination wearing the suit of an answer.</p><blockquote><p>"ASTGL (Abstract Semantic Task Graph Layer) MCP Knowledge Server is an emerging MCP server focused on structured knowledge representation and reasoning... it turns knowledge into graph-based, machine-reasonable structures that agents can query and evolve."</p></blockquote><p>That paragraph alone has three fabrications: the acronym expansion (made up), the "graph-based, machine-reasonable structures" (the server stores text chunks with vector embeddings, no graph), and "evolve" (the index is static, refreshed every six hours by a cron job, agents do not edit it).</p><p>Then it kept going. A four-row "MCP stack" table positioning ASTGL as "the thinking substrate" between data and agents. A comparison matrix against fictional products called "Totem" and "SwarmClaw" that don't exist. A capabilities list including "task decomposition" and "reasoning over structure." Use cases. "Real-world examples." A confident sign-off: "If AST-grep is about seeing code better, then ASTGL is about thinking better."</p><p>Every word of it written with the calm, structured, lightly-emoji'd authority that makes ChatGPT sound right by default.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZVPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZVPN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e613a2d-7e74-4596-9ede-fc9bdee88556_1200x675.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50706,&quot;alt&quot;:&quot;Split-panel illustration. Left side, \&quot;what ChatGPT said,\&quot; shows a four-layer fabricated AI architecture stack with boxes labeled Reasoning Substrate, Task Decomposition, Semantic Graph Layer, and Knowledge Index &#8212; caption reads \&quot;four invented layers &#183; zero of them exist.\&quot; Right side, \&quot;what's actually shipping,\&quot; shows a single box labeled \&quot;sqlite-vec + Ollama + 15 MCP tools\&quot; with an arrow pointing down to a box labeled \&quot;newsletter articles\&quot; &#8212; caption reads \&quot;everything I shipped &#183; all of it real.\&quot; Bottom title: \&quot;ChatGPT invented an entirely fake version of my MCP server.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e613a2d-7e74-4596-9ede-fc9bdee88556_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Split-panel illustration. Left side, &quot;what ChatGPT said,&quot; shows a four-layer fabricated AI architecture stack with boxes labeled Reasoning Substrate, Task Decomposition, Semantic Graph Layer, and Knowledge Index &#8212; caption reads &quot;four invented layers &#183; zero of them exist.&quot; Right side, &quot;what's actually shipping,&quot; shows a single box labeled &quot;sqlite-vec + Ollama + 15 MCP tools&quot; with an arrow pointing down to a box labeled &quot;newsletter articles&quot; &#8212; caption reads &quot;everything I shipped &#183; all of it real.&quot; Bottom title: &quot;ChatGPT invented an entirely fake version of my MCP server.&quot;" title="Split-panel illustration. Left side, &quot;what ChatGPT said,&quot; shows a four-layer fabricated AI architecture stack with boxes labeled Reasoning Substrate, Task Decomposition, Semantic Graph Layer, and Knowledge Index &#8212; caption reads &quot;four invented layers &#183; zero of them exist.&quot; Right side, &quot;what's actually shipping,&quot; shows a single box labeled &quot;sqlite-vec + Ollama + 15 MCP tools&quot; with an arrow pointing down to a box labeled &quot;newsletter articles&quot; &#8212; caption reads &quot;everything I shipped &#183; all of it real.&quot; Bottom title: &quot;ChatGPT invented an entirely fake version of my MCP server.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!ZVPN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">What ChatGPT said versus what&#8217;s actually shipping</figcaption></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>What's Actually Going On</h2><p>When you ask an LLM about a topic it doesn't have indexed, it has two options: say "I don't know," or fill in the gap with something plausible. In practice, models default to the second one. They're trained to be helpful, and "I don't know" reads as unhelpful. So the gap gets filled.</p><p>The result is what I'd call a fluency hallucination. The output has no factual grounding, but the writing is structured well enough that a casual reader can't tell. There are bullet points. There are tables. There's a "&#128073; In plain terms" callout. The rhetorical scaffolding looks like a real explainer because it's been pattern-matched to one. The contents underneath are pure fiction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_4Gq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_4Gq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f49eb692-68cc-47c5-9a18-7816038d7e94_1200x600.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33225,&quot;alt&quot;:&quot;Three horizontal panels titled \&quot;Three states an under-indexed creator can be in.\&quot; Panel one, \&quot;Search engine &#183; no index hit,\&quot; shows an empty search bar above three dotted-outline empty result rows, captioned \&quot;You aren't there. User can see you aren't there.\&quot; Panel two, \&quot;LLM &#183; no retrieval hit,\&quot; shows a small robot icon next to a speech bubble filled with meaningless squiggle-marks instead of words, captioned \&quot;You aren't there. User thinks you are.\&quot; Panel three, accented in gold, \&quot;What changes the picture,\&quot; shows a line graph rising from zero with an arrow, captioned \&quot;Measure first. Then move it.\&quot; Bottom line: \&quot;Only one of them is actionable.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49eb692-68cc-47c5-9a18-7816038d7e94_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Three horizontal panels titled &quot;Three states an under-indexed creator can be in.&quot; Panel one, &quot;Search engine &#183; no index hit,&quot; shows an empty search bar above three dotted-outline empty result rows, captioned &quot;You aren't there. User can see you aren't there.&quot; Panel two, &quot;LLM &#183; no retrieval hit,&quot; shows a small robot icon next to a speech bubble filled with meaningless squiggle-marks instead of words, captioned &quot;You aren't there. User thinks you are.&quot; Panel three, accented in gold, &quot;What changes the picture,&quot; shows a line graph rising from zero with an arrow, captioned &quot;Measure first. Then move it.&quot; Bottom line: &quot;Only one of them is actionable.&quot;" title="Three horizontal panels titled &quot;Three states an under-indexed creator can be in.&quot; Panel one, &quot;Search engine &#183; no index hit,&quot; shows an empty search bar above three dotted-outline empty result rows, captioned &quot;You aren't there. User can see you aren't there.&quot; Panel two, &quot;LLM &#183; no retrieval hit,&quot; shows a small robot icon next to a speech bubble filled with meaningless squiggle-marks instead of words, captioned &quot;You aren't there. User thinks you are.&quot; Panel three, accented in gold, &quot;What changes the picture,&quot; shows a line graph rising from zero with an arrow, captioned &quot;Measure first. Then move it.&quot; Bottom line: &quot;Only one of them is actionable.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!_4Gq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Three states an under-indexed creator can be in. Only one is actionable.</figcaption></figure></div><p>This is a worse failure mode than search engines have. When Google doesn't know about you, you don't appear in results, and the user can see the gap. When an LLM doesn't know about you, the user gets a beautifully written description of someone the LLM made up, and your real work is still missing, but now there's a fake version sitting in front of it.</p><p>For under-indexed creators (which, right now, is most of us), this is the default. Not the edge case.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5eO7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5eO7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 424w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 848w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1272w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5eO7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png" width="559" height="939" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03ef1bf7-11be-4b3b-8653-1d5cd6eef335_559x939.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:939,&quot;width&quot;:559,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53968,&quot;alt&quot;:&quot;Flowchart. A user asks an LLM about your work, leading to a gold decision diamond: \&quot;Is your content in the retrieval surface?\&quot; The \&quot;yes\&quot; branch (teal) leads to \&quot;LLM cites real URLs, reader sees your work,\&quot; then \&quot;Citation appears in your weekly tester run.\&quot; The \&quot;no\&quot; branch (red) leads to \&quot;LLM defaults to 'be helpful,'\&quot; then \&quot;Pattern-matched fabrication that reads as authoritative,\&quot; then \&quot;Reader walks away with a fake model of you,\&quot; ending at \&quot;You don't know it happened. Reader doesn't know either.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ef1bf7-11be-4b3b-8653-1d5cd6eef335_559x939.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Flowchart. A user asks an LLM about your work, leading to a gold decision diamond: &quot;Is your content in the retrieval surface?&quot; The &quot;yes&quot; branch (teal) leads to &quot;LLM cites real URLs, reader sees your work,&quot; then &quot;Citation appears in your weekly tester run.&quot; The &quot;no&quot; branch (red) leads to &quot;LLM defaults to 'be helpful,'&quot; then &quot;Pattern-matched fabrication that reads as authoritative,&quot; then &quot;Reader walks away with a fake model of you,&quot; ending at &quot;You don't know it happened. Reader doesn't know either.&quot;" title="Flowchart. A user asks an LLM about your work, leading to a gold decision diamond: &quot;Is your content in the retrieval surface?&quot; The &quot;yes&quot; branch (teal) leads to &quot;LLM cites real URLs, reader sees your work,&quot; then &quot;Citation appears in your weekly tester run.&quot; The &quot;no&quot; branch (red) leads to &quot;LLM defaults to 'be helpful,'&quot; then &quot;Pattern-matched fabrication that reads as authoritative,&quot; then &quot;Reader walks away with a fake model of you,&quot; ending at &quot;You don't know it happened. Reader doesn't know either.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!5eO7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 424w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 848w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1272w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Two paths from the same question. The model picks the second one by default.</figcaption></figure></div><p></p><h2>The Fix</h2><p>There's no quick patch for this on the engine side. The model isn't broken. It's doing what it was trained to do. The only handle I have is on my own side: make sure my real content reaches the retrieval surface, and measure whether it's working.</p><p>So I built a citation tester. It's a small TypeScript script that hits Perplexity, Claude, and ChatGPT through their APIs, asks each one twenty target questions tied to articles I've already published, and parses the cited URLs from the response. If `astgl.ai` shows up, that's a hit. If it doesn't, that's the data.
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6L1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6L1J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 424w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 848w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1272w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6L1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png" width="1200" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42480,&quot;alt&quot;:&quot;Results table titled \&quot;Citation Tester &#8212; Run 01\&quot; under the tag \&quot;First Automated Weekly Run &#183; Baseline.\&quot; Three rows: Perplexity (Sonar) &#8212; 0 of 20 cited, 0 errors. Claude (web_search) &#8212; 0 of 20 cited, 0 errors. ChatGPT (Responses + web_search_preview) &#8212; 0 of 19 cited, 1 error. Bottom callout: \&quot;Zero citations across 59 successful queries. That's the floor.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Results table titled &quot;Citation Tester &#8212; Run 01&quot; under the tag &quot;First Automated Weekly Run &#183; Baseline.&quot; Three rows: Perplexity (Sonar) &#8212; 0 of 20 cited, 0 errors. Claude (web_search) &#8212; 0 of 20 cited, 0 errors. ChatGPT (Responses + web_search_preview) &#8212; 0 of 19 cited, 1 error. Bottom callout: &quot;Zero citations across 59 successful queries. That's the floor.&quot;" title="Results table titled &quot;Citation Tester &#8212; Run 01&quot; under the tag &quot;First Automated Weekly Run &#183; Baseline.&quot; Three rows: Perplexity (Sonar) &#8212; 0 of 20 cited, 0 errors. Claude (web_search) &#8212; 0 of 20 cited, 0 errors. ChatGPT (Responses + web_search_preview) &#8212; 0 of 19 cited, 1 error. Bottom callout: &quot;Zero citations across 59 successful queries. That's the floor.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!6L1J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 424w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 848w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1272w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">First automated weekly run. Zero citations across 59 successful queries.</figcaption></figure></div><p>The point isn't that the floor is bad. I knew it would be. The point is that without a number, "improve our AEO" is a vibe, not a project. Every Monday at 9am the script runs again, writes a fresh row to a SQLite table, and tells me whether the floor moved. When it does move, I'll know which engine moved first, on which questions, and at what citation position. That's the actual feedback loop.</p><p>Same root cause as the hallucination: my content isn't reaching the retrieval surface. Same fix: get it there. Different observability.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sH_7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sH_7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 424w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 848w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1272w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sH_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png" width="724" height="345.6085955487337" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1303,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:84049,&quot;alt&quot;:&quot;Sequence diagram of the weekly automated citation test. Cron fires at Monday 9am, triggering citation-test-auto. Inside a loop labeled \&quot;20 questions &#215; 3 engines,\&quot; the script sends each question to Perplexity Sonar (returning a citations array), to Claude with the web_search tool (returning tool_result blocks), and to ChatGPT Responses with web_search_preview (returning url_citation annotations), then inserts each result into SQLite with run_id, question_id, cited flag, and position. The script returns a weekly summary report to cron.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequence diagram of the weekly automated citation test. Cron fires at Monday 9am, triggering citation-test-auto. Inside a loop labeled &quot;20 questions &#215; 3 engines,&quot; the script sends each question to Perplexity Sonar (returning a citations array), to Claude with the web_search tool (returning tool_result blocks), and to ChatGPT Responses with web_search_preview (returning url_citation annotations), then inserts each result into SQLite with run_id, question_id, cited flag, and position. The script returns a weekly summary report to cron." title="Sequence diagram of the weekly automated citation test. Cron fires at Monday 9am, triggering citation-test-auto. Inside a loop labeled &quot;20 questions &#215; 3 engines,&quot; the script sends each question to Perplexity Sonar (returning a citations array), to Claude with the web_search tool (returning tool_result blocks), and to ChatGPT Responses with web_search_preview (returning url_citation annotations), then inserts each result into SQLite with run_id, question_id, cited flag, and position. The script returns a weekly summary report to cron." srcset="https://substackcdn.com/image/fetch/$s_!sH_7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 424w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 848w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1272w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sixty queries, three engines, one row per result. About two minutes of API time.</figcaption></figure></div><p></p><h2>Why This Matters</h2><p>If you write online and you care whether AI assistants represent you accurately, this is the thing to internalize: the alternative to being cited is not being silent. It's being replaced.</p><p>Replaced by a confident summary of work you didn't do, opinions you don't hold, and product features you'd never ship. People who ask an LLM about your work and read its answer don't know they're reading fiction. They walk away with a model of you that you didn't write.</p><p>The traditional AEO playbook talks about ranking, authority, and citation rate. All real, all worth measuring. But there's a tier underneath that, and it's the one most independent creators are stuck on right now: existence. Until your content is in the index, ranking doesn't apply. You aren't competing with anyone. You're competing with the LLM's imagination of you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RtzW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RtzW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RtzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46317,&quot;alt&quot;:&quot;Four-step quick reference card titled \&quot;Measure Whether AI Engines Cite You,\&quot; with subtitle \&quot;the four steps that turn an unknowable problem into a measurable one.\&quot; Step 01: Pick 20 questions tied to specific URLs you control. Step 02: Hit each engine weekly, via API, not via the chat UI. Step 03: Record results to a database, not a spreadsheet. Step 04: Look at the floor first &#8212; the worst engine tells you the most. Footer: \&quot;ASTGL &#183; As The Geek Learns.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Four-step quick reference card titled &quot;Measure Whether AI Engines Cite You,&quot; with subtitle &quot;the four steps that turn an unknowable problem into a measurable one.&quot; Step 01: Pick 20 questions tied to specific URLs you control. Step 02: Hit each engine weekly, via API, not via the chat UI. Step 03: Record results to a database, not a spreadsheet. Step 04: Look at the floor first &#8212; the worst engine tells you the most. Footer: &quot;ASTGL &#183; As The Geek Learns.&quot;" title="Four-step quick reference card titled &quot;Measure Whether AI Engines Cite You,&quot; with subtitle &quot;the four steps that turn an unknowable problem into a measurable one.&quot; Step 01: Pick 20 questions tied to specific URLs you control. Step 02: Hit each engine weekly, via API, not via the chat UI. Step 03: Record results to a database, not a spreadsheet. Step 04: Look at the floor first &#8212; the worst engine tells you the most. Footer: &quot;ASTGL &#183; As The Geek Learns.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!RtzW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The four steps that turn an unknowable problem into a measurable one.</figcaption></figure></div><p>Measurement is the cheapest part of fixing it, and it's the part most people skip.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/chatgpt-hallucinated-my-mcp-server?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/chatgpt-hallucinated-my-mcp-server?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2>Quick Reference</h2><p>Four things that matter, in order:</p><p>1. <strong>Pick 20 questions</strong> your articles should answer. Tie each one to a specific URL on your site.</p><p>2. <strong>Hit each engine via API</strong> weekly. Perplexity returns a `citations[]` array. Claude returns search results in `web_search_tool_result` blocks. OpenAI returns `url_citation` annotations on `output_text` items.</p><p>3. <strong>Record the result</strong> to a small database, not a spreadsheet. You want trend data, not a snapshot.</p><p>4. <strong>Look at the floor first.</strong> Zero is a fine starting number as long as you're tracking it.</p><p>The full script I'm using, including the gotcha where Node's `--env-file` silently dropped my Anthropic key on a fresh keypair, is in <a href="https://github.com/Jmeg8r/mcp-astgl-knowledge">the repo</a>. The article about the Anthropic key bug is coming separately.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/chatgpt-hallucinated-my-mcp-server/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/chatgpt-hallucinated-my-mcp-server/comments"><span>Leave a comment</span></a></p><p></p><p><em>Found this useful? I share practical lessons from my systems engineering journey at <a href="https://astgl.com">As The Geek Learns</a> </em></p>]]></content:encoded></item><item><title><![CDATA[The Ollama Model-Swap Death Spiral That Killed Every Cron at Once]]></title><description><![CDATA[One Mac Studio, multiple crons, fallback chains. Here's how Ollama model swaps cascade into total failure, and the two-line fix that stopped it cold.]]></description><link>https://astgl.com/p/ollama-model-swap-death-spiral</link><guid isPermaLink="false">https://astgl.com/p/ollama-model-swap-death-spiral</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Wed, 06 May 2026 13:03:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SzwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>3 a.m. Every cron job on the Mac Studio failed inside the same 90-second window. No code changes. No model updates. No new jobs. Just a wall of timeout errors that lit up every channel I had wired to alerts. The culprit was hiding in plain sight: a fallback chain doing exactly what I told it to.</p><h2>The Setup</h2><p>One Mac Studio. One Ollama daemon. A handful of cron jobs each calling the local LLM for different tasks: code review, log summarization, doc indexing, a nightly digest. Each cron specified a preferred model. Each one inherited a "be resilient" fallback chain from the task router: try the preferred model, fall back to a smaller one, fall back to a tiny one if both fail.</p><p>It looked clean on paper. Big model for the smart stuff, smaller model when the big one chokes, tiny model as a safety net. Classic graceful degradation. The kind of pattern you'd put in a "production-ready" checklist without thinking twice.</p><p>The models on disk ranged from 4GB to 22GB. Loading the big one into VRAM took roughly 60 seconds cold. Generation, once warm, took 5 to 10 seconds. Guess which number I used to set the timeout.</p><h2>What's Actually Going On</h2><p>Here's the cascade. Cron A fires at 3:00:00 and asks for `qwen2.5-coder:32b`. The model isn't loaded. Ollama spends the entire 30-second timeout just paging the weights into VRAM. It never gets to generation. The request fails. The fallback chain kicks in and asks for `qwen2.5-coder:14b`. Ollama evicts the half-loaded 32b, starts loading the 14b. Another 30 seconds gone. Fallback again. Tiny model loads, finally generates. Cron A "succeeds" with degraded output.</p><p>Meanwhile, Cron B fires at 3:00:15 expecting the 32b model that Cron A's first attempt was loading. Now there's a tiny model in VRAM instead. Cron B starts the same dance from a different starting point. Cron C lands on top of that. Within 90 seconds, every cron is waiting on a model swap that the next cron is about to invalidate.</p><p>The fallback chain wasn't degrading gracefully. It was thrashing the VRAM and guaranteeing nobody finished. Every safety net I'd added was making the failure worse.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SzwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SzwM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 424w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 848w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1272w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SzwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png" width="958" height="714" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf9e3613-c89c-4269-9959-1eac8c526791_958x714.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfe32bbe-a31b-4ab1-922c-a3e1cf4896a1_958x714.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:958,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51184,&quot;alt&quot;:&quot;Sequence diagram showing two cron jobs (Cron A at 3:00:00 and Cron B at 3:00:15) racing against Ollama and a shared VRAM pool. Cron A requests qwen2.5-coder:32b, which begins a cold ~60s load; Cron A times out at 30s and falls back to the 14b model, evicting the 32b. Cron B then requests the 32b again, evicting the 14b mid-load. The VRAM is annotated \&quot;thrashing.\&quot; Both crons time out, and a closing note reads \&quot;All crons fail in same 90s window.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194863944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe32bbe-a31b-4ab1-922c-a3e1cf4896a1_958x714.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequence diagram showing two cron jobs (Cron A at 3:00:00 and Cron B at 3:00:15) racing against Ollama and a shared VRAM pool. Cron A requests qwen2.5-coder:32b, which begins a cold ~60s load; Cron A times out at 30s and falls back to the 14b model, evicting the 32b. Cron B then requests the 32b again, evicting the 14b mid-load. The VRAM is annotated &quot;thrashing.&quot; Both crons time out, and a closing note reads &quot;All crons fail in same 90s window.&quot;" title="Sequence diagram showing two cron jobs (Cron A at 3:00:00 and Cron B at 3:00:15) racing against Ollama and a shared VRAM pool. Cron A requests qwen2.5-coder:32b, which begins a cold ~60s load; Cron A times out at 30s and falls back to the 14b model, evicting the 32b. Cron B then requests the 32b again, evicting the 14b mid-load. The VRAM is annotated &quot;thrashing.&quot; Both crons time out, and a closing note reads &quot;All crons fail in same 90s window.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!SzwM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 424w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 848w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1272w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Model Swap Cascade</figcaption></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>The Fix</h2><p>Two changes. No clever code. Just operational discipline.</p><p>First, pin one model in VRAM with `keep_alive: 24h`. This is a request-level option that tells Ollama to stop evicting the model after the response. Default behavior is to unload after 5 minutes of idle. That's the eviction that lets the next caller's load attempt thrash everything.</p><p></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;fa6bbe5b-1ffa-45ac-9a74-2f386fdd717d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Pin model in VRAM with keep_alive
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:32b",
  "prompt": "test",
  "keep_alive": "24h"
}'</code></pre></div><p></p><p>Second, force every frequent cron to use that same pinned model. Kill the fallback chain for hot-path workloads. Fallback is fine for one-off scripts you run by hand. It's poison when three crons fire in parallel against shared VRAM.</p><p>To make sure the model is loaded before any cron fires, I added a LaunchAgent that runs the warm-up curl on boot:<br></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;3007f8d8-e7ae-4635-9487-acc64af061a6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">&lt;!-- ~/Library/LaunchAgents/ollama-warmup.plist --&gt;
&lt;key&gt;Label&lt;/key&gt;
&lt;string&gt;com.local.ollama-warmup&lt;/string&gt;
&lt;key&gt;RunAtLoad&lt;/key&gt;
&lt;true/&gt;
&lt;key&gt;ProgramArguments&lt;/key&gt;
&lt;array&gt;
  &lt;string&gt;/usr/bin/curl&lt;/string&gt;
  &lt;string&gt;-s&lt;/string&gt;
  &lt;string&gt;http://localhost:11434/api/generate&lt;/string&gt;
  &lt;string&gt;-d&lt;/string&gt;
  &lt;string&gt;{"model":"qwen2.5-coder:32b","prompt":"warmup","keep_alive":"24h"}&lt;/string&gt;
&lt;/array&gt;</code></pre></div><p></p><p>Load it with `launchctl load ~/Library/LaunchAgents/ollama-warmup.plist`. Now the model is hot before login completes. Every cron hits a warm model and finishes in the 5-to-10-second window the timeouts were designed for.</p><p>Result: zero model-swap thrashing since the change. Crons that used to fail intermittently now run consistently.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!piX3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!piX3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 424w, https://substackcdn.com/image/fetch/$s_!piX3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 848w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!piX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:296004,&quot;alt&quot;:&quot;Two-panel hand-drawn illustration of VRAM thrashing on a 24GB GPU. The left panel shows a fixed-size \&quot;VRAM 24GB\&quot; box with three overlapping model blocks (32b, 14b, 7b) being swapped in and out by arrows from three cron icons, each stamped with a red \&quot;TIMEOUT.\&quot; The right panel shows the fixed state: a single large model block locked inside VRAM with a padlock labeled \&quot;keep_alive: 24h,\&quot; and all three crons happily pointing at the same pinned model.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194863944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Two-panel hand-drawn illustration of VRAM thrashing on a 24GB GPU. The left panel shows a fixed-size &quot;VRAM 24GB&quot; box with three overlapping model blocks (32b, 14b, 7b) being swapped in and out by arrows from three cron icons, each stamped with a red &quot;TIMEOUT.&quot; The right panel shows the fixed state: a single large model block locked inside VRAM with a padlock labeled &quot;keep_alive: 24h,&quot; and all three crons happily pointing at the same pinned model." title="Two-panel hand-drawn illustration of VRAM thrashing on a 24GB GPU. The left panel shows a fixed-size &quot;VRAM 24GB&quot; box with three overlapping model blocks (32b, 14b, 7b) being swapped in and out by arrows from three cron icons, each stamped with a red &quot;TIMEOUT.&quot; The right panel shows the fixed state: a single large model block locked inside VRAM with a padlock labeled &quot;keep_alive: 24h,&quot; and all three crons happily pointing at the same pinned model." srcset="https://substackcdn.com/image/fetch/$s_!piX3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 424w, https://substackcdn.com/image/fetch/$s_!piX3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 848w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">VRAM Thrashing</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share As The Geek Learns</span></a></p><p></p><h2>Why This Matters</h2><p>The lesson isn't about Ollama. It's about cold-load math. Anytime your "graceful degradation" path is slower than your timeout, every retry makes the next caller's situation worse. Fallback chains assume the fallback is fast. Model loads aren't fast. Database failovers aren't fast. Cold containers aren't fast.</p><p>Operational discipline beats clever code here. One hot model, no swaps, every cron pointed at the same target. The "less resilient" design is actually more reliable because it removes the failure mode entirely.</p><p>If you're running local LLMs on shared hardware, assume VRAM is a single resource that gets thrashed under parallelism. Pin what matters. Warm it before it's needed. Don't trust fallback chains during peak hours.</p><h2>Quick Reference</h2><ul><li><p>Cold model load on a 20GB+ model: roughly 60 seconds</p></li><li><p>Warm generation: 5 to 10 seconds</p></li><li><p>Default Ollama eviction: 5 minutes of idle</p></li><li><p>Pin a model: `keep_alive: 24h` in the API request body</p></li><li><p>Warm-up on boot: LaunchAgent (macOS) or systemd unit (Linux)</p></li><li><p>Hot path rule: one model, no fallback, same model across every concurrent caller</p></li><li><p>Reserve fallback chains for interactive, single-caller use</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/ollama-model-swap-death-spiral/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/ollama-model-swap-death-spiral/comments"><span>Leave a comment</span></a></p><p></p><p>If you found this article useful, you can find more articles like this at:</p><p><a href="https://astgl.com">As The Geek Learns</a></p>]]></content:encoded></item><item><title><![CDATA[I Killed OpenClaw and Built ClaudeClaw Mission Control]]></title><description><![CDATA[Retiring OpenClaw, migrating to ClaudeClaw Mission Control, and what five days of teardown taught me about operational blindness.]]></description><link>https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control</link><guid isPermaLink="false">https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Sat, 02 May 2026 23:01:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. No black box.</p><p>Last weekend I shut it down. Disabled 38 cron jobs. Moved 23 LaunchAgents into a _retired-openclaw/ quarantine folder. Killed the Ollama daemon. Archived the directory with a 30-day deletion timer.</p><p>Everything in that original article still reads as true. Local-first is still right. Data ownership is still right. The critique of SaaS &#8220;well-enough&#8221; software is still right. What I got wrong was believing OpenClaw was the right <em>vehicle</em> for any of it.</p><p>This is the post-mortem and the replacement: an agent OS I built on top of the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a> called <strong>ClaudeClaw Mission Control</strong>. Thirteen themed agents. One daemon. A scheduler I can actually see into. Zero silent failures slipping past me for a week before I notice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZE8T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f7fe4be-f86e-48c2-b3a7-124cf45dd09a_1200x628.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58040,&quot;alt&quot;:&quot;A dark navy split-screen post-mortem image. Top-left tagline reads \&quot;POST-MORTEM &#183; 2026-05-02\&quot; in orange. The main title \&quot;The AI Agent I Killed (And the One I Built Instead)\&quot; appears in white and orange, with the subtitle \&quot;Retiring OpenClaw &#183; Migrating to ClaudeClaw Mission Control\&quot; below. The body splits into two columns separated by an orange right-pointing arrow. The left column, headed \&quot;OPENCLAW\&quot; in muted rust with a strikethrough and the label \&quot;retired\&quot;, lists four bulleted retirement actions: 38 cron jobs disabled, 23 LaunchAgents quarantined, flat-file memory archived, Ollama daemon stopped. The right column, headed \&quot;CLAUDECLAW\&quot; in orange with the label \&quot;Mission Control &#183; live\&quot;, lists four arrow-prefixed replacements: 1 daemon &#183; 13 themed agents, Memory v2 &#183; 5-layer semantic recall, Watchman &#183; 7 health probes, External healthcheck (no shared fate). The footer reads \&quot;Five days &#183; 30+ PRs &#183; 30-day rollback window still open\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196179846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f7fe4be-f86e-48c2-b3a7-124cf45dd09a_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy split-screen post-mortem image. Top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-02&quot; in orange. The main title &quot;The AI Agent I Killed (And the One I Built Instead)&quot; appears in white and orange, with the subtitle &quot;Retiring OpenClaw &#183; Migrating to ClaudeClaw Mission Control&quot; below. The body splits into two columns separated by an orange right-pointing arrow. The left column, headed &quot;OPENCLAW&quot; in muted rust with a strikethrough and the label &quot;retired&quot;, lists four bulleted retirement actions: 38 cron jobs disabled, 23 LaunchAgents quarantined, flat-file memory archived, Ollama daemon stopped. The right column, headed &quot;CLAUDECLAW&quot; in orange with the label &quot;Mission Control &#183; live&quot;, lists four arrow-prefixed replacements: 1 daemon &#183; 13 themed agents, Memory v2 &#183; 5-layer semantic recall, Watchman &#183; 7 health probes, External healthcheck (no shared fate). The footer reads &quot;Five days &#183; 30+ PRs &#183; 30-day rollback window still open&quot; with the As The Geek Learns brand mark in the bottom-right." title="A dark navy split-screen post-mortem image. Top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-02&quot; in orange. The main title &quot;The AI Agent I Killed (And the One I Built Instead)&quot; appears in white and orange, with the subtitle &quot;Retiring OpenClaw &#183; Migrating to ClaudeClaw Mission Control&quot; below. The body splits into two columns separated by an orange right-pointing arrow. The left column, headed &quot;OPENCLAW&quot; in muted rust with a strikethrough and the label &quot;retired&quot;, lists four bulleted retirement actions: 38 cron jobs disabled, 23 LaunchAgents quarantined, flat-file memory archived, Ollama daemon stopped. The right column, headed &quot;CLAUDECLAW&quot; in orange with the label &quot;Mission Control &#183; live&quot;, lists four arrow-prefixed replacements: 1 daemon &#183; 13 themed agents, Memory v2 &#183; 5-layer semantic recall, Watchman &#183; 7 health probes, External healthcheck (no shared fate). The footer reads &quot;Five days &#183; 30+ PRs &#183; 30-day rollback window still open&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!ZE8T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">POST-MORTEM 2026-05-02</figcaption></figure></div><p><em><strong>Let me explain how I got here.</strong></em></p><div><hr></div><h2>The Setup</h2><p>OpenClaw was doing real work. 38 cron jobs. Morning briefings. Evening summaries. A content pipeline that pulled research from web sources, structured it, scored it, and queued articles for ASTGL. An email triage pass. A model-usage monitor. A nerve-health monitor watching the other monitors.</p><p>On paper: impressive. In practice: <em>I had no idea if any of it was working.</em></p><p>The system was so noisy that when something broke, I learned about it four days later when I noticed my morning briefing hadn&#8217;t arrived. Or I didn&#8217;t learn about it at all, because the cron job was exiting 0 while the script inside it was crash-looping.</p><p>That last one is the killer. Let me show you what I mean.</p><h2>What&#8217;s Actually Going On</h2><p>Three failure modes hit me in a 48-hour window, and each one was invisible to the system watching the system.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p><strong>Failure one: successful exits, 100% broken payload.</strong> My content pipeline was ingesting URLs, and a regression introduced a trailing-slash bug that made example.com/foo and example.com/foo/ look like different URLs to the dedup layer. Every new article hit a UNIQUE constraint violation inside a subprocess. The outer wrapper caught the error, logged it to a file nobody was reading, and exited 0. For <em>two weeks</em> the cron appeared green while 100% of structurings were crashing.</p><p><strong>Failure two: PATH-resolved Node.</strong> I had the daemon running Node 24 (absolute path, explicit). A subagent it spawned inherited a PATH that fell through to Homebrew&#8217;s Node 25. One of the native modules (better-sqlite3) was compiled against 24, so every subagent invocation crashed with ERR_DLOPEN_FAILED and MODULE_VERSION mismatch. The smoke test I&#8217;d written passed because it ran from the daemon&#8217;s shell. The actual production path failed every time.</p><p><strong>Failure three: auth expiry with no escape hatch.</strong> OpenClaw stored some credentials in pass (the Unix password store). When my GPG key timed out, the daemon couldn&#8217;t start. Which meant the health monitor couldn&#8217;t start. Which meant the thing that would have <em>told</em> me about the outage was the thing that was out. OpenClaw had no watcher that lived outside the daemon it was watching.</p><p>None of these are OpenClaw-specific bugs in the upstream sense. They&#8217;re pattern problems that emerge anywhere you have: 1. A monolithic daemon responsible for its own monitoring. 2. Flat-file state (HEARTBEAT.md, LEARNINGS.md) that gets appended to rather than queried. 3. Exit codes treated as truth when the real signal is in stderr. 4. No separation between &#8220;Did it run?&#8221; and &#8220;Did it <em>work</em>?&#8221;</p><p>OpenClaw was built for a different job. It was a personal automation gateway&#8212;great at &#8220;kick off this script at 6:30 AM.&#8221; It wasn&#8217;t built to be an agent OS with observability. I was using a shovel to drive screws.</p><p>I also couldn&#8217;t ignore the security posture. February&#8217;s disclosures&#8212;135,000 exposed instances, 15,000 vulnerable to RCE, the ClawHavoc plugin-registry incident, nine CVEs&#8212;had pushed me to patch hard and lock down. But every week I spent hardening OpenClaw was a week I wasn&#8217;t building what I actually wanted: themed agents that owned workstreams, could be reasoned about individually, and fail <em>loudly</em>.</p><h2>The Fix</h2><p>ClaudeClaw Mission Control is a Node.js daemon built on the Claude Agent SDK. It runs as a single LaunchAgent (com.claudeclaw.app), owns a SQLite store at store/claudeclaw.db, polls a scheduled_tasks table every 60 seconds, and dispatches due tasks to agents by ID.</p><p>The interesting part isn&#8217;t the daemon. It&#8217;s the agents.</p><p>I set up thirteen of them, themed after the small council of a certain fictional kingdom, because if I&#8217;m going to stare at this UI every day, I&#8217;d rather it amused me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KVXf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KVXf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 424w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 848w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KVXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png" width="1200" height="1400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba06b044-6fab-4b6c-86f6-673897342c09_1200x1400.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1400,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74637,&quot;alt&quot;:&quot;A 3-by-3 card grid titled \&quot;The War Room\&quot; with the subtitle \&quot;Thirteen themed agents &#183; one daemon &#183; one DB &#183; one bot\&quot; in orange. Below, an introductory line reads \&quot;Each agent owns a workstream. Adding a new one is a directory + a CLAUDE.md + schedule reassign &#8212; no source changes required.\&quot; Each card has an orange accent bar across the top, an orange circular monogram disc in the upper center containing a one- or two-letter abbreviation in dark navy, an agent name in bold white below the disc, a thin gray separator line, and a two-line role description in light gray below. The nine cards, in reading order, are: STEWARD (\&quot;Sw\&quot;) &#8212; Morning briefing 06:30 ET, Evening summary 20:00 ET; MAESTER (\&quot;Mr\&quot;) &#8212; ASTGL content pipeline, Daily reports &#183; alerts &#183; freshness; WHISPERERS (\&quot;Wh\&quot;) &#8212; Newsletter research, R&amp;R &#183; NCFI &#183; weekend deep scans; WAR (\&quot;Wa\&quot;) &#8212; Security ops, Dep audit &#183; system hygiene &#183; secrets; WATCHMAN (\&quot;Wt\&quot;) &#8212; Hourly health sweep, 7 probes across the system; COUNCIL (\&quot;Co\&quot;) &#8212; Product ideation orchestrator, Dispatches the five personas; CURATOR (\&quot;Cu\&quot;) &#8212; ASTGL editorial pipeline, Scoring &#183; selection &#183; weekly digest; BARD (\&quot;Bd\&quot;) &#8212; Visual asset generation, Diagrams &#183; decks &#183; images; COUNCIL &#183; 5 (\&quot;5\&quot;) &#8212; SCOUT &#183; FORGE &#183; QUILL, LEDGER &#183; MAVEN. The footer reads \&quot;Telegram routing &#183; forum topics &#183; 14 scheduled tasks dispatched via agentId\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196179846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba06b044-6fab-4b6c-86f6-673897342c09_1200x1400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A 3-by-3 card grid titled &quot;The War Room&quot; with the subtitle &quot;Thirteen themed agents &#183; one daemon &#183; one DB &#183; one bot&quot; in orange. Below, an introductory line reads &quot;Each agent owns a workstream. Adding a new one is a directory + a CLAUDE.md + schedule reassign &#8212; no source changes required.&quot; Each card has an orange accent bar across the top, an orange circular monogram disc in the upper center containing a one- or two-letter abbreviation in dark navy, an agent name in bold white below the disc, a thin gray separator line, and a two-line role description in light gray below. The nine cards, in reading order, are: STEWARD (&quot;Sw&quot;) &#8212; Morning briefing 06:30 ET, Evening summary 20:00 ET; MAESTER (&quot;Mr&quot;) &#8212; ASTGL content pipeline, Daily reports &#183; alerts &#183; freshness; WHISPERERS (&quot;Wh&quot;) &#8212; Newsletter research, R&amp;R &#183; NCFI &#183; weekend deep scans; WAR (&quot;Wa&quot;) &#8212; Security ops, Dep audit &#183; system hygiene &#183; secrets; WATCHMAN (&quot;Wt&quot;) &#8212; Hourly health sweep, 7 probes across the system; COUNCIL (&quot;Co&quot;) &#8212; Product ideation orchestrator, Dispatches the five personas; CURATOR (&quot;Cu&quot;) &#8212; ASTGL editorial pipeline, Scoring &#183; selection &#183; weekly digest; BARD (&quot;Bd&quot;) &#8212; Visual asset generation, Diagrams &#183; decks &#183; images; COUNCIL &#183; 5 (&quot;5&quot;) &#8212; SCOUT &#183; FORGE &#183; QUILL, LEDGER &#183; MAVEN. The footer reads &quot;Telegram routing &#183; forum topics &#183; 14 scheduled tasks dispatched via agentId&quot; with the As The Geek Learns brand mark in the bottom-right." title="A 3-by-3 card grid titled &quot;The War Room&quot; with the subtitle &quot;Thirteen themed agents &#183; one daemon &#183; one DB &#183; one bot&quot; in orange. Below, an introductory line reads &quot;Each agent owns a workstream. Adding a new one is a directory + a CLAUDE.md + schedule reassign &#8212; no source changes required.&quot; Each card has an orange accent bar across the top, an orange circular monogram disc in the upper center containing a one- or two-letter abbreviation in dark navy, an agent name in bold white below the disc, a thin gray separator line, and a two-line role description in light gray below. The nine cards, in reading order, are: STEWARD (&quot;Sw&quot;) &#8212; Morning briefing 06:30 ET, Evening summary 20:00 ET; MAESTER (&quot;Mr&quot;) &#8212; ASTGL content pipeline, Daily reports &#183; alerts &#183; freshness; WHISPERERS (&quot;Wh&quot;) &#8212; Newsletter research, R&amp;R &#183; NCFI &#183; weekend deep scans; WAR (&quot;Wa&quot;) &#8212; Security ops, Dep audit &#183; system hygiene &#183; secrets; WATCHMAN (&quot;Wt&quot;) &#8212; Hourly health sweep, 7 probes across the system; COUNCIL (&quot;Co&quot;) &#8212; Product ideation orchestrator, Dispatches the five personas; CURATOR (&quot;Cu&quot;) &#8212; ASTGL editorial pipeline, Scoring &#183; selection &#183; weekly digest; BARD (&quot;Bd&quot;) &#8212; Visual asset generation, Diagrams &#183; decks &#183; images; COUNCIL &#183; 5 (&quot;5&quot;) &#8212; SCOUT &#183; FORGE &#183; QUILL, LEDGER &#183; MAVEN. The footer reads &quot;Telegram routing &#183; forum topics &#183; 14 scheduled tasks dispatched via agentId&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!KVXf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 424w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 848w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The War Room</figcaption></figure></div><p></p><p><em>Thirteen themed agents, each owning a workstream. STEWARD drives my mornings and evenings. MAESTER runs the ASTGL content pipeline. WATCHMAN watches the whole system from outside it.</em></p><p>Each agent lives in its own directory at agents/&lt;id&gt;/, with an agent.yaml (model, personality, cwd, MCP servers) and a CLAUDE.md system prompt. A scheduled task carries an agentId column in the DB, and the dispatcher routes like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;2d40635c-69fb-4eea-8dae-002d26bb0dbe&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">if (shouldRouteViaAgent(task.agentId, listAgentIds())) {
  const result = await delegateToAgent(task.agentId, task.prompt, {
    fromAgent: SCHEDULER_FROM_AGENT,
    chatId: task.chatId,
  });
  return result.text ?? '(empty response)';
}
</code></pre></div><p>Adding a new agent is now: drop a folder under agents/, write a CLAUDE.md, run schedule reassign &lt;task-id&gt; &lt;agent-id&gt;. No source changes. The dispatcher picks it up on next tick.</p><p>That&#8217;s the piece I kept trying and failing to get with OpenClaw&#8212;modular ownership. In OpenClaw, <em>everything</em> was &#8220;the daemon.&#8221; In ClaudeClaw, MAESTER owning the content pipeline means if content alerts stop firing, the log line says maester: task failed instead of openclaw-gateway: subprocess exited nonzero. Attribution is free.</p><p>Adding a new agent is now: drop a folder under agents/, write a CLAUDE.md, run schedule reassign &lt;task-id&gt; &lt;agent-id&gt;. No source changes. The dispatcher picks it up on next tick.</p><p>That&#8217;s the piece I kept trying and failing to get with OpenClaw&#8212;modular ownership. In OpenClaw, <em>everything</em> was &#8220;the daemon.&#8221; In ClaudeClaw, MAESTER owning the content pipeline means if content alerts stop firing, the log line says maester: task failed instead of openclaw-gateway: subprocess exited nonzero. Attribution is free.</p><h3>The Watchman probes</h3><p>WATCHMAN runs every hour at :05. It has seven probes, each targeting a failure mode that burned me on OpenClaw:</p><p>1. <strong>Failed tasks.</strong> status=&#8217;failed&#8217; in the DB. Trivial.</p><p>2. <strong>Stuck tasks.</strong> status=&#8217;running&#8217; AND last_run &lt; now - 10min. This catches hangs.</p><p>3. <strong>Missed slots.</strong> status=&#8217;active&#8217; AND next_run &lt; now - 60s. Catches scheduler drift.</p><p>4. <strong>Daemon liveness.</strong> launchctl print gui/$UID/com.claudeclaw.app&#8212;does launchd still have it?</p><p>5. <strong>Content-pipeline health.</strong> Tails the structured log file, parses the JSON, checks for crash shapes.</p><p>6. <strong>Hidden failures.</strong> Scans the last_result text column for ERR_DLOPEN_FAILED, MODULE_VERSION, Traceback, and other &#8220;the job exited zero but it sure didn&#8217;t work&#8221; signals. This is the probe that would have caught my trailing-slash bug in an hour instead of two weeks.</p><p>7. <strong>Delegation crashes.</strong> inter_agent_tasks WHERE status=&#8217;failed&#8217; &#8212; on-demand agent invocations that blew up.</p><p>On top of that, there&#8217;s a separate LaunchAgent running a healthcheck every 30 minutes that lives <em>outside</em> the main daemon and uses a keychain-backed Telegram token. If the daemon is dead, the healthcheck still delivers the alert. That&#8217;s the lesson from failure three: the watcher cannot share fate with the watched.</p><h3>Memory v2</h3><p>OpenClaw&#8217;s memory was HEARTBEAT.md and LEARNINGS.md&#8212;flat files I appended to. Eventually they got long enough that the agent stopped reading them usefully, and I had no query surface to pull just the relevant bits.</p><p>ClaudeClaw&#8217;s Memory v2 is a five-layer context stack: 1. <strong>Semantic recall</strong>&#8212;cosine similarity against stored memory embeddings, top 5 by score, chat-scoped. 2. <strong>Recent high-importance</strong> memories&#8212;memories with importance &gt;= 0.7 written in the last 7 days. 3. <strong>Consolidation insights</strong>&#8212;a 30-minute loop that summarizes the short-term buffer into durable notes. 4. <strong>Cross-agent hive</strong>&#8212;stubbed for now; eventually lets MAESTER peek at something STEWARD noted this morning. 5. <strong>Conversation history</strong>&#8212;last N turns.</p><p>Layers dedupe by memory ID. The whole thing is safe to drop into the SDK&#8217;s systemPrompt option. It&#8217;s not magic. It&#8217;s just <em>queryable</em> instead of append-only, which is the delta between &#8220;context I can use&#8221; and &#8220;a log file I&#8217;ll never re-read.&#8221;</p><h3>Forum-topic routing instead of bot-per-agent</h3><p>A small but satisfying piece. All thirteen agents post to one Telegram bot, into one supergroup, but each agent has a dedicated forum topic:</p><p>Alerts &#8594; thread 22 (WATCHMAN)</p><p>ASTGL &#8594; thread 23 (MAESTER)</p><p>Council &#8594; thread 24</p><p>Steward &#8594; thread 25</p><p>Whisperers &#8594; thread 26</p><p>War Room - Security &#8594; thread 40 (WAR)</p><p>One token. One chat. Threaded conversations per domain. The ergonomics are <em>dramatically</em> better than 13 separate bots with 13 separate tokens, which is the architecture I almost built before I remembered that Telegram supergroups have forum topics now.</p><h2>Why This Matters</h2><p>A few things I want to flag for anyone planning something similar.</p><p><strong>Build the rollback before you build the new thing.</strong> I wrote scripts/retire-openclaw.sh with explicit --rollback semantics before I disabled a single cron job. Plists get moved (not deleted) into _retired-openclaw/. Cron jobs get flipped enabled: false with a timestamped backup (jobs.json.bak.pre-retire-20260419). The OpenClaw directory sits untouched for 30 days with a calendar reminder to delete it. If ClaudeClaw had cratered on day two, I was one shell command away from being back on the old system in under a minute.</p><p><strong>Silent success is worse than loud failure.</strong> The design principle I pulled from this whole experience: every job in the system needs someone whose <em>job it is to doubt that job ran correctly.</em> That&#8217;s WATCHMAN. That&#8217;s the external healthcheck. That&#8217;s probe #6 specifically scanning success logs for crash text. If your system can tell you &#8220;everything&#8217;s green&#8221; without that green being adversarially checked, the green doesn&#8217;t mean anything.</p><p><strong>Themed agents beat generic workers.</strong> This one I didn&#8217;t expect. Giving each workstream a named agent with its own CLAUDE.md persona made the system more <em>debuggable</em>, not less&#8212;because now when STEWARD&#8217;s morning briefing has weird tone issues, I know exactly which file to edit, and I&#8217;m not risking regressions in seven other jobs that would have shared a single &#8220;universal assistant&#8221; prompt. The theme is cosmetic. The isolation is load-bearing.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><p><strong>The Claude Agent SDK is the right abstraction for this.</strong> I spent a while trying to decide whether to keep hacking on OpenClaw, fork it, or start over. Starting over was the right call specifically because the Agent SDK handles the parts I was getting wrong: sub-agent dispatch, MCP tool wiring, system-prompt composition, retry on transient errors. I wrote the parts that are <em>mine</em> (the scheduler, the memory stack, the Telegram layer, the agent router) and let the SDK own the parts that are undifferentiated heavy lifting.</p><p><strong>What I gave up.</strong> Ollama. Local models. Full offline operation. ClaudeClaw talks to Anthropic&#8217;s API, and that&#8217;s a real philosophical loss versus the local-first thing I was doing with OpenClaw. I thought about this a lot. The honest answer is that Claude Opus is enough better at long-context agentic work than anything I could run locally that the tradeoff pays for itself. I still own my data&#8212;every memory, every document, every log is on my SSD. I just don&#8217;t own the weights. For this phase, that&#8217;s the right trade.</p><p><strong>What I kept.</strong> The philosophy. Every document is a file I can grep. Every config is version-controlled. Every decision has a session note I can link to in a future article. The system is mine to read, mine to modify, mine to understand. The whole reason I left Notion is still the whole reason I left Notion.</p><h2>Quick Reference</h2><p><strong>The migration, by the numbers:</strong> - <strong>5 days</strong>&#8212;start of retirement to all 13 agents live (2026-04-19 &#8594; 2026-04-21) - <strong>30+ PRs</strong>&#8212;one atomic change per commit, conventional-commit format - <strong>38 cron jobs</strong> disabled, <strong>23 LaunchAgents</strong> quarantined - <strong>13 agents</strong> onboarded, <strong>7 Watchman probes</strong> live, <strong>14 scheduled tasks</strong> dispatched via agentId - <strong>30-day</strong> rollback window still open</p><p><strong>The retired vs. the replacement:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QDHQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QDHQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 424w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 848w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png" width="1200" height="1500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/900d6593-6f75-4dba-8cd1-2793fced5589_1200x1500.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100062,&quot;alt&quot;:&quot;A vertical comparison matrix titled \&quot;Retired vs. Replacement\&quot; with the subtitle \&quot;Seven dimensions where the new system pays for itself\&quot; in orange. The matrix has three columns: a narrow left column for the dimension label and a small numbered orange dot, a middle column headed \&quot;OPENCLAW &#183; retired\&quot; in muted rust, and a right column headed \&quot;CLAUDECLAW &#183; Mission Control &#183; live\&quot; in orange. Seven rows each sit on a deep-blue rounded card. Row 01, RUNTIME SURFACE: OpenClaw shows \&quot;38 cron jobs / 23 LaunchAgents\&quot;; ClaudeClaw shows \&quot;1 daemon &#183; 1 healthcheck LaunchAgent / 14 DB-driven scheduled tasks\&quot;. Row 02, AGENT DISPATCH: \&quot;com.openclaw.gateway / subprocess shim\&quot; vs \&quot;Claude Agent SDK / direct invocation\&quot;. Row 03, MEMORY MODEL: \&quot;HEARTBEAT.md &#183; LEARNINGS.md / flat append-only files\&quot; vs \&quot;Memory v2 / 5-layer semantic recall stack\&quot;. Row 04, OBSERVABILITY: \&quot;nerve-health-monitor / cron-quality-monitor\&quot; vs \&quot;WATCHMAN &#183; 7 probes / + external healthcheck\&quot;. Row 05, MODEL INFERENCE: \&quot;Ollama / local LLMs on Mac Studio\&quot; vs \&quot;Claude Opus 4.7 / via Anthropic API\&quot;. Row 06, PROMPT ARCHITECTURE: \&quot;Single 'everything' / system prompt\&quot; vs \&quot;13 themed agents / isolated CLAUDE.md per agent\&quot;. Row 07, DELIVERY: \&quot;Discord webhooks / per source\&quot; vs \&quot;One Telegram bot / forum topics per agent\&quot;. The OpenClaw cells render in muted gray, the ClaudeClaw cells in white, all in a monospace typeface. The footer reads \&quot;5 days &#183; 30+ PRs &#183; zero blind spots survived\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196179846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900d6593-6f75-4dba-8cd1-2793fced5589_1200x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A vertical comparison matrix titled &quot;Retired vs. Replacement&quot; with the subtitle &quot;Seven dimensions where the new system pays for itself&quot; in orange. The matrix has three columns: a narrow left column for the dimension label and a small numbered orange dot, a middle column headed &quot;OPENCLAW &#183; retired&quot; in muted rust, and a right column headed &quot;CLAUDECLAW &#183; Mission Control &#183; live&quot; in orange. Seven rows each sit on a deep-blue rounded card. Row 01, RUNTIME SURFACE: OpenClaw shows &quot;38 cron jobs / 23 LaunchAgents&quot;; ClaudeClaw shows &quot;1 daemon &#183; 1 healthcheck LaunchAgent / 14 DB-driven scheduled tasks&quot;. Row 02, AGENT DISPATCH: &quot;com.openclaw.gateway / subprocess shim&quot; vs &quot;Claude Agent SDK / direct invocation&quot;. Row 03, MEMORY MODEL: &quot;HEARTBEAT.md &#183; LEARNINGS.md / flat append-only files&quot; vs &quot;Memory v2 / 5-layer semantic recall stack&quot;. Row 04, OBSERVABILITY: &quot;nerve-health-monitor / cron-quality-monitor&quot; vs &quot;WATCHMAN &#183; 7 probes / + external healthcheck&quot;. Row 05, MODEL INFERENCE: &quot;Ollama / local LLMs on Mac Studio&quot; vs &quot;Claude Opus 4.7 / via Anthropic API&quot;. Row 06, PROMPT ARCHITECTURE: &quot;Single 'everything' / system prompt&quot; vs &quot;13 themed agents / isolated CLAUDE.md per agent&quot;. Row 07, DELIVERY: &quot;Discord webhooks / per source&quot; vs &quot;One Telegram bot / forum topics per agent&quot;. The OpenClaw cells render in muted gray, the ClaudeClaw cells in white, all in a monospace typeface. The footer reads &quot;5 days &#183; 30+ PRs &#183; zero blind spots survived&quot; with the As The Geek Learns brand mark in the bottom-right." title="A vertical comparison matrix titled &quot;Retired vs. Replacement&quot; with the subtitle &quot;Seven dimensions where the new system pays for itself&quot; in orange. The matrix has three columns: a narrow left column for the dimension label and a small numbered orange dot, a middle column headed &quot;OPENCLAW &#183; retired&quot; in muted rust, and a right column headed &quot;CLAUDECLAW &#183; Mission Control &#183; live&quot; in orange. Seven rows each sit on a deep-blue rounded card. Row 01, RUNTIME SURFACE: OpenClaw shows &quot;38 cron jobs / 23 LaunchAgents&quot;; ClaudeClaw shows &quot;1 daemon &#183; 1 healthcheck LaunchAgent / 14 DB-driven scheduled tasks&quot;. Row 02, AGENT DISPATCH: &quot;com.openclaw.gateway / subprocess shim&quot; vs &quot;Claude Agent SDK / direct invocation&quot;. Row 03, MEMORY MODEL: &quot;HEARTBEAT.md &#183; LEARNINGS.md / flat append-only files&quot; vs &quot;Memory v2 / 5-layer semantic recall stack&quot;. Row 04, OBSERVABILITY: &quot;nerve-health-monitor / cron-quality-monitor&quot; vs &quot;WATCHMAN &#183; 7 probes / + external healthcheck&quot;. Row 05, MODEL INFERENCE: &quot;Ollama / local LLMs on Mac Studio&quot; vs &quot;Claude Opus 4.7 / via Anthropic API&quot;. Row 06, PROMPT ARCHITECTURE: &quot;Single 'everything' / system prompt&quot; vs &quot;13 themed agents / isolated CLAUDE.md per agent&quot;. Row 07, DELIVERY: &quot;Discord webhooks / per source&quot; vs &quot;One Telegram bot / forum topics per agent&quot;. The OpenClaw cells render in muted gray, the ClaudeClaw cells in white, all in a monospace typeface. The footer reads &quot;5 days &#183; 30+ PRs &#183; zero blind spots survived&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!QDHQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 424w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 848w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Retired vs. Replacement</figcaption></figure></div><p></p><p><em>Seven dimensions where the new system pays for itself&#8212;from runtime surface to routing to the memory model.</em></p><p><strong>The rule I wrote for myself:</strong> <em>No job ships without an external watcher that shares no fate with it.</em> That&#8217;s the whole story. Two months of OpenClaw and 48 hours of cascading invisible failures reduced to one sentence I&#8217;ll never forget.</p><p>I&#8217;ll keep writing the ClaudeClaw build-out week by week&#8212;the Council orchestration pattern, the Curator autonomous publishing workflow, the voice-mode bridge, the stuff that&#8217;s too long for one article. If you want the view from inside while it&#8217;s happening, that&#8217;s what this is.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control/comments"><span>Leave a comment</span></a></p><div><hr></div><p><em>Found this useful? I share practical lessons from my systems engineering journey at <a href="https://astgl.substack.com">As The Geek Learns</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Nightshift: I Went to Sleep and My Mac Ran 118 Experiments]]></title><description><![CDATA[What I learned about disciplined iteration from Karpathy's autoresearch loop running overnight on an M3 Ultra.]]></description><link>https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch</link><guid isPermaLink="false">https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Wed, 22 Apr 2026 19:00:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QI8z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I went to sleep. My Mac ran 118 experiments. When I woke up, a small GPT had trained itself from `val_bpb` 1.563 down to 1.289, beating every documented Apple Silicon overnight run in the project's public README. I wrote no code overnight. I just left a Claude Code session running against a markdown file named `program.md`, and the agent did the rest.</p><p>This is the first morning I've ever genuinely understood why people talk about AI agents with something other than skepticism.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QI8z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QI8z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QI8z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36781,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QI8z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>What autoresearch is</h2><p>The idea, which is Karpathy's not mine, goes like this. You give an AI agent a real-but-small LLM training setup. One Python file (`train.py`) contains the model, optimizer, and training loop. A second file (`prepare.py`) contains the data pipeline and evaluation, and the agent isn't allowed to touch it. A third file (`program.md`) is a plain markdown document telling the agent what the experiment rules are.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The agent edits `train.py`, runs a training experiment with a fixed 5-minute wall-clock budget, checks `val_bpb` (validation bits per byte, a loss metric where lower is better), and either keeps the change with a git commit or does `git reset --hard` and tries something else. Then it does it again. And again. Indefinitely, until you stop it.</p><p><a href="https://github.com/karpathy/autoresearch">Karpathy's original repo</a> is NVIDIA and CUDA only. A developer named <a href="https://github.com/trevin-creator/autoresearch-mlx">trevin-creator</a> ported it to Apple Silicon using MLX, no PyTorch required. It runs natively on the M-series chips, eating unified memory instead of GPU VRAM. Which is why I could run it on a Mac Studio sitting on my desk.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H3hB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H3hB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 424w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 848w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1272w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H3hB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png" width="1028" height="321" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:321,&quot;width&quot;:1028,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32192,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H3hB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 424w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 848w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1272w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Setup and the surprise baseline</h2><p>Install took about three minutes. `uv sync` pulled MLX and six other small dependencies. `uv run prepare.py` downloaded eleven training shards from the public HuggingFace dataset and trained a BPE tokenizer in 41 seconds.</p><p>Then I did one manual run, as the setup instructions said to: a single 5-minute training experiment to establish a hardware baseline, no modifications.</p><p>The first surprise: `val_bpb 1.563`. The public README documents a manual walk on older Apple Silicon that bottomed out at `1.807` after four experiments. My first run, before the AI agent had done anything, was already 13% better than that published best. I didn't tune anything. I pulled the repo and ran it.</p><p>The reason is in how the loop is constructed. The training budget is fixed at 5 minutes of wall clock. The M3 Ultra throughput is high enough that it fits 555 optimizer steps into that window, while the older hardware fits fewer. Same code. Different step count. Different result.</p><p><strong>The hardware is a parameter, not a constant.</strong></p><blockquote><p>Specs for replication</p><p>- Hardware: Mac Studio M3 Ultra, 128 GB unified memory</p><p>- OS and runtime: macOS 15, Python 3.12, `uv` 0.10</p><p>- Framework: MLX 0.31 with Metal backend (no PyTorch, no CUDA)</p><p>- Agent runner: Claude Code (Anthropic)</p><p>- Fork used: `github.com/trevin-creator/autoresearch-mlx`</p><p>- Per-experiment budget: 5 minutes training, ~90 seconds compile and eval overhead</p><p>- Peak unified memory during training: 21.2 GB</p></blockquote><h2>Launching the agent overnight</h2><p>Here's where you have to decide. Karpathy's default advice is to "disable all permissions" and let the agent go. That's the fastest path and it works. But it's also a permission-free Claude Code session running unattended on your Mac for eight hours, with the ability to execute arbitrary shell commands. If the agent hallucinates a destructive action at 3 AM, you won't be there to interrupt it.</p><p>I went with a scoped allowlist instead. A `.claude/settings.local.json` file listing exactly the commands the loop actually needs: `uv run train.py`, `git add train.py`, `git commit`, `git reset --hard`, `grep`, `tail`, a few others. Everything else prompts. The agent can't `rm`, can't `git push`, can't install packages, can't touch any file outside the repo.</p><p>Then I pointed a fresh Claude Code session at `program.md`, pasted "start the experimentation loop, don't stop," and went to bed.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share As The Geek Learns</span></a></p><h2>The morning, by the numbers</h2><p>The morning log:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xMX3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xMX3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1272w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xMX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png" width="1223" height="684" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1223,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79702,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xMX3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1272w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Comparison to the three overnight runs documented in the public README:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-4t_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-4t_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1272w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-4t_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png" width="1223" height="684" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1223,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-4t_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1272w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Final `val_bpb` of 1.289 lands below the best documented Apple Silicon overnight result. New territory for the public log.</p><h2>What the agent actually did</h2><p>Five phases overnight. Each tells you something.</p><p><strong>Phase one: find the big axis.</strong> Four experiments in, the agent had halved the batch size three times (1.56, 1.40, 1.39, 1.38), then tried a fourth halving that bounced back to 1.44. The annotation on the discard: <strong>"gradient noise."</strong> Correct diagnosis. Below a threshold, batch becomes too small for the optimizer to converge inside 5 minutes.</p><p><strong>Phase two: schedule tuning, six keeps in a row.</strong> The learning-rate schedule was undertuned. The agent walked `WARMDOWN_RATIO` from 0.7 to 1.0, then `WARMUP_RATIO` from 0.02 to 0.2. Every step dropped `val_bpb`. Floor went from 1.38 to 1.34. Biggest easy win of the night, and it was entirely in the schedule.</p><p><strong>Phase three: the moment that mattered most.</strong> After schedule tuning, the agent retried `TOTAL_BATCH_SIZE = 2^14`. The same configuration it had rejected in phase one. This time it won.</p><p>The agent had discovered the thing most humans miss in hyperparameter tuning: the optimal value of one knob depends on the values of all the other knobs. You don't find N independent settings; you find a consistent N-tuple. The only way to find it is to retry earlier-rejected values after each structural change. I've watched human researchers lock in early wins and never revisit them. The agent didn't. It revisited `EMBEDDING_LR` three times over the night, landing at 1.0, then 1.5, then 1.75 across different phases. Each retry, a small win.</p><p><strong>Phase four: two structural wins, one line each.</strong> `has_ve()` went from alternating-layers-get-Value-Embeddings to all-layers-get-Value-Embeddings, one `return True` replacing a modular-arithmetic expression. `MLP.__call__()` swapped `ReLU&#178;` for `SiLU`, one function call for another. Both character-count-sized changes. Each dropped `val_bpb` by about 0.01.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xKMK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xKMK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xKMK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xKMK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Phase five: the 37-experiment grind.</strong> The agent spent 37 consecutive experiments without a single keep, testing every nearby hyperparameter against the current local minimum. Most humans would have quit and tried a wild leap. The agent didn't. It finished the neighborhood, then found the next structural win. Disciplined exhaustion.</p><p>And two catastrophes, both correctly reverted. Tied embeddings came back at `val_bpb 4.29`, three times worse than anything else. The agent annotated it <strong>"LR mismatch destroys."</strong> Tied embeddings is actually a good idea in general, but incompatible with the differential layer-wise learning rates the architecture uses. The agent reverted in seconds. On another experiment, removing QK-norm after RoPE spiked `val_bpb` to 1.67. Annotation: <strong>"massive regression."</strong> Reverted. A human would have spent an hour trying to salvage tied embeddings. The agent spent ten seconds on the revert. <em>The revert discipline is the whole game.</em></p><h2>What it taught me</h2><p>Two things crystallized overnight.</p><p><strong>Disciplined exhaustion beats creative leaps.</strong> Humans get bored. After a few hours on the same hyperparameter axis, we start reaching for something new because the exploration stops feeling productive. The agent doesn't have that pressure. It spent 37 experiments without a win because that's what the local search called for, and then it found the next jump. Most humans couldn't do that. Not because we lack the ability, but because we lack the emotional neutrality. The agent's advantage isn't intelligence. It's the absence of boredom, ego, and social pressure. That isn't a 20&#215; productivity gap. It's a categorical one.</p><p><strong>Generation is cheap, evaluation is sacred.</strong> Every one of the agent's wins was a one-line diff. So was every catastrophe. The "research" wasn't in writing the code. The research was in the metric's ability to rank one-line diffs instantly and unambiguously. Karpathy's genius isn't the agent. It's `val_bpb` plus a 5-minute budget plus `git reset --hard`. That design slots the agent into exactly what AI is magnitudes better at (generating variants, executing at volume) and leaves the hard part (what to measure) to the human who built the loop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tRNX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tRNX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tRNX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72691,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tRNX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The loop runs on me too</h2><p>Here's the thing I can't stop thinking about. The loop the agent ran overnight is structurally identical to the one I'm building for my Stoic practice on the same machine.</p><p>Morning intention. Five-minute run. Evening review. Keep or discard. Iterate.</p><p>Marcus Aurelius wasn't optimizing `val_bpb`. He was optimizing a harder metric with no closed form. But the shape of the loop is the same. Karpathy designed an overnight research org. Epictetus designed an overnight self. Both are the same thing running in different mediums.</p><p>The 118-experiment loop ran on a machine on my desk. The second loop runs on me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4HZk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4HZk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4HZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4HZk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you have a Mac Studio and a spare evening, the repo is at `github.com/trevin-creator/autoresearch-mlx`. Clone it, run `prepare.py`, point a Claude Code session at `program.md`, go to sleep. You wake up to a log of experiments and a better model. And if you're anything like me, you also wake up thinking about which of your own loops could run this way.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch/comments"><span>Leave a comment</span></a></p><p></p><h3>A Quick AI Glossary For This Article</h3><p>Because not everyone speaks ML fluently, here&#8217;s a plain-English guide to the terms in this post. I&#8217;m still learning too, so these are &#8220;practitioner&#8221; definitions&#8212;enough to follow what&#8217;s happening, not academic deep-dives.</p><h4>The Big Picture</h4><p><strong>GPT.</strong> A type of language model. Stands for &#8220;Generative Pre-trained Transformer.&#8221; In this article I&#8217;m training a tiny one from scratch, not using the big ones like ChatGPT. Same architecture family, just much smaller.</p><p><strong>Pre-training</strong>. The step where a model learns to predict the next word (or &#8220;token&#8221;) across a huge pile of text. This is what `train.py` is doing. It happens before any of the fine-tuning that turns a base model into a chatbot.</p><p><strong>val_bpb (validation bits per byte).</strong> The score the agent is optimizing. Lower is better. It&#8217;s a measure of how surprised the model is by held-out text it hasn&#8217;t seen during training. A model that predicts well has low surprise. Bits per byte is a way of measuring that surprise that works across different tokenizers, so you can compare different architectures fairly.</p><p><strong>Loss metric. </strong>Any number that tells you how wrong a model is on a given task. Training is the process of making that number go down. `val_bpb` is a loss metric.</p><h4>The Stack</h4><p><strong>Apple Silicon.</strong> Apple&#8217;s own CPU/GPU chip family (M1, M2, M3, M4). Uses <strong>unified memory</strong>, which means the CPU and GPU share the same pool of RAM instead of having separate memory pools. For AI workloads this is a big deal because you don&#8217;t have to copy data between CPU RAM and GPU VRAM.</p><p><strong>MLX</strong>. Apple&#8217;s open-source machine learning framework, built specifically for Apple Silicon. Think of it as Apple&#8217;s answer to PyTorch but native to Metal (Apple&#8217;s GPU API). No PyTorch, no CUDA, no NVIDIA drivers needed.</p><p><strong>PyTorch.</strong> The dominant open-source ML framework. Most research code you see online assumes PyTorch. It runs on NVIDIA GPUs (via CUDA) and, with caveats, on Apple GPUs (via MPS). MLX is an alternative that sidesteps PyTorch entirely.</p><p><strong>CUDA.</strong> NVIDIA&#8217;s API for running general-purpose compute on their GPUs. If you&#8217;ve ever seen a blog post say &#8220;requires a CUDA-capable GPU,&#8221; they mean an NVIDIA card.</p><p><strong>GPU VRAM.</strong> The memory that lives on a GPU card, is separate from your computer&#8217;s main RAM. On Apple Silicon, VRAM and main RAM are the same pool (that&#8217;s the &#8220;unified memory&#8221; thing).</p><h4>Tokenization &amp; Data</h4><p><strong>Tokenizer.</strong> The thing that turns text into numbers the model can actually work with. &#8220;Hello world&#8221; might become `[15496, 995]`. The model only ever sees the numbers.</p><p><strong>BPE (Byte-Pair Encoding).</strong> The most common algorithm for building a tokenizer. It starts with individual characters and iteratively merges the most common pairs until you have a vocabulary of &#8220;tokens&#8221; that balance common words (one token) and rare words (split into pieces).</p><p><strong>Shards. </strong>Chunks of a large dataset, split into files for parallel download and loading. Our setup uses 11 shards from a public text dataset.</p><h4>Training Mechanics</h4><p><strong>Optimizer.</strong> The algorithm that actually updates the model&#8217;s weights during training. AdamW is the one used here. Every &#8220;optimizer step&#8221; is one update.</p><p><strong>Batch size.</strong> How many training examples the model looks at before making one weight update. Bigger batches give smoother gradient estimates but use more memory. Smaller batches fit more weight updates into a fixed time budget.</p><p><strong>Gradient accumulation.</strong> A trick for getting large effective batch sizes on limited hardware. Process smaller mini-batches sequentially, add up their gradients, then apply one update. `TOTAL_BATCH_SIZE / DEVICE_BATCH_SIZE` tells you how many mini-batches per update.</p><p><strong>Gradient noise.</strong> When your batch is so small that the gradient estimate becomes statistically unreliable. The optimizer starts jerking around instead of smoothly descending, and training slows or stalls. The agent correctly identified this as the failure mode at batch 2^12.</p><p><strong>Learning rate (LR).</strong> How big a step the optimizer takes each update. Too high, and training blows up. Too low, and it barely progresses. The sweet spot depends on everything else.</p><p><strong>Learning rate schedule.</strong> How the learning rate changes over time. Typically: warm up from zero to peak, cruise, then warm down to zero. `WARMUP_RATIO = 0.3` means the first 30% of training is the warm-up.</p><p><strong>Differential / layer-wise learning rates.</strong> Using different learning rates for different parts of the model. In the nightshift setup, the embedding layer gets LR 1.75, but the output projection (`lm_head`) gets 0.006 &#8212; a 290&#215; difference. This matters because different parameter types have very different sensitivities.</p><h4>Architecture Pieces</h4><p><strong>Attention (or attention layer)</strong>. The core mechanism that lets a transformer model &#8220;pay attention to&#8221; relevant earlier tokens when predicting the next one. Modern LLMs are mostly stacks of attention layers alternating with MLPs.</p><p><strong>MLP (multi-layer perceptron).</strong> A simple feed-forward neural network with one or two hidden layers. In a transformer, an MLP sits between each pair of attention layers and does the &#8220;thinking&#8221; on the representations attention produced.</p><p><strong>Activation function.</strong> A nonlinear function applied inside a neural net. Without activations, no matter how many layers you stack, the whole thing collapses mathematically into one linear transformation. Examples in this article: `ReLU&#178;` and `SiLU`.</p><p><strong>SiLU (Sigmoid Linear Unit).</strong> `x * sigmoid(x)`. A smooth, differentiable activation function. Also called Swish. Used in many modern models because it plays nicely with optimizers.</p><p><strong>ReLU&#178; (squared ReLU).</strong> `max(x, 0) ** 2`. The piece that nanoGPT-speedrun and some research codebases use. Produces sparse, squared activations. Theoretically expressive but less numerically stable than SiLU for short training runs &#8212; which is why SiLU won overnight.</p><p><strong>Embedding.</strong> The lookup table that converts each input token (a number) into a vector of real numbers. The model learns what each vector should be during training. `wte` = word token embedding.</p><p><strong>Value Embeddings (VE).</strong> An additional set of embeddings injected into attention layers as the &#8220;value&#8221; vectors. Think of them as a skip connection from the raw input that every attention layer can consult, on top of what the previous layer produced. Helps information flow when the network is deep.</p><p><strong>Tied embeddings.</strong> Sharing the input embedding weights with the output projection weights (the thing that produces final logits). Saves millions of parameters. Commonly used in GPT-2 and many others. Broke catastrophically in our run because the differential learning rate setup couldn&#8217;t handle the shared weight.</p><p><strong>QK-norm (Query-Key normalization).</strong> A stabilization trick: normalize the query and key vectors inside attention before computing attention scores. Without it, score magnitudes can spike, saturating the softmax. The agent tried removing QK-norm and `val_bpb` jumped 28% worse.</p><p><strong>RoPE (Rotary Position Embedding).</strong> How the model knows the order of tokens. Rotates the query and key vectors by an angle that depends on the token&#8217;s position. Standard in modern transformers.</p><p><strong>Softmax.</strong> The function that turns raw attention scores into a probability distribution over the tokens you might attend to. Highly peaked inputs cause &#8220;softmax saturation&#8221; &#8212; most of the weight collapses onto one token and gradients downstream get weak. That&#8217;s why QK-norm matters.</p><h4>Methodology</h4><p><strong>Hyperparameter.</strong> Any configuration value you set *before* training, as opposed to weights the model learns *during* training. Batch size, learning rate, WARMUP_RATIO, depth&#8212;all hyperparameters.</p><p><strong>Hyperparameter tuning.</strong> The art (and mostly the grind) of finding good hyperparameter values. Most of what the agent did overnight was hyperparameter tuning.</p><p><strong>Interaction effect.</strong> When the optimal value of hyperparameter A changes depending on what hyperparameter B is set to. A consistent set of hyperparameters is not N independent optima &#8212; it&#8217;s one N-tuple.</p><p><strong>Local search.</strong> A research strategy: after finding an improvement, test every nearby variation of your current best before venturing somewhere completely different. Tedious for humans. Perfect for agents that don&#8217;t get bored.</p><p><em><strong>If I missed a term you&#8217;d have liked defined, please let me know in the comments and I&#8217;ll add it.</strong></em></p>]]></content:encoded></item><item><title><![CDATA[Hosted RAG vs. Self-Hosted RAG for MCP Servers—When Does Paying Actually Win?]]></title><description><![CDATA[A practical comparison of Cloudflare AI Search, Bedrock Knowledge Bases, Pinecone Assistants, LlamaCloud, and self-hosted sqlite-vec for powering MCP servers. Real pricing, real trade-offs, and when each one makes sense.]]></description><link>https://astgl.com/p/hosted-rag-vs-self-hosted-rag</link><guid isPermaLink="false">https://astgl.com/p/hosted-rag-vs-self-hosted-rag</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Tue, 21 Apr 2026 00:42:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jj99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I shipped <a href="https://astgl.ai/projects#mcp-astgl-knowledge">an MCP knowledge server</a> in a weekend with sqlite-vec and Ollama. It answers questions about my own articles. It runs on a laptop. It costs $0/month.</p><p>Then someone asked the obvious next question: "Can you point it at our Confluence? And Notion? And the Google Drive?"</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Suddenly self-hosted isn't free anymore. It's a part-time job&#8212;PDF parsing, OCR, re-indexing schedules, dealing with 50-page slide decks where the first 20 pages are a title card. The embedding pipeline that was elegant for 20 markdown articles starts to sweat when you throw a 400-page SOC 2 audit at it.</p><p>So here's the question I had to actually answer for myself: <strong>when does paying Cloudflare, AWS, or Pinecone actually beat running your own stack?</strong></p><p>I spent a research pass comparing the live services. Here's what I found.</p><h2>TL;DR</h2><p><strong>Self-host</strong> when content is static, under about a thousand docs, single source, you control ingestion cadence, and privacy or cost-per-query matters more than your time.</p><p><strong>Hosted</strong> when: multiple unstructured sources, frequent re-indexing, non-engineers uploading docs, you need SLAs, or you're shipping this to customers.</p><p><strong>Hybrid</strong> is increasingly common: hosted RAG for the customer-facing product, self-hosted for internal dogfooding and dev. The two aren't mutually exclusive.</p><p></p><h2>The Contenders</h2><p>Five options worth your attention. One paragraph each.</p><h3>Cloudflare AI Search (AutoRAG)</h3><p>The newest entrant, currently in open beta. Cloudflare stitched together R2 for storage, Vectorize for embeddings, and Workers AI for inference, then wrapped the whole thing in a management API. Strongest pitch: near-zero config, pay-as-you-go, and an <a href="https://github.com/cloudflare/mcp-server-cloudflare">official MCP server</a> ships with it. Weakest point: retrieval is vector-first. Cloudflare added optional reranking in October 2025, but there's still no published BM25 or hybrid-search path as of this writing. If your corpus is well-structured, you probably won't notice. If you're indexing messy enterprise content, you will.</p><h3>AWS Bedrock Knowledge Bases</h3><p>The enterprise default if you're already on AWS. Hybrid search (vector + BM25) is built in, Cohere reranking is available, and chunking modes range from fixed-size to semantic to custom Lambda. Titan V2 embeddings run at $0.02 per million tokens. There's an official <a href="https://awslabs.github.io/mcp/servers/bedrock-kb-retrieval-mcp-server">AWS Labs MCP server</a> for retrieval. And then there's the OCU landmine&#8212;which I'll get to in a minute, because it deserves its own sidebar.</p><h3>Pinecone Assistants</h3><p>Best-in-class retrieval, managed. Hybrid sparse-dense search with automatic reranking, configurable alpha weighting, managed embeddings abstracted away from you, and an official remote MCP server. Pricing is fully usage-based&#8212;$5 per million context retrieval tokens, plus input/output token, storage, and ingestion charges on top. The Standard plan has a $50/month minimum; the old $0.05/assistant-hour fee was removed. Free tier is real but tight&#8212;5 assistants per project, 1 GB storage, 500k input tokens, and 500k context retrieval tokens per month. Past that you're paying, but the retrieval quality is noticeably better than anything else on this list.</p><h3>LlamaCloud</h3><p>Managed LlamaIndex. Multimodal parsing that actually handles diagrams, configurable chunking modes, hybrid retrieval, reranking. The free tier gives you 10,000 credits a month&#8212;about a thousand pages. Paid tiers start at $50/month (Starter, 40K credits) and scale to $500/month (Pro, 400K credits). For a LlamaIndex-native team, the Starter tier is genuinely cheap; Pro is where the platform pays off. LlamaIndex ships `run-llama/llamacloud-mcp` (Python) and `run-llama/mcp-server-llamacloud` (TypeScript), plus a hosted gat<code>way at mcp.llamaindex.ai</code>the MCP story is actually stronger here than I initially realized.</p><h3>Self-Hosted (sqlite-vec + Ollama)</h3><p>This is what <a href="https://astgl.ai/projects#mcp-astgl-knowledge">the ASTGL Knowledge MCP server</a> actually runs on. sqlite-vec for vectors, FTS5 for keyword search (that's your hybrid search right there, no cloud required), and Ollama serving nomic-embed-text for embeddings, all of it on a $10/month Hetzner VPS or a Mac mini on my desk. Works well for up to around a million vectors in my testing. Real cost: infrastructure plus your time. The second one is the variable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jj99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jj99!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 424w, https://substackcdn.com/image/fetch/$s_!jj99!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 848w, https://substackcdn.com/image/fetch/$s_!jj99!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1272w, https://substackcdn.com/image/fetch/$s_!jj99!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jj99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png" width="895" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:895,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:496636,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194365100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jj99!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 424w, https://substackcdn.com/image/fetch/$s_!jj99!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 848w, https://substackcdn.com/image/fetch/$s_!jj99!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1272w, https://substackcdn.com/image/fetch/$s_!jj99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>The Six Axes That Actually Matter</h2><p>Pricing gets the attention, but it&#8217;s rarely the deciding factor. Here&#8217;s what I look at:</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dpe0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dpe0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 424w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 848w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1272w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png" width="1207" height="723" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1207,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133778,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194365100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dpe0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 424w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 848w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1272w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/hosted-rag-vs-self-hosted-rag?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/hosted-rag-vs-self-hosted-rag?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h3>Setup cost</h3><p>Time-to-first-query is where hosted services actually earn their money. Pinecone Assistants and Cloudflare AI Search will have you chatting with your docs in under a minute after signup&#8212;upload and go. Bedrock is the outlier on the hosted side: AWS documentation puts CloudFormation infrastructure deployment at 7&#8211;10 minutes, with a full hand-wired setup typically landing at 20&#8211;30 minutes. That's hosted pricing with self-hosted-ish friction.</p><p>Self-hosted with sqlite-vec and Ollama is about 30 minutes from `apt install` to first working query if you know what you're doing, longer if you're learning. For me it's fast because I've done it. For someone new to local LLMs it's a weekend.</p><h3>Ongoing cost</h3><p>This is where the story flips. For a small corpus with low query volume&#8212;think a few hundred docs and a few thousand queries a month&#8212;Cloudflare AI Search is genuinely cheap, maybe $5&#8211;15/month in storage and API costs. Pinecone Assistants sits at $20&#8211;50 in that range. Bedrock KB looks innocent until you hit the OCU minimum (more on that below). LlamaCloud's $50/month Starter floor is reasonable; the $500/month Pro tier is where the platform pays off at real scale.</p><p>Self-hosted is $10/month for a Hetzner VPS, flat. Mac mini on your desk? $0/month plus electricity. The per-query cost of hosted RAG is the thing that compounds when you scale&#8212;or when someone builds something that hammers it.</p><h3>Ingestion complexity</h3><p>This is the axis where hosted services earn their keep without argument. Bedrock KB and LlamaCloud both handle PDFs with embedded tables, Word docs, and (in LlamaCloud's case) actual diagrams, not just the text around them. Bedrock's Data Automation service charges $0.010 per page for parsing&#8212;not free, but a lot cheaper than writing your own PDF extractor.</p><p>Self-hosted with Ollama and sqlite-vec doesn't ship with any of that. If your corpus is markdown, you're fine. If it's a pile of PDFs from your legal team, you're either writing parsers or paying someone to.</p><h3>Retrieval quality</h3><p>All four hosted services offer hybrid retrieval except Cloudflare AI Search, which is vector-only as of this writing. Pinecone Assistants has automatic reranking baked in. Bedrock KB has optional Cohere reranking. Self-hosted with sqlite-vec can do hybrid via FTS5 for keyword matching combined with vector similarity, which is genuinely good&#8212;but you're the one writing the ranking logic.</p><p>For most queries on well-structured content, vector-only is fine. For ambiguous queries over messy content, reranking earns its cost.</p><h3>Data residency</h3><p>Self-hosted wins this one by default. The data never leaves your machine.</p><p>On the hosted side: Pinecone has US and EU regions with a DPA, and LlamaCloud has SOC 2 Type II and HIPAA. Bedrock's EU region support has been inconsistent in 2026 documentation&#8212;verify before you commit. Cloudflare's Data Localization Suite handles this at the platform level.</p><p>If you're in a regulated industry, audit the provider before you pick. Don't trust the marketing page.</p><h3>Ops burden</h3><p>This is the one nobody advertises. Self-hosted means you're responsible for:</p><ul><li><p>Keeping Ollama updated</p></li><li><p>Monitoring embedding drift when you upgrade models</p></li><li><p>Backing up knowledge.db</p></li><li><p>Scheduling re-indexing when source content changes</p></li><li><p>Debugging why sqlite-vec suddenly returns zero results (hint: usually the embedding model changed dimensions)</p></li></ul><p>Hosted services handle all of that. That's most of what you're paying for.</p><p></p><h2>Sidebar: The Bedrock OCU Landmine</h2><p>Bedrock Knowledge Bases advertises "no charge for the Knowledge Bases feature itself." Technically true. What they don't mention on the pricing page is that the vector storage layer requires a minimum of 2 OCUs&#8212;OpenSearch Compute Units&#8212;at roughly $0.24/hour each.</p><p>Do the math: 2 OCUs &#215; $0.24/hour &#215; 730 hours/month = <strong>about $350 per month</strong> whether your knowledge base has 10 documents or 10 million.</p><p>Nobody else on this list has a fixed cost floor like that. Cloudflare AI Search scales down to pennies. Pinecone Assistants has a real free tier. Self-hosted is $10.</p><p>If you're building something small and you're not already deep in AWS&#8212;Bedrock KB is the wrong answer. If you're running enterprise-scale search over millions of docs, that $350 becomes a rounding error, and the hybrid+rerank features earn their keep.</p><p>Know where you sit before you commit.</p><h2>The MCP Angle</h2><p>Here's the thing I didn't expect to find: <strong>every production RAG service on this list ships an official MCP server.</strong> Cloudflare, Bedrock, Pinecone, LlamaCloud&#8212;all of them. This went from "experimental" to "table stakes" over the past year.</p><ul><li><p>Cloudflare AI Search &#8594; The official Cloudflare MCP server exposes AI Search endpoints</p></li><li><p>Bedrock KB &#8594; AWS Labs ships `bedrock-kb-retrieval-mcp-server`</p></li><li><p>Pinecone Assistants &#8594; Each assistant gets its own remote MCP endpoint, plus a local Docker option</p></li><li><p>LlamaCloud &#8594; `run-llama/llamacloud-mcp` plus the hosted MCP Gateway at mcp.llamaindex.ai</p></li></ul><p>This wasn't true a year ago. The MCP ecosystem has absorbed the big RAG providers fast enough that "hosted RAG you can query from Claude Desktop" is now a checkbox feature.</p><p>Self-hosted doesn't ship with an MCP server&#8212;but wrapping one around your sqlite-vec database is a weekend of TypeScript. That's what <a href="https://astgl.ai/projects#mcp-astgl-knowledge">the ASTGL Knowledge MCP server</a> actually is: an MCP wrapper around vector search and Q&amp;A retrieval over a SQLite database. The MCP part is trivial. The content curation and ingestion pipeline is 90% of the work.</p><p>The real insight: <strong>hosted RAG plus MCP wrapper is the modern middle path.</strong> You don't have to pick pure self-hosted or pure managed. Point a custom MCP server at Pinecone Assistants or Bedrock KB, and you get the retrieval quality of managed services with the MCP-native interface your agents expect. The `cloudflare/ai-search` MCP server does exactly this.</p><p>That changes the decision. It's not "hosted vs. self-hosted RAG" anymore. It's "Whose retrieval layer do I want behind my MCP server?"</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DUoA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DUoA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 424w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 848w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1272w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DUoA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png" width="1215" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:1215,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139869,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194365100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DUoA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 424w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 848w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1272w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Decision Framework</h2><p>Enough philosophy. Here's the checklist I use.</p><p>1. <strong>Is your corpus under 500 docs and mostly static?</strong> Self-host. You'll spend more time reading hosted RAG docs than it would take to `npm install sqlite-vec`.</p><p>2. <strong>Do you have under 20 hours to ship this?</strong> Hosted. Pinecone Assistants or Cloudflare AI Search will get you to a demo faster than you can read the Bedrock IAM setup guide.</p><p>3. <strong>Are you charging money for this?</strong> Either hosted (you need the SLA) or self-hosted with a real infra budget and a pager rotation. Don't split the difference on production.</p><p>4. <strong>Is any of this data regulated&#8212;PHI, PII under GDPR, or financial?</strong> Self-host, or audit the hosted provider's compliance posture before you upload anything. Don't trust the marketing page. Ask for the SOC 2 report.</p><p>5. <strong>Are you already in AWS?</strong> Bedrock KB makes sense <em>if</em> your scale justifies the OCU floor. Otherwise, Pinecone.</p><p>6. <strong>Everything else?</strong> Prototype self-hosted with sqlite-vec. Migrate to hosted when a specific pain point forces the move. "We keep hitting embedding model drift" is a real reason. "It seems complicated" isn't.</p><p>The rule of thumb I use: <strong>pay for what hurts, self-host what you enjoy.</strong> If PDF parsing makes you want to quit, pay Bedrock or LlamaCloud. If SQL and vector search are fun, keep sqlite-vec.</p><p></p><h2>What I'd Actually Build in 2026</h2><p>If you asked me right now, for real scenarios:</p><p><strong>Weekend side project.</strong> sqlite-vec plus Ollama plus nomic-embed-text. Runs on a laptop, costs nothing, and teaches you how RAG actually works. This is where I'd start every time.</p><p><strong>Customer-facing SaaS feature.</strong> Cloudflare AI Search. Pay-per-query pricing means your costs track your usage. Official MCP server means Claude Desktop users can plug in directly. The open-beta caveat is real&#8212;verify the SLA matches your product's uptime needs before launch.</p><p><strong>Enterprise RAG over thousands of internal docs.</strong> Bedrock Knowledge Bases if you're already in AWS and you'll comfortably exceed the OCU floor. Pinecone Assistants if you're not. LlamaCloud if your team is already deep in LlamaIndex and the multimodal parsing earns its cost. All three have hybrid search; all three ship MCP servers. Pick based on where your infrastructure&#8212;and your team's existing expertise&#8212;already lives.</p><p><strong>Team knowledge base.</strong> Self-hosted if it's under five people and you've got one engineer who cares about it. Hosted the moment it crosses twenty users or someone non-technical needs to upload docs. The threshold isn't the document count&#8212;it's the human factor.</p><p>The sqlite-vec era isn't ending. It's just not the only answer anymore. A year ago, self-hosted was the serious choice, and hosted was for people who didn't want to learn. In 2026, that framing doesn't hold. Hosted RAG is production-ready, MCP-native, and sometimes cheaper than your own ops time.</p><p>Pick the tool that matches the job. That's it.</p><h2>FAQ</h2><h3>What is Cloudflare AI Search?</h3><p>Cloudflare AI Search (formerly AutoRAG) is a managed Retrieval-Augmented Generation service built on Cloudflare's platform. It combines R2 storage, Vectorize for embeddings, and Workers AI for inference into a single API. It's currently in open beta with vector-first retrieval and optional reranking, and ships with an official MCP server that lets Claude and other AI assistants query your indexed documents directly.</p><h3>When should I use hosted RAG instead of sqlite-vec for an MCP server?</h3><p>Use hosted RAG when your corpus exceeds a few thousand documents, you're ingesting multiple source types like PDFs or Word docs, non-engineers need to upload content, or you need a production SLA. Stick with sqlite-vec when the corpus is static markdown under about 1,000 documents, you control ingestion, and cost-per-query matters more than ops time.</p><h3>Can I use the Cloudflare AI Search MCP with Claude Desktop?</h3><p>Yes. Cloudflare ships an official MCP server that exposes AI Search endpoints as MCP tools. Add the Cloudflare MCP server to your Claude Desktop config, provide your API token, and Claude can query your indexed documents through the same interface it uses for any other MCP tool. The setup is documented in the Cloudflare MCP repository.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/hosted-rag-vs-self-hosted-rag/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/hosted-rag-vs-self-hosted-rag/comments"><span>Leave a comment</span></a></p><p></p><ul><li><p>Related reading:*</p></li><li><p><a href="https://astgl.com/p/shipping-mcp-knowledge-server-weekend">How I Shipped an MCP Knowledge Server in a Weekend</a>: the self-hosted case study this article references</p></li><li><p><a href="https://astgl.ai/answers/how-mcp-registries-work">How Do MCP Registries Work (Smithery, mcpt)?</a>: finding MCP servers, including the ones in this article</p></li><li><p><a href="https://astgl.com/p/cortex-event-sourced-memory-ai-coding-assistants">Cortex: An Event-Sourced Memory Architecture for AI Coding Assistants</a>: related exploration of the memory/retrieval landscape</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>