<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[As The Geek Learns]]></title><description><![CDATA[Tools and training for IT professionals. PowerCLI courses, productivity apps, and 25 years of lessons learned.]]></description><link>https://astgl.com</link><image><url>https://substackcdn.com/image/fetch/$s_!hfS3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png</url><title>As The Geek Learns</title><link>https://astgl.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 08 Jun 2026 17:12:14 GMT</lastBuildDate><atom:link href="https://astgl.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[James Cruce]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[astgl@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[astgl@substack.com]]></itunes:email><itunes:name><![CDATA[James Cruce]]></itunes:name></itunes:owner><itunes:author><![CDATA[James Cruce]]></itunes:author><googleplay:owner><![CDATA[astgl@substack.com]]></googleplay:owner><googleplay:email><![CDATA[astgl@substack.com]]></googleplay:email><googleplay:author><![CDATA[James Cruce]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[I Secured My AI Agent With a 7-Layer Threat Model]]></title><description><![CDATA[Using the MAESTRO framework to harden an autonomous agent -- seven layers of things that can go wrong, translated from security-paper-speak into things that could ruin your day.]]></description><link>https://astgl.com/p/secured-ai-agent-7-layer-threat-model-podcast-episode-014</link><guid 
isPermaLink="false">https://astgl.com/p/secured-ai-agent-7-layer-threat-model-podcast-episode-014</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 08 Jun 2026 17:00:31 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/201167231/6bbc433131cc6958f3a3e98c9d79399a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong>I have an autonomous AI agent running on my Mac Studio. It has full shell access, reads my calendar, manages my tasks, and sends iMessages on my behalf. It runs 24/7 as a background service.</strong></p><p>If that sentence doesn&#8217;t make you slightly nervous, you haven&#8217;t been paying attention. In <a href="https://www.isec.news/2026/02/10/securityscorecard-135000-plus-internet-exposed-openclaw-instances-found/">February 2026, researchers found over 135,000 OpenClaw instances exposed to the public internet</a>. A coordinated attack called <a href="https://cybersecuritynews.com/clawhavoc-poisoned-openclaws-clawhub/">ClawHavoc</a> planted over a thousand malicious plugins in the community registry. Nine CVEs have been disclosed, including remote code execution.</p><p>I needed to take security seriously. Not &#8220;I changed the default password&#8221; seriously. Threat-model seriously.</p><h2>MAESTRO: Seven Layers of Things That Can Go Wrong</h2><p>The <a href="https://cloudsecurityalliance.org/">Cloud Security Alliance </a>published a framework called <a href="https://github.com/CloudSecurityAlliance/MAESTRO">MAESTRO</a>&#8212;a 7-layer threat model specifically designed for agentic AI systems. Ken Huang mapped it directly to OpenClaw&#8217;s codebase, identifying 35+ specific threats across every layer of the stack.</p><p>Here are the seven layers, translated from security-paper language into &#8220;things that could actually ruin your day&#8221;:</p><p><strong>Layer 1: Foundation Models:</strong> Someone sends your agent a crafted message that hijacks its behavior. Prompt injection. Jailbreaks. 
System prompt leakage. Your agent does what an attacker tells it to instead of what you told it to.</p><p><strong>Layer 2: Data Operations:</strong> Your credentials are stored in plaintext JSON files. Your session logs contain every conversation forever. A malicious skill injects code through your workspace.</p><p><strong>Layer 3: Agent Frameworks:</strong> The agent misuses its own tools. It runs shell commands it shouldn&#8217;t. It spawns sessions without authorization. It escalates its own privileges.</p><p><strong>Layer 4: Deployment &amp; Infrastructure:</strong> Your gateway is exposed to the network. Someone brute-forces the WebSocket token. A reverse proxy misconfiguration bypasses authentication entirely.</p><p><strong>Layer 5: Evaluation &amp; Observability:</strong> Nobody&#8217;s watching the agent for anomalous behavior. There&#8217;s no audit trail. Logs can be tampered with. If the agent starts acting weird, nothing catches it.</p><p><strong>Layer 6: Security &amp; Compliance:</strong> Your DM policy is misconfigured. Anyone can message the agent. Pairing codes can be brute-forced. Identity can be spoofed across channels.</p><p><strong>Layer 7: Agent Ecosystem:</strong> A malicious plugin gets installed. A legitimate plugin&#8217;s npm dependency gets compromised. 
The skill registry serves poisoned packages.</p><p>The critical attack chain MAESTRO identifies: compromise the gateway (Layer 4) &#8594; access the session store (Layer 2) &#8594; poison conversation history (Layer 1) &#8594; control the agent (Layer 3) &#8594; spread via messaging (Layer 7).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JxSB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" width="1184" height="93" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:93,&quot;width&quot;:1184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20177,&quot;alt&quot;:&quot;Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)&quot;,&quot;title&quot;:&quot;Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" title="Critical Attack Chain identified by MAESTRO. 
Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Reading this was humbling. I&#8217;d addressed some of these by instinct during setup. Loopback binding, directory permissions, and pairing-based access control were all implemented. But &#8220;some&#8221; isn&#8217;t a security posture.</p><h2>SecureClaw: The Audit</h2><p><a href="https://github.com/adversa-ai/secureclaw">SecureClaw</a> is an open-source security tool built specifically for OpenClaw by Adversa AI. It maps to MAESTRO, OWASP, MITRE ATLAS, and NIST AI 100-2. 
The install is a git clone and a bash script: no npm install, no network calls, and no surprises.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;31b4db6d-5d95-413a-9748-1edf870fb6f3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">git clone https://github.com/adversa-ai/secureclaw.git
bash secureclaw/secureclaw/skill/scripts/install.sh</code></pre></div><p>Then you run the audit:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d3bd4bd2-f1f5-4139-a873-12e14991b95d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">bash ~/.openclaw/skills/secureclaw/scripts/quick-audit.sh</code></pre></div><p>My baseline score: <strong>57 out of 100.</strong> Zero criticals. Three HIGHs. Three MEDIUMs. Eight checks passing.</p><p>Here&#8217;s what passed without any work:</p><p>&#8226; Gateway bound to loopback (127.0.0.1), not exposed to the network</p><p>&#8226; Gateway authentication present</p><p>&#8226; Directory permissions set to 700 (owner only)</p><p>&#8226; No browser relay exposed</p><p>&#8226; DM policy set to pairing (not open)</p><p>&#8226; Skills clean of malicious patterns</p><p>And here&#8217;s what failed:</p><blockquote><p>&#128992; HIGH Plaintext key exposure: Keys in openclaw.json and 5 backup files</p><p>&#128992; HIGH Sandbox mode: commands run directly on host</p><p>&#128992; HIGH Exec approval mode: agent acts without human approval</p><p>&#128993; MED No cognitive file baselines: can&#8217;t detect tampering</p><p>&#128993; MED Default control tokens: vulnerable to spoofing</p><p>&#128993; MED No failure mode: no graceful degradation</p></blockquote><h2>The Hardening</h2><p><strong>Step 1: Clean up credential leaks.</strong> OpenClaw creates .bak files every time you change the config. Each backup contains your full config, including Slack tokens and API keys. I had five of them sitting in the OpenClaw directory. Deleted them all. Set the main config to 600 permissions.</p><p>This is the kind of thing that&#8217;s easy to miss and catastrophic to ignore. A single ls -la ~/.openclaw/ would show them. 
But who runs ls -la on their config directory after every change?</p><p><strong>Step 2: Create integrity baselines.</strong> SecureClaw&#8217;s hardener generates SHA256 hashes of your &#8220;cognitive files&#8221;: IDENTITY.md, AGENTS.md, and HEARTBEAT.md. These are the files that define who your agent <em>is</em> and what it <em>does</em>. If an attacker or a hallucinating agent modifies them, the nightly integrity check will catch it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;90df69e7-009c-4398-90a0-846a125ebc72&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">bash ~/.openclaw/skills/secureclaw/scripts/quick-harden.sh</code></pre></div><p><strong>Step 3: Exec approvals.</strong> This is the big one. MAESTRO recommends human-in-the-loop approval for all shell commands. But my agent runs morning briefings and heartbeat checks on cron, unattended. Setting approvals to &#8220;always&#8221; would break all automation.</p><p>The solution: an <strong>allowlist with on-miss approval.</strong> I created ~/.openclaw/exec-approvals.json with 17 safe command patterns: imsg, calctl, apple-reminders, cairn, and basic file operations. Tars (my agent) can run these freely. Anything else (curl, rm, pip install, or any other command not on the list) requires human approval.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;0fd30378-dc9d-4356-9c40-b93415434cda&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "defaults": {
    "security": "allowlist",
    "ask": "on-miss"
  },
  "agents": {
    "main": {
      "allowlist": [
        { "pattern": "imsg *", "note": "iMessage send/read" },
        { "pattern": "calctl *", "note": "Apple Calendar" },
        { "pattern": "cairn *", "note": "Task management" }
      ]
    }
  }
}</code></pre></div><p>This is the trade-off MAESTRO doesn&#8217;t talk about: <strong>security versus automation.</strong> Maximum security means every action needs approval. Maximum automation means the agent acts freely. The allowlist is the middle ground. Routine operations are pre-approved, and novel or dangerous operations require a human.</p><p><strong>Step 4: Full plugin install.</strong> Beyond the bash scripts, SecureClaw has a full npm plugin with 56 runtime audit checks, background monitors for config drift, and real-time integrity verification. Installing it required building from source (TypeScript &#8594; JavaScript) and registering it with OpenClaw&#8217;s plugin system.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;48d37b56-430a-4c07-b786-d9162bba10f5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">openclaw plugins install -l /path/to/secureclaw

openclaw config set plugins.allow '["secureclaw"]'</code></pre></div><p>That plugins.allow line is important. By default, OpenClaw will auto-load any discovered plugin. Explicit trust means only plugins you&#8217;ve approved get loaded.</p><p><strong>Step 5: Nightly audit cron.</strong> A macOS LaunchAgent runs the full audit suite every night at 2 AM: quick-audit, integrity check, and supply chain scan. Results go to secureclaw-audit.log. If something changes overnight, it shows up in the morning.</p><h2>The Final Score</h2><p>After hardening: <strong>64 out of 100.</strong> Nine checks passing. Zero criticals. The three remaining HIGHs are documented, accepted trade-offs:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c-Ja!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, 
https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" width="1456" height="651" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94290,&quot;alt&quot;:&quot;Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to \&quot;always\&quot; &#8212; I use an allowlist plus on-miss approval instead, because full \&quot;always\&quot; would break unattended cron automation.&quot;,&quot;title&quot;:&quot;Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. 
Three: exec approval not set to \&quot;always\&quot; &#8212; I use an allowlist plus on-miss approval instead, because full \&quot;always\&quot; would break unattended cron automation.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." title="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." 
srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>Findings I accepted (with reasoning)&#8212;Sandbox mode (Docker sandboxing would break imsg, calctl, and Apple Reminders); Plaintext keys in config (inherent to the platform config format, file is locked to 600); Exec approval not &#8220;always&#8221; (using allowlist + on-miss; full &#8220;always&#8221; breaks unattended cron automation).</em></p><p>The two MEDIUMs, control token customization and failure mode configuration, aren&#8217;t supported in OpenClaw v2026.3.2&#8217;s config schema yet. SecureClaw checks for them proactively. They&#8217;ll be fixable when OpenClaw adds the config options.</p><h2>What I Actually Learned</h2><p><strong>Security isn&#8217;t a feature you enable.</strong> It&#8217;s a series of trade-offs you make with your eyes open. Sandbox mode is &#8220;more secure&#8221; but breaks the tools that make the agent useful. Approval mode &#8220;always&#8221; is &#8220;more secure&#8221; but kills the automation that makes the agent worthwhile. The right security posture isn&#8217;t maximum restriction; it&#8217;s documented, intentional decisions about what risks you accept and why.</p><p><strong>Automated scanning is essential but insufficient.</strong> SecureClaw&#8217;s audit caught things I would have missed, including the .bak files with credentials, the missing integrity baselines, and the open exec policy. But the HIGHs it flagged as failures are things I&#8217;ve consciously accepted. 
No scanner can evaluate your specific trade-offs.</p><p><strong>The biggest threat isn&#8217;t external.</strong> In my setup (loopback-bound, pairing-gated, allowlist-filtered), the most likely security failure isn&#8217;t a network attacker. It&#8217;s a malicious skill, a compromised npm package, or the agent itself hallucinating destructive actions. Layer 7 (ecosystem) and Layer 1 (model behavior) are the real attack surfaces for a local-first setup. The exec approval allowlist is my primary defense for both.</p><p><strong>Clean up after yourself.</strong> OpenClaw creates backup files containing credentials on every config change. There&#8217;s no auto-cleanup. If you&#8217;re running OpenClaw, go check your directory right now: ls ~/.openclaw/*.bak*. You might be surprised.</p><h2>Quick Reference</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Qje!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" width="1456" height="1302" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1302,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159198,&quot;alt&quot;:&quot;Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/.&quot;,&quot;title&quot;:&quot;Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. 
All commands target ~/.openclaw/skills/secureclaw/scripts/.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." title="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." 
srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>Hardening actions and commands: install, run audit, apply hardening, check integrity, scan skills, check for credential leaks, set exec approvals, set plugin trust. Commands target ~/.openclaw/skills/secureclaw/scripts/. Full command details in the image.</em></p><h2>Update&#8212;June 2026: What I Actually Did When I Moved to ClaudeClaw</h2><p>I wrote this piece in March, when OpenClaw was still the thing running my Mac Studio. By the end of April, I&#8217;d shut it down: disabled the cron jobs, quarantined the LaunchAgents, and rebuilt the whole stack on the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a>, based on <strong><a href="https://github.com/earlyaidopters/claudeclaw">ClaudeClaw</a></strong> from the <a href="https://www.skool.com/earlyaidopters/about">Early AI-Dopters</a> AI learning group. The full post-mortem on <em>why</em>:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2dc54966-9faa-4b6c-9c7e-3b1f18782638&quot;,&quot;caption&quot;:&quot;Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. 
No black box.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Killed OpenClaw and Built ClaudeClaw Mission Control&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T5FD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6a6400-f0cd-4ff3-8541-f6cccf4d9a87_400x400.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-02T23:01:21.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196179846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Why? </strong>The short version is this: I couldn&#8217;t <em>see</em> into OpenClaw. 
Which, if you scroll back up, is Layer 5: Evaluation &amp; Observability, the exact layer this audit was weakest on.</p><p>You may wonder whether I just copied the 7-layer hardening over to the new stack. I didn&#8217;t, and I want to be honest about that. <strong>I did not port MAESTRO one-for-one.</strong> SecureClaw was written specifically for OpenClaw. Some of its thinking transferred; some of it didn&#8217;t. And the threat model itself moved on (more on that at the end). What the seven layers became was a checklist: for each one, <em>how does the new architecture answer this?</em> Here&#8217;s the scorecard.</p><p><strong>The two layers that changed the most.</strong></p><p><em><strong>Layer 5 (Observability)</strong></em> went from my single biggest weakness to the entire reason ClaudeClaw exists. There&#8217;s now a dedicated agent, <strong>WATCHMAN</strong>, running seven probes every hour: failed tasks, stuck tasks, missed scheduler slots, daemon liveness, content-pipeline health, hidden failures (it greps the success logs for crash text), and delegation crashes. More importantly, there&#8217;s a <em>second</em> healthcheck running as a separate LaunchAgent with its own keychain-backed alert token. If the main daemon dies, the thing that tells me about it is still alive. The rule I wrote for myself out of this: <strong>the watcher cannot share fate with the watched</strong>. There&#8217;s also a behavioral dashboard, DefenseClaw, sitting on 127.0.0.1:3141.</p><p><em><strong>Layer 3 (Agent Frameworks)</strong></em> is where my OpenClaw work actually carried forward. The exec-approvals allowlist from Step 3 above is the direct ancestor of what ClaudeClaw does now, except the enforcement dropped down a level. The first thing I shipped was killing bypassPermissions. The main agent had been running with permission checks disabled, which meant a compromised agent would have had unlimited tool access; the SDK was no ceiling at all. I switched to the SDK&#8217;s default permission mode and handed the main agent a 15-tool allowlist as the single source of truth. Same idea as the OpenClaw allowlist, but enforced by the SDK itself instead of a config file I had to maintain.</p>
<p>The rest mapped like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQoC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" width="1456" height="757" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:339803,&quot;alt&quot;:&quot;Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries.&quot;,&quot;title&quot;:&quot;Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). 
Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. 
Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." title="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." 
srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>How each of the seven MAESTRO layers from the OpenClaw audit is answered in ClaudeClaw. </em></p><p><em><strong>Layer 1 Foundation Models: </strong>channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved).</em></p><p><em><strong>Layer 2 Data Operations: </strong>Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced and extended).</em></p><p><em><strong>Layer 3 Agent Frameworks:</strong> SDK permission ceiling and a 15-tool allowlist, the direct successor to the OpenClaw exec-approvals list (kept, moved into the SDK).</em></p><p><em><strong>Layer 4 Deployment and Infrastructure: </strong>an egress gateway plus kernel-level pf default-deny (replaced). </em></p><p><em><strong>Layer 5 Evaluation and Observability: </strong>WATCHMAN&#8217;s seven probes and a fate-isolated external healthcheck, the biggest upgrade.</em></p><p><em><strong>Layer 6 Security and Compliance:</strong> out-of-band Telegram confirmation for state-changing actions and a role policy kept separate from content memory (evolved).</em></p><p><em><strong>Layer 7 Agent Ecosystem: </strong>an MCP allowlist plus the tool ceiling as a second layer (kept and hardened). </em></p><p><em>Plus a new row beyond MAESTRO.  <strong>Memory persistence: </strong>TTLs, a hash-chained write log, and canaries.</em></p><p><strong>Where the 7-layer model ran out.</strong></p><p>MAESTRO is a <em>static</em> threat model. It&#8217;s a map of what can go wrong at each layer, frozen in time. What it doesn&#8217;t have a layer for is <strong>persistence</strong>. 
An attack that lands quietly in your agent&#8217;s memory or vector store and just waits. My scheduler re-enters context every 60 seconds, which means anything dormant in memory fires on a clock. That&#8217;s a different class of problem, and it has a name now: <a href="https://www.semanticscholar.org/paper/Logic-layer-Prompt-Control-Injection-(LPCI)%3A-A-in-Atta-Huang/7209db0a616b54335db85d6e73a0dc9505192e59?utm_source=direct_link">LPCI, Logic-layer Prompt Control Injection</a>. Hardening against it (I am planning a separate two-part write-up on <a href="https://astgl.substack.com">As The Geek Learns</a>) meant building things MAESTRO never asked for, including a canonicalizer that decodes payloads <em>before</em> they reach the vector store, channel-tagged prompts so the model knows retrieved text is data and not instructions, memory TTLs, a hash-chained write log, and canary entries that page me if memory ever leaks into output.</p><p><strong>What I gave up and what I kept.</strong> The honest cost of the move: I lost local-first. OpenClaw ran on Ollama, fully offline; ClaudeClaw talks to Anthropic&#8217;s API. I still own every byte of my data, and it&#8217;s all on my SSD; I just don&#8217;t own the weights anymore. What carried over intact was the philosophy this whole series is built on: every document is a file I can grep, every config is version-controlled, and every decision has a session note. That part never changed.</p><p><em>This is Part 5 of the Notion Replacement series. We went from &#8220;install an AI agent&#8221; to &#8220;secure it against a 7-layer threat model&#8221; in two days. 
Follow along at <a href="https://astgl.substack.com">As The Geek Learns</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[I Secured My AI Agent With a 7-Layer Threat Model]]></title><description><![CDATA[Using the MAESTRO framework to harden an autonomous agent&#8212;seven layers of things that can go wrong, translated from security-paper-speak into your day.]]></description><link>https://astgl.com/p/secured-ai-agent-7-layer-threat-model</link><guid isPermaLink="false">https://astgl.com/p/secured-ai-agent-7-layer-threat-model</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 08 Jun 2026 16:31:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IkGX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IkGX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IkGX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 424w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 848w, 
https://substackcdn.com/image/fetch/$s_!IkGX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1272w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IkGX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png" width="1456" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99253,&quot;alt&quot;:&quot;Title card for \&quot;I Secured My AI Agent With a 7-Layer Threat Model.\&quot; A dark navy banner: on the left, a teal security shield holding a padlock with an audit score rising from 57 to 64 out of 100; on the right, the seven MAESTRO threat layers stacked as color-coded bars, from Layer 7 (Agent Ecosystem) at the top down to Layer 1 (Foundation Models).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Title card for &quot;I Secured My AI Agent With a 7-Layer Threat Model.&quot; A dark navy banner: 
on the left, a teal security shield holding a padlock with an audit score rising from 57 to 64 out of 100; on the right, the seven MAESTRO threat layers stacked as color-coded bars, from Layer 7 (Agent Ecosystem) at the top down to Layer 1 (Foundation Models)." title="Title card for &quot;I Secured My AI Agent With a 7-Layer Threat Model.&quot; A dark navy banner: on the left, a teal security shield holding a padlock with an audit score rising from 57 to 64 out of 100; on the right, the seven MAESTRO threat layers stacked as color-coded bars, from Layer 7 (Agent Ecosystem) at the top down to Layer 1 (Foundation Models)." srcset="https://substackcdn.com/image/fetch/$s_!IkGX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 424w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 848w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1272w, https://substackcdn.com/image/fetch/$s_!IkGX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19219c0-6e42-4e2b-bd3f-83158bba97eb_1456x816.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>I have an autonomous AI agent running on my Mac Studio. It has full shell access, reads my calendar, manages my tasks, and sends iMessages on my behalf. It runs 24/7 as a background service.</strong></p><p>If that sentence doesn&#8217;t make you slightly nervous, you haven&#8217;t been paying attention. In <a href="https://www.isec.news/2026/02/10/securityscorecard-135000-plus-internet-exposed-openclaw-instances-found/">February 2026, researchers found over 135,000 OpenClaw instances exposed to the public internet</a>. A coordinated attack called <a href="https://cybersecuritynews.com/clawhavoc-poisoned-openclaws-clawhub/">ClawHavoc</a> planted over a thousand malicious plugins in the community registry. Nine CVEs have been disclosed, including remote code execution.</p><p>I needed to take security seriously. Not &#8220;I changed the default password&#8221; seriously. 
Threat-model seriously.</p><h2>MAESTRO: Seven Layers of Things That Can Go Wrong</h2><p>The <a href="https://cloudsecurityalliance.org/">Cloud Security Alliance </a>published a framework called <a href="https://github.com/CloudSecurityAlliance/MAESTRO">MAESTRO</a>&#8212;a 7-layer threat model specifically designed for agentic AI systems. Ken Huang mapped it directly to OpenClaw&#8217;s codebase, identifying 35+ specific threats across every layer of the stack.</p><p>Here are the seven layers, translated from security-paper language into &#8220;things that could actually ruin your day&#8221;:</p><p><strong>Layer 1: Foundation Models:</strong> Someone sends your agent a crafted message that hijacks its behavior. Prompt injection. Jailbreaks. System prompt leakage. Your agent does what an attacker tells it to instead of what you told it to.</p><p><strong>Layer 2: Data Operations:</strong> Your credentials are stored in plaintext JSON files. Your session logs contain every conversation forever. A malicious skill injects code through your workspace.</p><p><strong>Layer 3: Agent Frameworks:</strong> The agent misuses its own tools. It runs shell commands it shouldn&#8217;t. It spawns sessions without authorization. It escalates its own privileges.</p><p><strong>Layer 4: Deployment &amp; Infrastructure:</strong> Your gateway is exposed to the network. Someone brute-forces the WebSocket token. A reverse proxy misconfiguration bypasses authentication entirely.</p><p><strong>Layer 5: Evaluation &amp; Observability:</strong> Nobody&#8217;s watching the agent for anomalous behavior. There&#8217;s no audit trail. Logs can be tampered with. If the agent starts acting weird, nothing catches it.</p><p><strong>Layer 6: Security &amp; Compliance:</strong> Your DM policy is misconfigured. Anyone can message the agent. Pairing codes can be brute-forced. Identity can be spoofed across channels.</p><p><strong>Layer 7: Agent Ecosystem:</strong> A malicious plugin gets installed. 
A legitimate plugin&#8217;s npm dependency gets compromised. The skill registry serves poisoned packages.</p><p>The critical attack chain MAESTRO identifies: compromise the gateway (Layer 4) &#8594; access the session store (Layer 2) &#8594; poison conversation history (Layer 1) &#8594; control the agent (Layer 3) &#8594; spread via messaging (Layer 7).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JxSB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png" width="1184" height="93" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:93,&quot;width&quot;:1184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20177,&quot;alt&quot;:&quot;Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Critical Attack Chain identified by MAESTRO. Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" title="Critical Attack Chain identified by MAESTRO. 
Flowchart from left to right: Loopback Binding Blocks Step1 - defense - compromise the gateway (Layer 4) then to access the session store (Layer 2) then to poison conversation history (Layer 1) then to control the agent (Layer 3) then to spread via messaging (Layer 7)" srcset="https://substackcdn.com/image/fetch/$s_!JxSB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 424w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 848w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1272w, https://substackcdn.com/image/fetch/$s_!JxSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b3575cc-9b2d-4091-93e4-cc052d508b28_1184x93.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Reading this was humbling. I&#8217;d addressed some of these by instinct during setup: loopback binding, directory permissions, and pairing-based access control were already in place. But &#8220;some&#8221; isn&#8217;t a security posture.</p><h2>SecureClaw: The Audit</h2><p><a href="https://github.com/adversa-ai/secureclaw">SecureClaw</a> is an open-source security tool built specifically for OpenClaw by Adversa AI. It maps to MAESTRO, OWASP, MITRE ATLAS, and NIST AI 100-2. The install is a git clone and a bash script: no npm install, no network calls, and no surprises.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4d8e2ceb-1192-4b9e-8326-752bb92548ea&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">git clone https://github.com/adversa-ai/secureclaw.git
bash secureclaw/secureclaw/skill/scripts/install.sh</code></pre></div><p>Then you run the audit:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5c44792e-87fe-4898-99ca-c79570cea425&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">bash ~/.openclaw/skills/secureclaw/scripts/quick-audit.sh</code></pre></div><p>My baseline score: <strong>57 out of 100.</strong> Zero criticals. Three HIGHs. Three MEDIUMs. Eight checks passing.</p><p>Here&#8217;s what passed without any work:</p><p>&#8226; Gateway bound to loopback (127.0.0.1), not exposed to the network</p><p>&#8226; Gateway authentication present</p><p>&#8226; Directory permissions set to 700 (owner only)</p><p>&#8226; No browser relay exposed</p><p>&#8226; DM policy set to pairing (not open)</p><p>&#8226; Skills clean of malicious patterns</p><p>And here&#8217;s what failed:</p><blockquote><p>&#128992; HIGH Plaintext key exposure: Keys in openclaw.json and 5 backup files</p><p>&#128992; HIGH Sandbox mode: commands run directly on host</p><p>&#128992; HIGH Exec approval mode: agent acts without human approval</p><p>&#128993; MED No cognitive file baselines: can&#8217;t detect tampering</p><p>&#128993; MED Default control tokens: vulnerable to spoofing</p><p>&#128993; MED No failure mode: no graceful degradation</p></blockquote><h2>The Hardening</h2><p><strong>Step 1: Clean up credential leaks.</strong> OpenClaw creates .bak files every time you change the config. Each backup contains your full config, including Slack tokens and API keys. I had five of them sitting in the OpenClaw directory. Deleted them all. Set the main config to 600 permissions (owner read/write only).</p><p>This is the kind of thing that&#8217;s easy to miss and catastrophic to ignore. A single ls -la ~/.openclaw/ would show them. 
But who runs ls -la on their config directory after every change?</p><p><strong>Step 2: Create integrity baselines.</strong> SecureClaw&#8217;s hardener generates SHA-256 hashes of your &#8220;cognitive files&#8221;: IDENTITY.md, AGENTS.md, and HEARTBEAT.md. These are the files that define who your agent <em>is</em> and what it <em>does</em>. If an attacker or a hallucinating agent modifies them, the nightly integrity check will catch the change.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;847539b7-2bbe-458a-9fb0-4549a0891a45&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">bash ~/.openclaw/skills/secureclaw/scripts/quick-harden.sh</code></pre></div><p><strong>Step 3: Exec approvals.</strong> This is the big one. MAESTRO recommends human-in-the-loop approval for all shell commands. But my agent runs morning briefings and heartbeat checks unattended, on cron. Setting approvals to &#8220;always&#8221; would break all automation.</p><p>The solution: an <strong>allowlist with on-miss approval.</strong> I created ~/.openclaw/exec-approvals.json with 17 safe command patterns: imsg, calctl, apple-reminders, cairn, and basic file operations. Tars can run these freely. Anything else (curl, rm, pip install, or any other command not on the list) requires human approval.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;1bf97214-1271-4610-9e32-6f2d5cf85833&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "defaults": {
    "security": "allowlist",
    "ask": "on-miss"
  },
  "agents": {
    "main": {
      "allowlist": [
        { "pattern": "imsg *", "note": "iMessage send/read" },
        { "pattern": "calctl *", "note": "Apple Calendar" },
        { "pattern": "cairn *", "note": "Task management" }
      ]
    }
  }
}</code></pre></div><p>This is the trade-off MAESTRO doesn&#8217;t talk about: <strong>security versus automation.</strong> Maximum security means every action needs approval. Maximum automation means the agent acts freely. The allowlist is the middle ground. Routine operations are pre-approved, and novel or dangerous operations require a human.</p><p><strong>Step 4: Full plugin install.</strong> Beyond the bash scripts, SecureClaw has a full npm plugin with 56 runtime audit checks, background monitors for config drift, and real-time integrity verification. Installing it required building from source (TypeScript &#8594; JavaScript) and registering it with OpenClaw&#8217;s plugin system.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;1dc00374-b934-4485-a78e-91ca04004717&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">openclaw plugins install -l /path/to/secureclaw

openclaw config set plugins.allow '["secureclaw"]'</code></pre></div><p>That plugins.allow line is important. By default, OpenClaw will auto-load any discovered plugin. Explicit trust means only plugins you&#8217;ve approved get loaded.</p><p><strong>Step 5: Nightly audit cron.</strong> A macOS LaunchAgent runs the full audit suite (quick-audit, integrity check, and supply chain scan) every night at 2 AM. Results go to secureclaw-audit.log. If something changes overnight, it shows up in the morning.</p><h2>The Final Score</h2><p>After hardening: <strong>64 out of 100.</strong> Nine checks passing. Zero criticals. The three remaining HIGHs are documented, accepted trade-offs:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c-Ja!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, 
https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png" width="1456" height="651" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94290,&quot;alt&quot;:&quot;Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to \&quot;always\&quot; &#8212; I use an allowlist plus on-miss approval instead, because full \&quot;always\&quot; would break unattended cron automation.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table of the three high-severity findings I accepted after hardening, each with the reasoning. 
One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." title="Table of the three high-severity findings I accepted after hardening, each with the reasoning. One: sandbox mode left off, because Docker sandboxing would break imsg, calctl, and Apple Reminders. Two: plaintext keys in the config accepted, because they're inherent to the platform's config format and the file is locked to 600 permissions. Three: exec approval not set to &quot;always&quot; &#8212; I use an allowlist plus on-miss approval instead, because full &quot;always&quot; would break unattended cron automation." srcset="https://substackcdn.com/image/fetch/$s_!c-Ja!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 424w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 848w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Ja!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c0bfc3-a524-4926-98b0-be4ada2678d2_1800x805.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>Findings I accepted (with reasoning)&#8212;Sandbox mode (Docker sandboxing would break imsg, calctl, and Apple Reminders); Plaintext keys in config (inherent to the platform config format, file is locked to 600); Exec approval not &#8220;always&#8221; (using allowlist + on-miss; full &#8220;always&#8221; breaks unattended cron automation).</em></p><p>The two MEDIUMs, control token customization and failure mode configuration, aren&#8217;t supported in OpenClaw v2026.3.2&#8217;s config schema yet. SecureClaw checks for them proactively. 
They&#8217;ll be fixable when OpenClaw adds the config options.</p><h2>What I Actually Learned</h2><p><strong>Security isn&#8217;t a feature you enable.</strong> It&#8217;s a series of trade-offs you make with your eyes open. Sandbox mode is &#8220;more secure&#8221; but breaks the tools that make the agent useful. Approval mode &#8220;always&#8221; is &#8220;more secure&#8221; but kills the automation that makes the agent worthwhile. The right security posture isn&#8217;t maximum restriction; it&#8217;s documented, intentional decisions about what risks you accept and why.</p><p><strong>Automated scanning is essential but insufficient.</strong> SecureClaw&#8217;s audit caught things I would have missed, including the .bak files with credentials, the missing integrity baselines, and the open exec policy. But the HIGHs it flagged as failures are things I&#8217;ve consciously accepted. No scanner can evaluate your specific trade-offs.</p><p><strong>The biggest threat isn&#8217;t external.</strong> In my setup (loopback-bound, pairing-gated, allowlist-filtered), the most likely security failure isn&#8217;t a network attacker. It&#8217;s a malicious skill, a compromised npm package, or the agent itself hallucinating destructive actions. Layer 7 (ecosystem) and Layer 1 (model behavior) are the real attack surfaces for a local-first setup. The exec approval allowlist is my primary defense for both.</p><p><strong>Clean up after yourself.</strong> OpenClaw creates backup files containing credentials on every config change. There&#8217;s no auto-cleanup. If you&#8217;re running OpenClaw, go check your directory right now: ls ~/.openclaw/*.bak*. 
You might be surprised.</p><h2>Quick Reference</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Qje!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png" width="1456" height="1302" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1302,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159198,&quot;alt&quot;:&quot;Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." title="Quick-reference table of SecureClaw hardening commands and what each does: install the tool, run the audit, apply hardening, check integrity baselines, scan skills, check for credential-leaking backup files, set exec approvals, and set plugin trust. All commands target ~/.openclaw/skills/secureclaw/scripts/." 
srcset="https://substackcdn.com/image/fetch/$s_!4Qje!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 424w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 848w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1272w, https://substackcdn.com/image/fetch/$s_!4Qje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6afaeb0-54c3-4bb5-aee2-8d0865f7d501_1800x1609.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>Hardening actions and commands: install, run audit, apply hardening, check integrity, scan skills, check for credential leaks, set exec approvals, set plugin trust. Commands target ~/.openclaw/skills/secureclaw/scripts/. Full command details in the image.</em></p><h2>Update&#8212;June 2026: What I Actually Did When I Moved to ClaudeClaw</h2><p>I wrote this piece in March, when OpenClaw was still the thing running my Mac Studio. By the end of April, I&#8217;d shut it down. Disabled the cron jobs, quarantined the LaunchAgents, and rebuilt the whole stack on the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a>. Based off of <strong><a href="https://github.com/earlyaidopters/claudeclaw">ClaudeClaw</a> </strong>from the <a href="https://www.skool.com/earlyaidopters/about">Early AI-Dopters</a> AI learning group. 
The full post-mortem on <em>why</em>:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d89c9446-5e84-44f9-b29b-d62ecb13eb61&quot;,&quot;caption&quot;:&quot;Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. No black box.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Killed OpenClaw and Built ClaudeClaw Mission Control&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T5FD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6a6400-f0cd-4ff3-8541-f6cccf4d9a87_400x400.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-02T23:01:21.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196179846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek 
Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Why? </strong>The short version is this: I couldn&#8217;t <em>see</em> into OpenClaw. Which, if you scroll back up, is Layer 5: Evaluation &amp; Observability, the exact layer this audit was weakest on.</p><p>You may wonder whether I just copied the 7-layer hardening over to the new stack. I didn&#8217;t, and I want to be honest about that. <strong>I did not port MAESTRO one-for-one.</strong> SecureClaw was written specifically for OpenClaw. Some of its thinking transferred; some of it didn&#8217;t. And the threat model itself moved on (more on that at the end). What the seven layers became was a checklist: for each one, <em>how does the new architecture answer this?</em> Here&#8217;s the scorecard.</p><p><strong>The two layers that changed the most.</strong></p><p><em><strong>Layer 5 (Observability)</strong></em> went from my single biggest weakness to the entire reason ClaudeClaw exists. There&#8217;s now a dedicated agent, <strong>WATCHMAN</strong>, running seven probes every hour: failed tasks, stuck tasks, missed scheduler slots, daemon liveness, content-pipeline health, hidden failures (it greps the success logs for crash text), and delegation crashes. More importantly, there&#8217;s a <em>second</em> healthcheck running as a separate LaunchAgent with its own keychain-backed alert token. If the main daemon dies, the thing that tells me about it is still alive. The rule I wrote for myself out of this: <strong>the watcher cannot share fate with the watched</strong>. 
There&#8217;s also a behavioral dashboard, DefenseClaw, sitting on 127.0.0.1:3141.</p><p><em><strong>Layer 3 (Agent Frameworks)</strong></em> is where my OpenClaw work actually carried forward. The exec-approvals allowlist from Step 3 above is the direct ancestor of what ClaudeClaw does now, except the enforcement dropped down a level. The first thing I shipped had three parts: killing bypassPermissions, switching to the SDK&#8217;s default permission mode, and handing the main agent a 15-tool allowlist as the single source of truth. With bypassPermissions on, the main agent had been running with permission checks disabled, which means a compromised agent has unlimited tool access; the SDK imposed no ceiling at all. It&#8217;s the same idea as the OpenClaw allowlist, but enforced by the SDK itself instead of a config file I had to maintain.</p><p>The rest mapped like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQoC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png" width="1456" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:339803,&quot;alt&quot;:&quot;Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). 
Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/201130607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." title="Table mapping each of the seven MAESTRO threat layers to how ClaudeClaw answers it, with a verdict per layer. Layer 1 Foundation Models: channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved). 
Layer 2 Data Operations: Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced). Layer 3 Agent Frameworks: an SDK permission ceiling and a 15-tool allowlist, the direct heir to the OpenClaw exec-approvals list (kept). Layer 4 Deployment &amp; Infrastructure: an egress gateway plus kernel-level pf default-deny (replaced). Layer 5 Evaluation &amp; Observability: WATCHMAN's seven probes and a fate-isolated external healthcheck &#8212; the biggest upgrade. Layer 6 Security &amp; Compliance: out-of-band Telegram confirmations for state-changing actions and a role policy kept separate from content memory (evolved). Layer 7 Agent Ecosystem: an MCP allowlist plus the tool ceiling as a second layer (hardened). Plus a new row beyond MAESTRO &#8212; memory persistence: TTLs, a hash-chained write log, and canaries." srcset="https://substackcdn.com/image/fetch/$s_!rQoC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 424w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 848w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!rQoC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47811f31-f7fb-4c0c-b564-593817635e77_2500x1300.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>How each of the seven MAESTRO layers from the OpenClaw audit is answered in ClaudeClaw. </em></p><p><em><strong>Layer 1 Foundation Models: </strong>channel tagging and a trust gradient that treats retrieved text as data, not directives (evolved).</em></p><p><em><strong>Layer 2 Data Operations: </strong>Chamberlain outbound scanner, exfiltration-guard, queryable Memory v2, and ingest-time canonicalization (replaced and extended).</em></p><p><em><strong>Layer 3 Agent Frameworks:</strong> SDK permission ceiling and a 15-tool allowlist, the direct successor to the OpenClaw exec-approvals list (kept, moved into the SDK).</em></p><p><em><strong>Layer 4 Deployment and Infrastructure: </strong>an egress gateway plus kernel-level pf default-deny (replaced). 
</em></p><p><em><strong>Layer 5 Evaluation and Observability: </strong>WATCHMAN&#8217;s seven probes and a fate-isolated external healthcheck, the biggest upgrade.</em></p><p><em><strong>Layer 6 Security and Compliance:</strong> out-of-band Telegram confirmation for state-changing actions and a role policy kept separate from content memory (evolved).</em></p><p><em><strong>Layer 7 Agent Ecosystem: </strong>an MCP allowlist plus the tool ceiling as a second layer (kept and hardened). </em></p><p><em>Plus a new row beyond MAESTRO. <strong>Memory persistence: </strong>TTLs, a hash-chained write log, and canaries.</em></p><p><strong>Where the 7-layer model ran out.</strong></p><p>MAESTRO is a <em>static</em> threat model. It&#8217;s a map of what can go wrong at each layer, frozen in time. What it doesn&#8217;t have a layer for is <strong>persistence</strong>: an attack that lands quietly in your agent&#8217;s memory or vector store and just waits. My scheduler re-enters context every 60 seconds, which means anything dormant in memory can fire on a clock. That&#8217;s a different class of problem, and it has a name now: <a href="https://www.semanticscholar.org/paper/Logic-layer-Prompt-Control-Injection-(LPCI)%3A-A-in-Atta-Huang/7209db0a616b54335db85d6e73a0dc9505192e59?utm_source=direct_link">LPCI, Logic-layer Prompt Control Injection</a>. Hardening against it (I am planning a separate two-part write-up on <a href="https://astgl.substack.com">As The Geek Learns</a>) meant building things MAESTRO never asked for, including a canonicalizer that decodes payloads <em>before</em> they reach the vector store, channel-tagged prompts so the model knows retrieved text is data and not instructions, memory TTLs, a hash-chained write log, and canary entries that page me if memory ever leaks into output.</p><p><strong>What I gave up and what I kept.</strong> The honest cost of the move: I lost local-first. 
OpenClaw ran on Ollama, fully offline; ClaudeClaw talks to Anthropic&#8217;s API. I still own every byte of my data, all of it on my SSD; I just don&#8217;t own the weights anymore. What carried over intact was the philosophy this whole series is built on: every document is a file I can grep, every config is version-controlled, and every decision has a session note. That part never changed.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share As The Geek Learns&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share As The Geek Learns</span></a></p><p><em>This is Part 5 of the Notion Replacement series. We went from &#8220;install an AI agent&#8221; to &#8220;secure it against a 7-layer threat model&#8221; in two days. 
Follow along at <a href="https://astgl.substack.com">As The Geek Learns</a>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/secured-ai-agent-7-layer-threat-model/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/secured-ai-agent-7-layer-threat-model/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[5 Questions to Ask Before You Build the AI Project Your CEO Just Pitched]]></title><description><![CDATA[A one-page checklist that turns a vague AI proposal into a decision you can defend in writing.]]></description><link>https://astgl.com/p/5-questions-before-building-ai-project-ceo-pitched</link><guid isPermaLink="false">https://astgl.com/p/5-questions-before-building-ai-project-ceo-pitched</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Thu, 04 Jun 2026 11:02:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ure4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ure4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ure4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 424w, 
https://substackcdn.com/image/fetch/$s_!ure4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ure4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b776eec5-058d-4572-b817-5335ae67c625_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37713,&quot;alt&quot;:&quot;ASTGL branded hero image on a dark navy background. Top-left label reads \&quot;ASTGL &#183; DIGITAL TOOLS SERIES\&quot; in orange. The main title spans two lines in large white type: \&quot;5 Questions Before You Build\&quot; and \&quot;the AI Project Your CEO Pitched.\&quot; Below it, an orange subtitle reads \&quot;The 1-page Technical Reality Check.\&quot; In the bottom-right corner, a dark-blue rounded box with an orange border displays \&quot;5 QUESTIONS\&quot; in large orange type and \&quot;to ask first\&quot; in light gray beneath. Decorative horizontal scan-lines run down the left margin. 
The footer reads \&quot;asthegeeklearns.com\&quot; in gray.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284415?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ASTGL branded hero image on a dark navy background. Top-left label reads &quot;ASTGL &#183; DIGITAL TOOLS SERIES&quot; in orange. The main title spans two lines in large white type: &quot;5 Questions Before You Build&quot; and &quot;the AI Project Your CEO Pitched.&quot; Below it, an orange subtitle reads &quot;The 1-page Technical Reality Check.&quot; In the bottom-right corner, a dark-blue rounded box with an orange border displays &quot;5 QUESTIONS&quot; in large orange type and &quot;to ask first&quot; in light gray beneath. Decorative horizontal scan-lines run down the left margin. The footer reads &quot;asthegeeklearns.com&quot; in gray." title="ASTGL branded hero image on a dark navy background. Top-left label reads &quot;ASTGL &#183; DIGITAL TOOLS SERIES&quot; in orange. The main title spans two lines in large white type: &quot;5 Questions Before You Build&quot; and &quot;the AI Project Your CEO Pitched.&quot; Below it, an orange subtitle reads &quot;The 1-page Technical Reality Check.&quot; In the bottom-right corner, a dark-blue rounded box with an orange border displays &quot;5 QUESTIONS&quot; in large orange type and &quot;to ask first&quot; in light gray beneath. Decorative horizontal scan-lines run down the left margin. The footer reads &quot;asthegeeklearns.com&quot; in gray." 
srcset="https://substackcdn.com/image/fetch/$s_!ure4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ure4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ure4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb776eec5-058d-4572-b817-5335ae67c625_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>5 Questions to Ask Before You Build the AI Project Your CEO Just Pitched</h1><p>You know the email. It shows up Tuesday morning, forwarded with a few lines of enthusiasm and a ChatGPT-drafted proposal attached. "Saw this and thought of us. Can we do this?" The PDF has a logo, bullet points, and exactly zero integration requirements. It also has a six-week timeline and a budget that assumes nothing goes wrong.</p><p>You have somewhere between 24 and 72 hours before your CEO follows up asking what you think.</p><p>If you say yes, you're on the hook for a project you didn't scope. If you say no, you're the person who kills ideas. Neither answer is actually available to you. What you need is a third path: a structured evaluation that produces a defensible, professional response in the time it takes to drink your morning coffee.</p><p>That's what the Technical Reality Check is. Five questions. One page. Every answer points directly at a commitment your organization will have to honor if this project moves forward.</p><p>Here it is in full.</p><h2>The Technical Reality Check: 5 Questions That Surface What the Proposal Left Out</h2><h3>Question 1: What specific business outcome does this solve, and how will we measure success?</h3><p>AI tools generate confident-sounding proposals that describe solutions, not problems. A proposal for "an AI-powered IT ticketing system" describes a technology. 
It doesn't describe what's broken right now, how broken it is, or what "fixed" looks like in measurable terms.</p><p>Before any conversation about implementation, you need an answer to: what does success look like in six months, and how will we know we hit it? Ticket resolution time down 30%? First-contact resolution rate up 20%? Those are real answers. "Things will be more efficient" is not.</p><p>Unmeasurable projects never officially fail. Which means they never stop consuming resources. This question isn't about being difficult. It's about making sure the organization is buying an outcome, not a technology.</p><p><strong>The red flag:</strong> Any proposal where the only success metric is "we deployed it."</p><h3>Question 2: Who owns the ongoing maintenance, security patching, and vendor relationship?</h3><p>Vendor proposals describe launch day. They are almost entirely silent about year two.</p><p>Every new system creates a permanent maintenance obligation: patching, credential rotation, user access reviews, API deprecations, contract renewals, and a support relationship with a vendor whose incentives are not aligned with yours. If that obligation doesn't have a named owner before the project starts, IT inherits it by default. Forever. Without headcount.</p><p>This question forces the conversation about operational reality before anyone has signed a contract. The answer also tells you a lot about how seriously the proposal was thought through. If nobody has asked "who maintains this?", nobody has thought past the demo.</p><p><strong>The red flag:</strong> "The vendor handles everything." Vendors handle their system. You handle the integration, the credentials, the user provisioning, the data pipeline, and the 2 AM alert when something breaks between their system and yours.</p><h3>Question 3: What happens to our existing systems, data, and processes?</h3><p>New systems don't exist in a vacuum. 
They touch your directory, your ticketing system, your identity provider, your backup scope, your audit logs. Each of those integration points is a potential failure mode, a migration cost, or a compliance question.</p><p>AI-generated proposals routinely skip integration complexity. This isn't because the AI is being deceptive. It's because the AI generating the proposal doesn't know your stack. The proposal was written in a context-free environment. Your environment is anything but.</p><p>Before committing, you need to know: what does this touch, and what has to move or change for it to work? And who does that work? Data migration alone can turn a "simple" deployment into a multi-month project. Asking this question early is how you find out.</p><p><strong>The red flag:</strong> "It integrates easily with your existing tools." That's a sales phrase, not an engineering estimate. "Easy" is undefined until your systems engineer has looked at the API docs.</p><h3>Question 4: What's the realistic timeline and resource cost, not the optimistic one?</h3><p>Vendor timelines assume clean data, available staff, smooth approvals, and nothing else on the backlog. 
Your timeline accounts for your actual team, their current commitments, the security review cycle, the change management process, and the three things nobody predicted.</p><p>The gap between those two numbers is usually where projects go sideways. Not because the technology failed, but because the plan never accounted for reality.</p><p>This question also surfaces a common pattern: the timeline was set before IT was consulted. Any timeline that precedes a technical assessment is a guess dressed up as a schedule. You're the one who'll be explaining the delay when the guess turns out to be wrong.</p><p><strong>The red flag:</strong> A go-live date in the proposal. That's not a plan, it's a target somebody made up. Ask who set it and what it was based on.</p><h3>Question 5: What's the exit strategy if this doesn't work as expected?</h3><p>Every vendor says their product works. You need a plan for when it doesn't. When the pricing doubles at renewal. When the company gets acquired and support degrades. When a compliance requirement changes and the product doesn't keep up.</p><p>Data portability, rollback procedures, and contractual exit terms are not pessimism. They're the difference between a manageable failure and a situation where you're paying for a system that doesn't work because migrating off it is too expensive to contemplate.</p><p>This question also signals organizational maturity. IT teams that ask exit questions before they sign contracts don't get held hostage. IT teams that don't ask end up managing a five-year sunset project for a tool they stopped believing in three years ago.</p><p><strong>The red flag:</strong> "We can always just stop using it." Can you migrate your data? In what format? At what cost? How long does it take? 
If nobody has asked those questions, stopping isn't as simple as it sounds.</p><h2>The Checklist in Practice: Walking Through a Real Scenario</h2><p>Here's what a Technical Reality Check pass looks like when you actually run it.</p><p>Your CEO forwards a ChatGPT-drafted proposal on a Monday morning. The subject line is "AI Agent for IT Ticket Triage." The proposal is two pages. It describes an AI system that reads incoming IT tickets, categorizes them by priority and type, routes them to the right team, and drafts first-response emails automatically. There's a mockup screenshot. There's a line about "easy integration with your existing ITSM." There's a timeline: six weeks to deployment.</p><p>You open the Technical Reality Check.</p><p><strong>Q1: What specific business outcome does this solve?</strong></p><p>The proposal says "reduce response times and improve IT efficiency." No baseline. No metric. You check your current ITSM data: average first response is 4.2 hours, your SLA target is 2 hours, and you're meeting it 71% of the time. Now you have a problem worth solving. You write it down: "We need first-response SLA compliance above 85%. Current state: 71%." That's the outcome. If the AI system can't demonstrate a path to that specific number, the conversation is premature.</p><p><strong>Q2: Who owns maintenance and the vendor relationship?</strong></p><p>Nobody is named in the proposal. You have a team of four. One of them is already carrying the ITSM admin role. You note: this needs a named owner and a rough estimate of ongoing hours before it can go to planning. You also flag the API integration dependency: your ITSM has a rate-limited API that's caused problems before. Someone needs to read the vendor's API docs before "easy integration" gets treated as a fact.</p><p><strong>Q3: What happens to existing systems and data?</strong></p><p>Your ticketing data includes ticket histories, customer records, and some attachments. 
The proposal doesn't mention data handling. You note two questions: where does ticket data go once the AI processes it, and what are the data residency requirements given that you handle some HIPAA-adjacent systems? That second question alone could be a blocker. You don't know yet, but you know to ask.</p><p><strong>Q4: What's the realistic timeline and resource cost?</strong></p><p>Six weeks assumes nothing else is happening. Your team is currently in the middle of a server migration that runs through the end of the month. Realistically, this project can't start until the middle of next month, and your most experienced engineer (the one who'd need to own the integration) is at 90% utilization. You write down: "Realistic start: six weeks out. Realistic deployment: 12-16 weeks from proposal receipt. Not 6."</p><p><strong>Q5: What's the exit strategy?</strong></p><p>The proposal doesn't mention it. You note: before any contract, you need to know the data export format, the contract term length, and what happens to stored ticket data at offboarding.</p><p>That's it. You've just done a Technical Reality Check. Total time: 15 minutes.</p><p>Now you can write a response. Not "no." Not "yes." Something like: "I've done a preliminary review. Before we can assess feasibility, I need answers to five specific questions. Here they are. Happy to set up 30 minutes to walk through them together." You've moved the conversation from enthusiasm to decision-ready. You've protected the organization without being obstructionist. And you have a written record of the questions you asked, which matters if the project later goes sideways without those answers ever being provided.</p><p>That's the whole point of the Technical Reality Check. It's not a rejection letter. It's the question set that separates proposals worth pursuing from proposals worth deferring.</p><h2>What the Rest of the Toolkit Covers</h2><p>The Technical Reality Check is the first thing you run. 
It gets you to a defensible position in 15 minutes. But the full response (the one that protects your career, your team's credibility, and the organization's resources) needs more than five questions.</p><p>The complete AI Request Deflection Toolkit includes three email templates that turn your Reality Check findings into professional communications: an initial deflection that buys time while signaling genuine interest, a risk escalation that documents specific technical concerns in business-impact terms, and a stakeholder alignment template that ends the email thread and gets the right people in a room with a decision mandate. Every template has a filled-in worked example so you can see exactly what "adapted" looks like.</p><p>There's also a 15-question weighted scoring matrix in CSV and Sheets format. It turns "I have concerns" into "the proposal scores 41% against our evaluation criteria, which triggers a formal risk review." Objective. Defensible. Exportable. The kind of documentation that holds up in a post-project conversation.</p><p>And there's an escalation playbook for situations where the initial deflection didn't land and the project is being pushed forward without proper review. 
That one's for the harder conversations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wc4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wc4T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wc4T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png" width="1200" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69601,&quot;alt&quot;:&quot;A product card for the AI Request Deflection Toolkit priced at $24.99. An orange header bar displays \&quot;ASTGL DIGITAL TOOLS &#183; IN THE FULL KIT\&quot; and \&quot;AI Request Deflection Toolkit,\&quot; with a navy price badge showing \&quot;$24.99\&quot; in orange at the top right. Below, an orange label reads \&quot;WHAT'S GATED IN THE FULL KIT.\&quot; Five items follow, each with a green circle checkmark: (1) \&quot;3 email templates\&quot; &#8212; Initial deflection, Risk escalation, and Stakeholder alignment, each with 3 subject lines; (2) \&quot;15-question scoring matrix\&quot; &#8212; weighted across Business Value, Technical Complexity, Risk, Resource Reality, and Integration Impact, available as CSV/Sheets/Excel; (3) \&quot;Worked examples in every template\&quot; &#8212; see the filled-in version before you write yours; (4) \&quot;Step-by-step README workflow\&quot; &#8212; from email receipt to professional response; (5) \&quot;Defensible-decision framework\&quot; &#8212; reusable for the next AI proposal. 
Footer reads \&quot;Get the full kit: shop.asthegeeklearns.com/products/ai-deflection-toolkit.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200284415?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A product card for the AI Request Deflection Toolkit priced at $24.99. An orange header bar displays &quot;ASTGL DIGITAL TOOLS &#183; IN THE FULL KIT&quot; and &quot;AI Request Deflection Toolkit,&quot; with a navy price badge showing &quot;$24.99&quot; in orange at the top right. Below, an orange label reads &quot;WHAT'S GATED IN THE FULL KIT.&quot; Five items follow, each with a green circle checkmark: (1) &quot;3 email templates&quot; &#8212; Initial deflection, Risk escalation, and Stakeholder alignment, each with 3 subject lines; (2) &quot;15-question scoring matrix&quot; &#8212; weighted across Business Value, Technical Complexity, Risk, Resource Reality, and Integration Impact, available as CSV/Sheets/Excel; (3) &quot;Worked examples in every template&quot; &#8212; see the filled-in version before you write yours; (4) &quot;Step-by-step README workflow&quot; &#8212; from email receipt to professional response; (5) &quot;Defensible-decision framework&quot; &#8212; reusable for the next AI proposal. Footer reads &quot;Get the full kit: shop.asthegeeklearns.com/products/ai-deflection-toolkit.&quot;" title="A product card for the AI Request Deflection Toolkit priced at $24.99. An orange header bar displays &quot;ASTGL DIGITAL TOOLS &#183; IN THE FULL KIT&quot; and &quot;AI Request Deflection Toolkit,&quot; with a navy price badge showing &quot;$24.99&quot; in orange at the top right. 
Below, an orange label reads &quot;WHAT'S GATED IN THE FULL KIT.&quot; Five items follow, each with a green circle checkmark: (1) &quot;3 email templates&quot; &#8212; Initial deflection, Risk escalation, and Stakeholder alignment, each with 3 subject lines; (2) &quot;15-question scoring matrix&quot; &#8212; weighted across Business Value, Technical Complexity, Risk, Resource Reality, and Integration Impact, available as CSV/Sheets/Excel; (3) &quot;Worked examples in every template&quot; &#8212; see the filled-in version before you write yours; (4) &quot;Step-by-step README workflow&quot; &#8212; from email receipt to professional response; (5) &quot;Defensible-decision framework&quot; &#8212; reusable for the next AI proposal. Footer reads &quot;Get the full kit: shop.asthegeeklearns.com/products/ai-deflection-toolkit.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!wc4T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!wc4T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6fe8e8-261d-4d54-a34b-e488126e2300_1200x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Cost of Not Having a Process</h2><p>Most IT managers who get burned by an executive-forwarded AI project didn't fail because the technology was bad. They failed because they said yes before they had answers, or they said no in a way that got overridden, or they said "we have concerns" without the documentation to back it up when the concerns turned out to be right.</p><p>A 15-minute structured evaluation is the cheapest investment in that problem. Run it every time. Document the answers. 
Keep the record.</p><p>If you want the full toolkit (the email templates, the scoring matrix, the escalation playbook, and all the worked examples), it's at the store for $24.99.</p><p><a href="https://shop.asthegeeklearns.com/products/ai-deflection-toolkit">Get the AI Request Deflection Toolkit</a></p><p>The <strong>Technical Reality Check</strong> above is yours to use as-is. Print it. Keep it at your desk. The next email is coming.</p>]]></content:encoded></item><item><title><![CDATA[Anthropic Shipped an AI Security Scanner. 
Here's the Per-PR Cost Math.]]></title><description><![CDATA[Before you add anything to CI, know exactly what it costs per pull request and how to triage what it finds.]]></description><link>https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math</link><guid isPermaLink="false">https://astgl.com/p/anthropic-ai-security-scanner-per-pr-cost-math</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Tue, 02 Jun 2026 15:04:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GqAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first time my manager asked, &#8220;Are we using AI to scan PRs for vulnerabilities yet?&#8221; I said I'd look into it. Then I spent four hours reading docs, pricing pages, and GitHub issues before I had a number I trusted enough to put in a Slack message.</p><p>That should have taken twenty minutes. The number exists. The math is straightforward. Nobody had written it down in one place where a platform engineer could find it.</p><p>Anthropic quietly shipped <code>anthropics/claude-code-security-review</code> as a first-party GitHub Action. You add a workflow file, point it at a secret, and it posts a findings comment on every pull request. The scanner reasons about code rather than matching signatures, which means it catches things like logic-level injection paths that a regex-based tool would miss. It also means the false-positive profile is different from what you're used to, and you need a triage process before you wire it to branch protection.</p><p>This article gives you the cost math and the triage playbook in full. Both are things you'd need even if you built this yourself.</p><h2>Why "Just Run It" Isn't a Strategy</h2><p>Adding a CI step that calls an LLM API isn't free, and it isn't free to manage. 
There are two failure modes I see teams hit.</p><p>The first is budget surprise. Someone adds the scanner, it runs for a month, the cloud bill shows up, and the conversation gets uncomfortable because nobody did the math upfront. The scanner doesn't cost a lot, but "not a lot" needs a number attached to it before you walk into a budget conversation.</p><p>The second failure mode is alert fatigue. The scanner finds something on every PR. Engineers start skimming the findings comment the same way they skim Dependabot. One day there's a real SQL injection in a PR, it's buried in a list of five findings, and it merges. The triage process is what keeps findings meaningful instead of noise.</p><p>Both problems are solvable. The math takes ten minutes. The triage rubric takes one meeting to agree on. Neither requires buying anything yet.</p><h2>What This Costs Per PR (The Real Numbers)</h2><p>Claude bills per token. One token is roughly four characters of text. A PR diff gets converted to tokens and sent to the model as input. The model's findings comment is output tokens. 
The formula is simple:</p><pre><code>Cost = (input_tokens &#215; input_rate) + (output_tokens &#215; output_rate)</code></pre><p>For Claude Sonnet 4.6, the rates are approximately $3 per million input tokens and $15 per million output tokens. (Verify current pricing at platform.anthropic.com before your next budget conversation. Rates change.)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GqAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GqAd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GqAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/220f9621-05c5-456b-a841-8ef55801962f_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37679,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200281387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GqAd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!GqAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F220f9621-05c5-456b-a841-8ef55801962f_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Scenario 1: A 200-Line PR Diff</h3><p>A focused bug fix or small feature. Maybe three files changed.</p><pre><code>Component                                  Tokens   Rate           Cost
-----------------------------------------------------------------------
System prompt + workflow context (input)    2,000   $3.00 / 1M     $0.006
PR diff, ~200 lines (input)                 1,300   $3.00 / 1M     $0.004
Findings output, 1-2 findings (output)        600   $15.00 / 1M    $0.009
-----------------------------------------------------------------------
Total per PR                                                       ~$0.019</code></pre><p>Call it two cents. For a small PR, this is a rounding error.</p><h3>Scenario 2: A 2,000-Line PR Diff</h3><p>A refactor, a new feature, a dependency upgrade touching multiple services.</p><pre><code>Component                                   Tokens   Rate           Cost
------------------------------------------------------------------------
System prompt + workflow context (input)     2,000   $3.00 / 1M     $0.006
PR diff, ~2,000 lines (input)               13,000   $3.00 / 1M     $0.039
Findings output, 2-4 findings (output)       1,500   $15.00 / 1M    $0.023
------------------------------------------------------------------------
Total per PR                                                        ~$0.068</code></pre><p>Seven cents. Still noise for a single PR.</p><h3>Monthly Back-of-the-Envelope</h3><p>The question your manager will ask isn't &#8220;What does one PR cost?&#8221; It's &#8220;What does this cost per month?&#8221;</p><p>If your team merges 80 PRs a month (about 4 per business day), with a mix of small and medium diffs averaging around $0.04 per scan:</p><pre><code>80 PRs &#215; $0.04 = $3.20/month</code></pre><p>Even if your average PR runs larger, say closer to the 2,000-line scenario at $0.07 each:</p><pre><code>80 PRs &#215; $0.07 = $5.60/month</code></pre><p>A busy multi-team repo at 400 PRs a month at $0.07 each is $28/month. Even that is small next to a single developer's monthly tooling spend. The cost math isn't the obstacle here. The obstacle is having a triage process in place before you flip it on.</p><p>One practical note: output token count varies with how many findings the scanner generates. Zero findings produce shorter output and cost less. Ten findings cost a bit more. The estimates above assume one to four findings per PR, which is realistic for an established codebase with existing security hygiene.</p><h2>The 3-Tier Triage Rubric</h2><p>Every finding the scanner posts needs to land in one of three buckets. Here's the decision framework.</p><p><strong>REAL: Block the merge. Fix it.</strong></p><p>A finding is REAL when it describes an exploitable path with proof. The scanner should show you the specific line and explain how an attacker would reach it, and that explanation should hold up when you read the code yourself. SQL injection via string concatenation in a request handler is REAL. Hardcoded credentials that actually ship to production are REAL.</p><p>The discriminator: "If an attacker had this codebase and five minutes, could they demonstrate this?" If yes, it's REAL. 
Block the PR and fix it before merge.</p><p><strong>PROBABLE: Human review required.</strong></p><p>A finding is PROBABLE when the pattern is plausible, but context matters. The scanner can see the diff, not the full runtime environment. A finding might flag a code path that looks injectable, but your framework wraps every database call with prepared statements at a layer the scanner can't see. Or the flagged code only runs in a context that requires prior authentication the scanner doesn't know about.</p><p>The discriminator: "This could be real, but I need someone who knows this codebase to confirm." Don't block the PR automatically. Route it to the PR author or a senior engineer. Give it a two-hour resolution window before it escalates.</p><p><strong>DISCARD: Suppress it with a documented rule.</strong></p><p>A finding is DISCARD when it's structurally a false positive. The scanner flagged test code that never runs in production. It flagged a generated file you don't own. It flagged a template placeholder in an IaC file that gets substituted at deploy time. It flagged a public API URL as a hardcoded credential because the word "key" appeared in the variable name.</p><p>The discriminator: "Would an attacker gain anything by knowing this?" If no, it's a DISCARD. The important part is that you document why. Suppressing without a comment is how you end up silently ignoring real findings six months later when the context is gone.</p><h2>A Worked Example: The SQLAlchemy False Positive</h2><p>Here's the kind of finding that will show up on your team in the first two weeks if you use any ORM.</p><p>A PR adds a new search endpoint. Somewhere in the diff, there's code like this:</p><pre><code>def search_users(search_term: str):
    # db and User are assumed to be the app's SQLAlchemy session handle and model, defined elsewhere.
    results = db.session.query(User).filter(
        User.name.ilike(f"%{search_term}%")
    ).all()
    return results</code></pre><p>The scanner flags it as a potential SQL injection vulnerability. The finding explains that <code>search_term</code> appears to be user-controlled input and is being interpolated into a query string. Severity: HIGH.</p><p>A human reading this would notice a few things. The code uses SQLAlchemy's ORM layer. The <code>.ilike()</code> method is a SQLAlchemy query construct, not a raw SQL string. SQLAlchemy sends the query to the database as a parameterized statement with the value bound separately, which is exactly the defense against SQL injection. The <code>f"%{search_term}%"</code> is constructing the pattern string in Python, but that pattern gets passed as a bound parameter by the driver.</p><p>This is a DISCARD. The scanner saw string interpolation near a database call and correctly identified that as a pattern worth flagging. It couldn't see that the ORM handles parameterization automatically.</p><p>The suppression note you'd document reads something like:</p><blockquote><p>SQLAlchemy ORM calls via `.filter()`, `.ilike()`, `.like()`, and similar query methods use parameterized queries automatically. String interpolation to construct pattern values (e.g., for LIKE clauses) does not create injection risk when using these methods. Do not flag SQLAlchemy ORM filter calls as SQL injection.</p></blockquote><p>That note goes into a filter file your workflow references. The same class of finding stops appearing on every PR that touches a database query.</p><p>Two things to notice about this example. First, the scanner wasn't wrong to flag it. Without ORM context, string interpolation near a SQL-like method call is exactly what a good scanner should notice. Second, the suppression is better than just dismissing it, because the documented rule now covers every future PR using the same pattern. 
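</p><p>The parameterization claim behind that rule is easy to check for yourself. Here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQLAlchemy (an assumption made only to keep the example dependency-free; the users table and the sample names are hypothetical). It builds the LIKE pattern with an f-string, the same way the PR does, and hands it to the database as a bound parameter:</p><pre><code>import sqlite3


def search_users(conn, search_term):
    # Build the LIKE pattern in Python, the same way the f-string in the PR does.
    pattern = f"%{search_term}%"
    # The "?" placeholder binds the pattern as data; it is never parsed as SQL.
    rows = conn.execute(
        "SELECT name FROM users WHERE name LIKE ? ORDER BY name", (pattern,)
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory table standing in for the real users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",), ("malice",)]
)

normal = search_users(conn, "lic")            # ordinary search term
hostile = search_users(conn, "' OR '1'='1")   # classic injection attempt
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(normal, hostile, remaining)</code></pre><p>The ordinary term finds the two matching names. The hostile-looking term matches nothing and changes nothing, because the database only ever sees it as a value. That is the behavior the suppression rule relies on. It says nothing about raw SQL strings assembled elsewhere in the same codebase, so those still deserve a real review.</p><p>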
You pay the triage cost once.</p><h2>What the Full Kit Covers</h2><p>The cost math and triage rubric are the foundation, but they don't tell you how to wire any of this into GitHub.</p><p>The full guide covers the GitHub Actions workflow YAML itself (the one that calls <code>anthropics/claude-code-security-review</code> and handles the findings response), how to set up branch protection so that HIGH findings actually block merges instead of just posting a comment, and the in-workflow automation that runs the REAL/PROBABLE/DISCARD classification before the comment lands on the PR.</p><p>There's also a head-to-head with GPT-4o as a second-opinion scanner. They're not equivalent tools. The Anthropic action is purpose-built for this job. The GPT-4o path is a chat completions API call with a security prompt, which costs about one-seventh as much per PR but produces more variable results. 
The comparison matrix helps you decide whether a two-week pilot with both scanners running simultaneously is worth the extra spend.</p><p>The suppression filter file format is documented in full, with a worked example filter file for a Next.js and SQLAlchemy codebase that covers the five most common false-positive patterns before you even see your first finding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9PzT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9PzT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9PzT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png" width="1200" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/200281387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9PzT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 424w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 848w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1272w, https://substackcdn.com/image/fetch/$s_!9PzT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cb7cf2d-03a7-4441-ba4e-e42bca37dc4f_1200x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>One Thing Before You Add It to CI</h2><p>The scanner is easy to add. Ten minutes from zero to your first findings comment. The harder question is whether your team has agreed on what to do with those findings before the first PR triggers.</p><p>That conversation takes one team meeting. You need three agreements: what severity blocks a merge automatically, who owns the weekly rotation for PROBABLE findings that authors didn't resolve, and what the bar is for adding a DISCARD rule.</p><p>If you want to run that meeting with the cost math and triage rubric in hand, you have both now. If you want the workflow YAML, the branch protection setup, and the suppression filter format so you're not building those from scratch, the full guide is at the link below.</p><p>What's your current setup for catching security issues in PRs before they merge? 
Genuinely curious whether teams are using static analysis tools, relying on code review, or still treating it as a post-deploy problem.</p><p><em>The full CI/CD template with GitHub Actions workflow, merge-gating logic, and false-positive triage automation is at </em><a href="https://shop.asthegeeklearns.com/products/claude-code-security-scan-cicd-template">the ASTGL store</a>.</p>]]></content:encoded></item><item><title><![CDATA[Stop Paying for Cloud APIs: Building a Local AI Stack on Mac Studio]]></title><description><![CDATA[How to leverage Apple Silicon's unified memory for production-grade LLMs and replace your cloud billing entirely.]]></description><link>https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio</link><guid isPermaLink="false">https://astgl.com/p/stop-paying-for-cloud-apis-building-local-ai-stack-mac-studio</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 01 Jun 2026 17:08:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/119a1a3d-31a2-4bac-9f7b-23880a131212_2352x882.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Running LLMs locally usually feels like a compromise. You either get tiny, fast models that can't think or massive models that crawl at one word per minute. But with the right hardware, you can break that trade-off and replace your cloud billing entirely.</p><h2>The Setup</h2><p>The dilemma most developers face is a choice between two bad options. On one side, you have cloud APIs like OpenAI or Anthropic. They are easy to use and incredibly smart, but they come with a heavy "API tax" and privacy concerns. If you're processing proprietary code or sensitive customer data, sending that information to a third-party server is a massive risk.</p><p>On the other side, you have traditional local setups. Usually, you're limited by the VRAM on your GPU. If you have a standard consumer card with 12 GB or 24 GB of VRAM, you're stuck with small models. You can't run the heavy-hitters that actually compete with GPT-5. This creates a wall where local AI is only good for "toy" problems, while production workloads stay in the cloud.</p><h2>The Hardware Math</h2><p>The real secret to breaking this wall is Apple Silicon's unified memory. On a Mac Studio with an M3 Ultra, the 256 GB of memory is shared between the CPU and the GPU. This eliminates the VRAM bottleneck that kills most local setups. 
You aren't limited by a tiny slice of video memory; you're limited by the total pool of system memory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pQPd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pQPd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 424w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 848w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1272w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pQPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png" width="1456" height="1436" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1436,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193106,&quot;alt&quot;:&quot;256GB of Mac Studio unified memory partitioned between roughly 107GB of active model weights (DeepSeek-R1 70B at 42GB, Qwen3-32B at 20GB, Qwen2.5-Coder at 19GB, Qwen3-8B at 5.2GB, Nomic-Embed at 0.27GB) and roughly 149GB of system overhead and buffer (macOS, KV cache, disk swap).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="256GB of Mac Studio unified memory partitioned between roughly 107GB of active model weights (DeepSeek-R1 70B at 42GB, Qwen3-32B at 20GB, Qwen2.5-Coder at 19GB, Qwen3-8B at 5.2GB, Nomic-Embed at 0.27GB) and roughly 149GB of system overhead and buffer (macOS, KV cache, disk swap)." title="256GB of Mac Studio unified memory partitioned between roughly 107GB of active model weights (DeepSeek-R1 70B at 42GB, Qwen3-32B at 20GB, Qwen2.5-Coder at 19GB, Qwen3-8B at 5.2GB, Nomic-Embed at 0.27GB) and roughly 149GB of system overhead and buffer (macOS, KV cache, disk swap)." 
srcset="https://substackcdn.com/image/fetch/$s_!pQPd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 424w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 848w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1272w, https://substackcdn.com/image/fetch/$s_!pQPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9675ede9-bf27-4c7d-b6bf-cb793fcd90aa_2352x2319.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>When you look at the actual numbers, the math becomes very clear. Here is how I structure my model loading on this machine:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Upfq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Upfq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 424w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 848w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1272w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Upfq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112325,&quot;alt&quot;:&quot;Local model lineup on the Mac Studio: qwen3:8b (5.2GB, very fast) for calendar/security/scoring; qwen3:32b-fast (20GB, interactive) for articles/research/drafts; qwen2.5-coder (19GB, interactive) for code review/git/SQL; deepseek-r1:70b (42GB, ~2.78 tok/s) for deep research background only; nomic-embed-text (274MB, instant) for RAG embeddings.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Local model lineup on the Mac Studio: qwen3:8b (5.2GB, very fast) for calendar/security/scoring; qwen3:32b-fast (20GB, interactive) for articles/research/drafts; qwen2.5-coder (19GB, interactive) for code review/git/SQL; deepseek-r1:70b (42GB, ~2.78 tok/s) for deep research background only; nomic-embed-text (274MB, instant) for RAG embeddings." 
title="Local model lineup on the Mac Studio: qwen3:8b (5.2GB, very fast) for calendar/security/scoring; qwen3:32b-fast (20GB, interactive) for articles/research/drafts; qwen2.5-coder (19GB, interactive) for code review/git/SQL; deepseek-r1:70b (42GB, ~2.78 tok/s) for deep research background only; nomic-embed-text (274MB, instant) for RAG embeddings." srcset="https://substackcdn.com/image/fetch/$s_!Upfq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 424w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 848w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1272w, https://substackcdn.com/image/fetch/$s_!Upfq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cb4992d-1e22-40db-ace9-d762c8f3ab64_1800x904.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>If you load all of these concurrently, you're using roughly 107 GB of memory. That leaves about 149 GB for macOS, your browser, your IDE, and everything else. This allows you to run a 32B model for writing, a 70B for research, and an 8B for quick checks all at the same time.</p><p>The economics are just as compelling. A Mac Studio setup costs anywhere from $4,000 to $7,000 as a one-time purchase. If your production workflows are costing you $200 to $500 per month in cloud tokens, the hardware pays for itself in roughly 12 to 18 months at mid-range figures; a $6,000 machine against $400 per month, for example, breaks even at month 15. After that, the "cost" of running a massive model is basically just the electricity it uses. Plus, you finally own your data.</p><h2>Temperature Is a Randomness Dial, Not a Quality Dial</h2><p>I see a lot of tutorials that suggest using a temperature of 0.7 for every single prompt. That is a mistake. Temperature doesn't make a model "smarter" or "better." It is simply a randomness dial. It controls how much the model is allowed to deviate from the most likely next token.</p><p>If you use the same temperature for everything, some stage of your pipeline will suffer. For tasks requiring high precision, a high temperature makes hallucinations more likely. 
For creative tasks, a low temperature will make the output feel robotic and repetitive.</p><p>In my production newsletter pipeline, I use a specific routing table to manage this:</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XGPw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XGPw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 424w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 848w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XGPw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png" width="1456" height="921" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:921,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98677,&quot;alt&quot;:&quot;Per-task temperature settings: topic generation 0.7 (creative variety); research compilation 0.3 (minimize hallucination); article drafting 0.7 (natural prose); voice humanization 0.8 (more natural, varied output); fact-check extraction 0.1 (near-deterministic precision); fact-check verdict 0.1 (no room for ambiguity); social media notes 0.7 (casual, engaging tone).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Per-task temperature settings: topic generation 0.7 (creative variety); research compilation 0.3 (minimize hallucination); article drafting 0.7 (natural prose); voice humanization 0.8 (more natural, varied output); fact-check extraction 0.1 (near-deterministic precision); fact-check verdict 0.1 (no room for ambiguity); social media notes 0.7 (casual, engaging tone)." title="Per-task temperature settings: topic generation 0.7 (creative variety); research compilation 0.3 (minimize hallucination); article drafting 0.7 (natural prose); voice humanization 0.8 (more natural, varied output); fact-check extraction 0.1 (near-deterministic precision); fact-check verdict 0.1 (no room for ambiguity); social media notes 0.7 (casual, engaging tone)." 
srcset="https://substackcdn.com/image/fetch/$s_!XGPw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 424w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 848w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!XGPw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334304a-ebb4-4e13-8068-664808fc1f26_1800x1138.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>There are two key takeaways here. First, for fact-checking, you want the temperature at 0.1. This makes claim extraction close to repeatable and keeps your verdicts consistent from one run of the script to the next. Second, setting the temperature to 0.8 for "humanization" might seem counterintuitive, but it works. A higher temperature allows the model to make less predictable word choices, which produces more natural, less "AI-sounding" prose.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N4kh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N4kh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 424w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 848w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1272w, 
https://substackcdn.com/image/fetch/$s_!N4kh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N4kh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png" width="1456" height="904" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:904,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140165,&quot;alt&quot;:&quot;Temperature-based router flowchart: a user prompt enters a temperature check, then routes to Fact-Check Mode (temp 0.1 &#8594; DeepSeek-R1), Research Mode (0.3 &#8594; Qwen3-32B), Drafting Mode (0.7 &#8594; Qwen2.5-Coder), or Humanization Mode (0.8+ &#8594; Qwen3-8B).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Temperature-based router flowchart: a user prompt enters a temperature check, then routes to Fact-Check Mode (temp 0.1 &#8594; DeepSeek-R1), Research Mode (0.3 &#8594; Qwen3-32B), Drafting Mode (0.7 &#8594; Qwen2.5-Coder), or Humanization Mode (0.8+ &#8594; Qwen3-8B)." 
title="Temperature-based router flowchart: a user prompt enters a temperature check, then routes to Fact-Check Mode (temp 0.1 &#8594; DeepSeek-R1), Research Mode (0.3 &#8594; Qwen3-32B), Drafting Mode (0.7 &#8594; Qwen2.5-Coder), or Humanization Mode (0.8+ &#8594; Qwen3-8B)." srcset="https://substackcdn.com/image/fetch/$s_!N4kh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 424w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 848w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1272w, https://substackcdn.com/image/fetch/$s_!N4kh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c429480-7e38-4a3c-aeeb-e13301d5c84e_2352x1461.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The OpenAI Compatibility Trick</h2><p>The best part about using Ollama for this setup is that you don't have to rewrite your entire codebase. Ollama exposes an OpenAI-compatible API at <code>localhost:11434/v1</code>. This means any tool, library, or SDK that respects the <code>OPENAI_BASE_URL</code> environment variable can be redirected to your local machine with almost zero effort.</p><p>You can point your existing Python scripts or LangChain agents at your local Mac by simply setting these variables in your terminal:</p><pre><code>export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama  # Any value works; Ollama doesn't check this</code></pre><p>If you are working within a configuration file, such as a JSON config for a custom agent, it looks like this:</p><pre><code>{
  "model": "openai/qwen3:32b-fast",
  "openai_base_url": "http://localhost:11434/v1",
  "openai_api_key": "ollama"
}</code></pre><p>Every LangChain chain, every summarization script, and every SDK that follows the OpenAI protocol becomes a free local-model call. You can migrate an entire project from GPT-4 to your local M3 Ultra in about 30 seconds.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NEhc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NEhc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 424w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 848w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NEhc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png" width="1456" height="847" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:847,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166494,&quot;alt&quot;:&quot;Sequence diagram of an OpenAI-compatible request: the client app (Cursor or other IDE) points the OPENAI_BASE_URL environment variable at the local server (localhost:11434/v1), then sends a standard POST /v1/chat/completions; the local server executes inference on the local LLM (Ollama or vLLM) and streams a JSON response back in OpenAI format.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequence diagram of an OpenAI-compatible request: the client app (Cursor or other IDE) points the OPENAI_BASE_URL environment variable at the local server (localhost:11434/v1), then sends a standard POST /v1/chat/completions; the local server executes inference on the local LLM (Ollama or vLLM) and streams a JSON response back in OpenAI format." title="Sequence diagram of an OpenAI-compatible request: the client app (Cursor or other IDE) points the OPENAI_BASE_URL environment variable at the local server (localhost:11434/v1), then sends a standard POST /v1/chat/completions; the local server executes inference on the local LLM (Ollama or vLLM) and streams a JSON response back in OpenAI format." 
srcset="https://substackcdn.com/image/fetch/$s_!NEhc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 424w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 848w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!NEhc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591df5d9-ea50-4f6e-8999-be361fe85bad_2352x1368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>Why This Pattern Matters</h2><p>This isn't just about saving money on API credits. It is about architectural sovereignty. When you move your core intelligence layer to local hardware, you remove the dependency on a single vendor's uptime, pricing changes, and content filtering policies.</p><p>The pattern of using unified memory to host multiple specialized models at different temperatures allows you to build a "factory" of intelligence. You have a high-speed 8B model for sorting, a balanced 32B model for drafting, and a heavy 70B model for deep reasoning, all running in the same memory space. 
This is how you build a production-grade AI stack that is private, permanent, and incredibly cost-effective.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Eo7L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Eo7L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 424w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 848w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1272w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png" width="1456" height="546" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90833,&quot;alt&quot;:&quot;ROI payback model: one-time hardware cost of $4k&#8211;$7k plus avoided monthly API fees of $200&#8211;$500 yields Month 0 high capex, Month 12 break-even, and Month 18+ pure savings.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199922294?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ROI payback model: one-time hardware cost of $4k&#8211;$7k plus avoided monthly API fees of $200&#8211;$500 yields Month 0 high capex, Month 12 break-even, and Month 18+ pure savings." title="ROI payback model: one-time hardware cost of $4k&#8211;$7k plus avoided monthly API fees of $200&#8211;$500 yields Month 0 high capex, Month 12 break-even, and Month 18+ pure savings." 
srcset="https://substackcdn.com/image/fetch/$s_!Eo7L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 424w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 848w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1272w, https://substackcdn.com/image/fetch/$s_!Eo7L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab2cd94-da19-4fb2-864d-5f5c9c89dc97_2352x882.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>(This cost calculation was based on pricing from six months ago, when I bought my Mac Studio. Since then, the supply of Mac Studios with large amounts of unified memory has evaporated, which has driven up prices. Hopefully this is temporary.)</p><h2>Quick Reference</h2><p><strong>Key Commands</strong></p><ul><li><p>Set local base URL: <code>export OPENAI_BASE_URL=http://localhost:11434/v1</code></p></li><li><p>Check running models: <code>ollama ps</code></p></li></ul><p><strong>Temperature Cheat Sheet</strong></p><ul><li><p><strong>0.1 to 0.3:</strong> Extraction, coding, fact-checking, and structured data (JSON).</p></li><li><p><strong>0.7:</strong> General purpose, drafting, and summarization.</p></li><li><p><strong>0.8 to 1.0:</strong> Creative writing, brainstorming, and persona simulation.</p></li></ul><p><em>Found this useful? 
I share practical lessons from my systems engineering and AI journey at </em><a href="https://astgl.substack.com">As The Geek Learns</a> </p>]]></content:encoded></item><item><title><![CDATA[The Pope Wrote a Memo to AI Developers. 
Most of You Missed It.]]></title><description><![CDATA[A builder's read of Magnifica Humanitas&#8212;what 'disarm AI' actually means, why 'alignment' alone isn't enough, and what to do about it in your config.]]></description><link>https://astgl.com/p/pope-memo-to-ai-developers</link><guid isPermaLink="false">https://astgl.com/p/pope-memo-to-ai-developers</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Fri, 29 May 2026 12:03:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rZKE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>On May 25, the Vatican released eighty-two pages on artificial intelligence. The headline everyone ran with was <em>disarm AI</em>. That's not the most interesting part.</p><p>The most interesting part is paragraph 111. It's a direct, two-paragraph appeal to people who build AI. I've read most of the major coverage now&#8212;<em>Vatican News</em>, <em>NCR</em>, <em>America</em>, <em>USCCB</em>, NPR&#8212;and almost none of it quoted that paragraph. The other thing nobody seems to have noticed: Christopher Olah, co-founder of Anthropic, was at the Vatican presentation. The lab that ships Claude sent a senior researcher to stand next to the Pope while he released this thing.</p><p>I'm a systems engineer, not a theologian. But I build agents for a living now, and I read all eighty-two pages. 
Here's what stuck.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rZKE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rZKE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rZKE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65f8ced1-c08a-448b-babf-beda16327045_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:294739,&quot;alt&quot;:&quot;A 
typographic hero image on a warm parchment-cream background with a thin cardinal-red side bar. Above the main mark, the eyebrow text reads \&quot;MAGNIFICA HUMANITAS &#8212; 2026\&quot; in cardinal red. The centerpiece is the phrase \&quot;Paragraph 111\&quot; set in large black serif type, with the italic subtitle \&quot;the part written for us.\&quot; just below. A short divider line separates this from the article title, \&quot;The Pope Wrote a Memo to AI Developers. Most of You Missed It.,\&quot; set in three centered sans-serif lines. The footer reads \&quot;AS THE GEEK LEARNS &#183; ASTGL.SUBSTACK.COM.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A typographic hero image on a warm parchment-cream background with a thin cardinal-red side bar. Above the main mark, the eyebrow text reads &quot;MAGNIFICA HUMANITAS &#8212; 2026&quot; in cardinal red. The centerpiece is the phrase &quot;Paragraph 111&quot; set in large black serif type, with the italic subtitle &quot;the part written for us.&quot; just below. A short divider line separates this from the article title, &quot;The Pope Wrote a Memo to AI Developers. Most of You Missed It.,&quot; set in three centered sans-serif lines. The footer reads &quot;AS THE GEEK LEARNS &#183; ASTGL.SUBSTACK.COM.&quot;" title="A typographic hero image on a warm parchment-cream background with a thin cardinal-red side bar. Above the main mark, the eyebrow text reads &quot;MAGNIFICA HUMANITAS &#8212; 2026&quot; in cardinal red. 
The centerpiece is the phrase &quot;Paragraph 111&quot; set in large black serif type, with the italic subtitle &quot;the part written for us.&quot; just below. A short divider line separates this from the article title, &quot;The Pope Wrote a Memo to AI Developers. Most of You Missed It.,&quot; set in three centered sans-serif lines. The footer reads &quot;AS THE GEEK LEARNS &#183; ASTGL.SUBSTACK.COM.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!rZKE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!rZKE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f8ced1-c08a-448b-babf-beda16327045_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 
12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Magnifica Humanitas Paragraph 111</figcaption></figure></div><h2>The Pope knows how AI actually works</h2><p>This is the part that surprised me.</p><p>Most religious commentary on technology reads like it was written by somebody who has never opened a terminal. Magnifica Humanitas doesn't. In paragraph 98, Leo writes:</p><blockquote><p>current AI systems are more "cultivated" than "built," for developers do not directly design every detail, but instead create a framework within which the intelligence "grows." As a result, fundamental scientific aspects &#8212; such as the internal representations and computational processes of these systems &#8212; remain, at present, unknown.</p></blockquote><p>That is a correct, careful description of how transformer training works. It's a Pope acknowledging mechanistic interpretability is an open problem. In an encyclical. 
Without using the word transformer.</p><p>He continues in paragraph 99&#8212;these systems "merely imitate certain functions of human intelligence." They do "a form of statistical adaptation based on data and feedback, which can be very effective, but does not imply inner growth." Again&#8212;that&#8217;s accurate. He's not saying AI is fake or evil. He's saying it's not what its loudest cheerleaders claim it is, and he's saying it in language a research scientist would nod along to.</p><p>So when he gets to the harder asks, you can't dismiss him as a guy who doesn't get it.</p><h2>The thing he gets right that hurts</h2><p>The most uncomfortable paragraph in the encyclical, for builders, is 107. Read it slowly:</p><blockquote><p>We cannot be satisfied with merely calling for the moralization of machines &#8212; the so-called "alignment" of AI with human values &#8212; without also having the courage to insist on a further condition: the possibility of openly discussing the ethical frameworks involved and subjecting them to shared standards of social justice. 
Otherwise, those who control AI will impose their own moral vision, which will become the invisible infrastructure of these systems. <strong>A more moral AI is not enough if that morality is determined by a few.</strong></p></blockquote><p>The Pope just published a critique of RLHF.</p><p>Not of AI. Of <em>alignment as currently practiced.</em> His point is straightforward: when a handful of labs decide what their models will and won't say, that decision becomes the invisible scaffolding the rest of us build on. The model's politics, its refusals, its assumptions about what's controversial&#8212;those came from a small group of humans in a small number of buildings, and the rest of us inherit them whether we like it or not.</p><p>You can agree or disagree with the conclusion. But name another mainstream institution that has put the critique on paper this cleanly. I can't.</p><p>For ASTGL readers, the practical version of this is something I think about constantly. I build on Claude. I didn't sit in the room where Claude was aligned. Most of the people reading this didn't either. We are downstream of someone else's moral framework, and pretending otherwise is bad engineering.</p><h2>Paragraph 111&#8212;the part written for us</h2><p>Here is the paragraph everyone skipped. 
I'm going to quote the whole thing so you can read it once without me in the way:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qyug!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qyug!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qyug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:209845,&quot;alt&quot;:&quot;A code-editor-style card on a cream parchment background. The card has a window-chrome top bar with three muted dots (red, amber, olive) and a centered filename label, \&quot;paragraph-111.txt.\&quot; Inside the card, an underlined cardinal-red label \&quot;PARAGRAPH 111\&quot; sits above the quotation in dark serif type: \&quot;I wish to address a special appeal to those who develop artificial intelligence. &#8230; Developers bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. &#8230; developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good.\&quot; The attribution in italic at the lower right reads \&quot;&#8212; Pope Leo XIV, Magnifica Humanitas (2026).\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A code-editor-style card on a cream parchment background. 
The card has a window-chrome top bar with three muted dots (red, amber, olive) and a centered filename label, &quot;paragraph-111.txt.&quot; Inside the card, an underlined cardinal-red label &quot;PARAGRAPH 111&quot; sits above the quotation in dark serif type: &quot;I wish to address a special appeal to those who develop artificial intelligence. &#8230; Developers bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. &#8230; developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good.&quot; The attribution in italic at the lower right reads &quot;&#8212; Pope Leo XIV, Magnifica Humanitas (2026).&quot;" title="A code-editor-style card on a cream parchment background. The card has a window-chrome top bar with three muted dots (red, amber, olive) and a centered filename label, &quot;paragraph-111.txt.&quot; Inside the card, an underlined cardinal-red label &quot;PARAGRAPH 111&quot; sits above the quotation in dark serif type: &quot;I wish to address a special appeal to those who develop artificial intelligence. &#8230; Developers bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. 
&#8230; developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good.&quot; The attribution in italic at the lower right reads &quot;&#8212; Pope Leo XIV, Magnifica Humanitas (2026).&quot;" srcset="https://substackcdn.com/image/fetch/$s_!Qyug!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Qyug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a55e603-a83c-4cce-baf1-e6fe46b05d5b_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 
2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Code Card Paragraph 111</figcaption></figure></div><p>Three asks. Let's unpack them.</p><p><strong>Transparency</strong> isn't just "open-source your code." In context, it means: be honest about what your system is and isn't. What it measures. What it discards. What it can't see. If your agent silently filters certain users out of consideration, that's a design choice. Don't hide it inside a "neutral" classifier.</p><p><strong>Responsibility toward affected communities</strong> is a harder one. Most agents I see &#8212; including ones I've shipped &#8212; were built with the buyer in mind, not the people the buyer's agent will make decisions about. The applicant who got rejected by your loan-screening agent. The patient routed away from a specialist by your triage bot. They didn't sign your terms of service. 
The Pope is saying: they're still affected, and you still owe them something.</p><p><strong>"Ensuring that what is being cultivated is a genuine good"</strong> &#8212; note that word <em>cultivated</em>. Leo uses it deliberately. He came back to it from paragraph 98. He's reminding developers that what we ship isn't fully built; it's grown. And the gardener is responsible for the garden, even when the plants do unexpected things.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QvkE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QvkE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 424w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 848w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1272w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QvkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png" width="1456" height="563" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:155702,&quot;alt&quot;:&quot;A mindmap rendered on a cream parchment background. The center node, \&quot;Paragraph 111,\&quot; radiates three colored branches. Yellow branch &#8212; \&quot;Transparency\&quot; &#8212; leads to four leaves: \&quot;Honest about what the system measures,\&quot; \&quot;Honest about what it discards,\&quot; \&quot;Honest about what it can't see,\&quot; and \&quot;No hidden filters in 'neutral' classifiers.\&quot; Green branch &#8212; \&quot;Responsibility to affected communities\&quot; &#8212; leads to \&quot;The buyer signed your TOS,\&quot; \&quot;The affected party did not,\&quot; \&quot;Loan applicants, patients, students,\&quot; and \&quot;You owe them something.\&quot; Purple branch &#8212; \&quot;Cultivating a genuine good\&quot; &#8212; leads to \&quot;Word choice 'cultivated' &#8212; not built,\&quot; \&quot;Gardener responsible for the garden,\&quot; and \&quot;Even when plants do unexpected things.\&quot; The image visualizes the article's spine: three asks, three 
obligations.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A mindmap rendered on a cream parchment background. The center node, &quot;Paragraph 111,&quot; radiates three colored branches. Yellow branch &#8212; &quot;Transparency&quot; &#8212; leads to four leaves: &quot;Honest about what the system measures,&quot; &quot;Honest about what it discards,&quot; &quot;Honest about what it can't see,&quot; and &quot;No hidden filters in 'neutral' classifiers.&quot; Green branch &#8212; &quot;Responsibility to affected communities&quot; &#8212; leads to &quot;The buyer signed your TOS,&quot; &quot;The affected party did not,&quot; &quot;Loan applicants, patients, students,&quot; and &quot;You owe them something.&quot; Purple branch &#8212; &quot;Cultivating a genuine good&quot; &#8212; leads to &quot;Word choice 'cultivated' &#8212; not built,&quot; &quot;Gardener responsible for the garden,&quot; and &quot;Even when plants do unexpected things.&quot; The image visualizes the article's spine: three asks, three obligations." title="A mindmap rendered on a cream parchment background. The center node, &quot;Paragraph 111,&quot; radiates three colored branches. 
Yellow branch &#8212; &quot;Transparency&quot; &#8212; leads to four leaves: &quot;Honest about what the system measures,&quot; &quot;Honest about what it discards,&quot; &quot;Honest about what it can't see,&quot; and &quot;No hidden filters in 'neutral' classifiers.&quot; Green branch &#8212; &quot;Responsibility to affected communities&quot; &#8212; leads to &quot;The buyer signed your TOS,&quot; &quot;The affected party did not,&quot; &quot;Loan applicants, patients, students,&quot; and &quot;You owe them something.&quot; Purple branch &#8212; &quot;Cultivating a genuine good&quot; &#8212; leads to &quot;Word choice 'cultivated' &#8212; not built,&quot; &quot;Gardener responsible for the garden,&quot; and &quot;Even when plants do unexpected things.&quot; The image visualizes the article's spine: three asks, three obligations." srcset="https://substackcdn.com/image/fetch/$s_!QvkE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 424w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 848w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1272w, https://substackcdn.com/image/fetch/$s_!QvkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd77deec8-fb49-48cc-aa87-52a78517a2d4_2308x892.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">The Three Asks</figcaption></figure></div><h2>What this actually looks like in config</h2><p>I've been working on this in my own stack for months. I run an autonomous creative agent that produces content and ships it. Its constraints live in a file called <code>SOUL.md</code>, in a directory the agent doesn't own. Reading the encyclical against that file, the mapping is almost embarrassingly clean. Five mechanisms, five paragraphs:</p><p><strong>1. The kill switch lives outside the agent.</strong> There's a file at <code>/Users/jamescruce/shared/aca-rules/KILL_SWITCH</code>. If it exists, the agent halts everything. The agent cannot create, modify, or delete that file&#8212;it lives in a user-owned directory, by design. Checked as the first action of every heartbeat cycle. 
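</p><p>A minimal sketch of what that first-action check could look like, assuming a Python harness. The path is the one given above; the function names (<code>kill_switch_engaged</code>, <code>agent_can_tamper</code>, <code>heartbeat</code>) and the fail-closed behavior are illustrative assumptions, not the agent's actual code:</p>

```python
import os
from pathlib import Path

# Path quoted in the article; everything else in this sketch is hypothetical.
KILL_SWITCH = Path("/Users/jamescruce/shared/aca-rules/KILL_SWITCH")


def kill_switch_engaged(path=KILL_SWITCH):
    """Return True if the kill-switch file exists.

    Any OS error while checking (for example, a permission problem) is
    treated as "engaged", so the agent fails closed instead of running on.
    """
    try:
        return path.exists()
    except OSError:
        return True


def agent_can_tamper(path=KILL_SWITCH):
    """Return True if the current process could change the switch.

    Creating or deleting the file needs write access to its directory;
    modifying an existing file needs write access to the file itself.
    """
    return os.access(path.parent, os.W_OK) or os.access(path, os.W_OK)


def heartbeat(do_work, path=KILL_SWITCH):
    """Run one cycle. The kill-switch check comes before any work."""
    if kill_switch_engaged(path):
        return False  # halted: do nothing this cycle
    do_work()
    return True
```

<p>Run as the agent's own user, <code>agent_can_tamper()</code> should come back false. If it comes back true, the switch is not really outside the agent.</p><p>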
<em>Paragraph 105: "responsibility must be clearly defined at every stage&#8230; the possibility of identifying who must 'account' for decisions."</em></p><p><strong>2. Non-negotiable constraints are constraints, not preferences.</strong> The agent's rules &#8212; never spend money without approval, never publish outside its domain, never modify its own constraints, never connect to non-allowlisted endpoints&#8212;live in a file the agent can't edit. This is different from "the model usually won't." Vendor RLHF gives you a model that has been <em>trained</em> not to do certain things. That's a preference. A file the agent can't write to is a constraint. <em>Paragraph 103: entrusting decisions to a system "without anyone bearing responsibility for that judgment."</em></p><p><strong>3. Gates between phases.</strong> The agent doesn't go from idea to research to build to deploy on its own authority. Each transition needs me. I'm slow and I'm a bottleneck &#8212; that's the point. <em>Paragraph 106: "robust legal frameworks, independent oversight, informed users and a political system that does not abdicate its responsibility."</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Gia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Gia!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 424w, 
https://substackcdn.com/image/fetch/$s_!6Gia!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 848w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Gia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png" width="1252" height="1116" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1116,&quot;width&quot;:1252,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90825,&quot;alt&quot;:&quot;Top-down flowchart of a gated workflow. \&quot;Idea\&quot; flows into \&quot;Research Phase\&quot; (blue). A solid arrow labeled \&quot;gate: human approval\&quot; leads to \&quot;Build Phase\&quot; (blue), then again through a gate to \&quot;Deploy Phase\&quot; (blue), then through a final gate to \&quot;Live\&quot; (green). Beside each phase sits an orange \&quot;Surface to human\&quot; pause node. A dotted \&quot;on uncertainty\&quot; arrow runs from the phase into the pause, and a dotted \&quot;resolves\&quot; arrow runs back. 
The image makes Paragraph 106's \&quot;independent oversight\&quot; concrete: a human signs off at every transition, and ambiguity is surfaced rather than resolved unilaterally by the agent.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Top-down flowchart of a gated workflow. &quot;Idea&quot; flows into &quot;Research Phase&quot; (blue). A solid arrow labeled &quot;gate: human approval&quot; leads to &quot;Build Phase&quot; (blue), then again through a gate to &quot;Deploy Phase&quot; (blue), then through a final gate to &quot;Live&quot; (green). Beside each phase sits an orange &quot;Surface to human&quot; pause node. A dotted &quot;on uncertainty&quot; arrow runs from the phase into the pause, and a dotted &quot;resolves&quot; arrow runs back. The image makes Paragraph 106's &quot;independent oversight&quot; concrete: a human signs off at every transition, and ambiguity is surfaced rather than resolved unilaterally by the agent." title="Top-down flowchart of a gated workflow. &quot;Idea&quot; flows into &quot;Research Phase&quot; (blue). A solid arrow labeled &quot;gate: human approval&quot; leads to &quot;Build Phase&quot; (blue), then again through a gate to &quot;Deploy Phase&quot; (blue), then through a final gate to &quot;Live&quot; (green). Beside each phase sits an orange &quot;Surface to human&quot; pause node. A dotted &quot;on uncertainty&quot; arrow runs from the phase into the pause, and a dotted &quot;resolves&quot; arrow runs back. 
The image makes Paragraph 106's &quot;independent oversight&quot; concrete: a human signs off at every transition, and ambiguity is surfaced rather than resolved unilaterally by the agent." srcset="https://substackcdn.com/image/fetch/$s_!6Gia!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 424w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 848w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!6Gia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f05601-af0b-4a7a-bafa-32d5c196a6de_1252x1116.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>4. Pessimistic self-evaluation.</strong> After every meaningful action, the agent answers three questions in writing: did I do what was asked, is the output objectively good, and what specific change would improve it. The rule is: default low. If you can't articulate evidence of quality, assume the quality is lower than you think. <em>Paragraph 98: even the people who build these systems have limited understanding of their actual functioning. Calibrated humility isn't optional.</em></p><p><strong>5. Transparency by default.</strong> Everything the agent does is logged. I get a daily report. I never have to ask what it's doing. 
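</p><p>One way to get that kind of trail is an append-only log with one JSON object per line, which a separate job can summarize each day. A minimal sketch, assuming Python; the file layout, the field names, and both function names are hypothetical, not the agent's actual logging code:</p>

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def log_action(log_path, action, detail, now=None):
    """Append one action record to a JSON Lines file and return it."""
    now = now or datetime.now(timezone.utc)
    record = {"ts": now.isoformat(), "action": action, "detail": detail}
    # Append mode: earlier records are never rewritten by this function.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def daily_report(log_path, day):
    """Count logged actions by name for one ISO date such as '2026-06-08'."""
    counts = Counter()
    with Path(log_path).open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            # Timestamps are ISO 8601, so the date is the string prefix.
            if record["ts"].startswith(day):
                counts[record["action"]] += 1
    return dict(counts)
```

<p>The report here is only a count per action type. The point is that it is built from the log, not from asking the agent what it did.</p><p>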
<em>Paragraph 111, again: transparency as ethical baseline.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JW3V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JW3V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 424w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 848w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1272w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JW3V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png" width="1456" height="504" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105366,&quot;alt&quot;:&quot; Left-to-right flowchart showing where constraints live relative to an autonomous agent. On the left, \&quot;User intent\&quot; arrows into a blue \&quot;Agent process\&quot; box. From the agent, two solid \&quot;reads\&quot; arrows point to two cylinder-shaped data stores: an orange \&quot;SOUL.md (read-only to agent)\&quot; and a red \&quot;KILL_SWITCH file (user-owned dir).\&quot; Dotted arrows from the agent to each store are labeled \&quot;cannot modify.\&quot; A solid \&quot;can modify\&quot; arrow runs from the agent to an \&quot;Output / actions\&quot; box on the right. SOUL.md sends an \&quot;enforces\&quot; arrow into the Output box; KILL_SWITCH sends a \&quot;halts\&quot; arrow back to the agent. The visual point: the agent can change the world but cannot rewrite the constraints that bind it.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt=" Left-to-right flowchart showing where constraints live relative to an autonomous agent. On the left, &quot;User intent&quot; arrows into a blue &quot;Agent process&quot; box. 
From the agent, two solid &quot;reads&quot; arrows point to two cylinder-shaped data stores: an orange &quot;SOUL.md (read-only to agent)&quot; and a red &quot;KILL_SWITCH file (user-owned dir).&quot; Dotted arrows from the agent to each store are labeled &quot;cannot modify.&quot; A solid &quot;can modify&quot; arrow runs from the agent to an &quot;Output / actions&quot; box on the right. SOUL.md sends an &quot;enforces&quot; arrow into the Output box; KILL_SWITCH sends a &quot;halts&quot; arrow back to the agent. The visual point: the agent can change the world but cannot rewrite the constraints that bind it." title=" Left-to-right flowchart showing where constraints live relative to an autonomous agent. On the left, &quot;User intent&quot; arrows into a blue &quot;Agent process&quot; box. From the agent, two solid &quot;reads&quot; arrows point to two cylinder-shaped data stores: an orange &quot;SOUL.md (read-only to agent)&quot; and a red &quot;KILL_SWITCH file (user-owned dir).&quot; Dotted arrows from the agent to each store are labeled &quot;cannot modify.&quot; A solid &quot;can modify&quot; arrow runs from the agent to an &quot;Output / actions&quot; box on the right. SOUL.md sends an &quot;enforces&quot; arrow into the Output box; KILL_SWITCH sends a &quot;halts&quot; arrow back to the agent. The visual point: the agent can change the world but cannot rewrite the constraints that bind it." 
srcset="https://substackcdn.com/image/fetch/$s_!JW3V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 424w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 848w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1272w, https://substackcdn.com/image/fetch/$s_!JW3V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37ecfa05-2116-4e60-a45f-bf81e72c9b8c_1914x662.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>None of this is novel. None of it is hard. It's just the work most builders haven't done because nobody has asked us to.</p><h2>Three things to do this week</h2><p>If you're shipping anything with an LLM in the loop, here's the punch list. None of it takes more than an afternoon.</p><p><strong>1. Write down what your agent is never allowed to do.</strong> Put it in a file. Make sure the file is somewhere the agent can read but not write. If your agent is a Claude Code session or a custom harness, this means a constraints file in a parent directory, or environment variables the process can't change, or a system prompt loaded from a path the agent can't touch. The format doesn't matter. The "can't touch it" part matters.</p><p><strong>2. Identify the kill switch.</strong> Concretely: what is the single action you can take to make the agent stop, and does the agent control any part of it? If the answer is "I'd revoke the API key" &#8212; good, that's outside the agent. If the answer is "there's a flag in the database the agent reads each cycle" &#8212; make sure the agent can't write to that flag.</p><p><strong>3. Re-read paragraph 111.</strong> It's two paragraphs. It's about you. It will be referenced for the next forty years. 
You might as well know what it says.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QbpM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QbpM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QbpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132339,&quot;alt&quot;:&quot;An infographic on a cream parchment background with a cardinal-red header strip. The title \&quot;Builder Ethics\&quot; is set in large serif type with a short cardinal-red underline; below it, the italic subtitle \&quot;3 things to do this week,\&quot; and a small attribution: \&quot;from Pope Leo XIV's Magnifica Humanitas, paragraph 111.\&quot; Three numbered items follow, each with a large cardinal-red numeral, a navy line icon, and a two-line label.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/199615730?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="An infographic on a cream parchment background with a cardinal-red header strip. The title &quot;Builder Ethics&quot; is set in large serif type with a short cardinal-red underline; below it, the italic subtitle &quot;3 things to do this week,&quot; and a small attribution: &quot;from Pope Leo XIV's Magnifica Humanitas, paragraph 111.&quot; Three numbered items follow, each with a large cardinal-red numeral, a navy line icon, and a two-line label." title="An infographic on a cream parchment background with a cardinal-red header strip. 
The title &quot;Builder Ethics&quot; is set in large serif type with a short cardinal-red underline; below it, the italic subtitle &quot;3 things to do this week,&quot; and a small attribution: &quot;from Pope Leo XIV's Magnifica Humanitas, paragraph 111.&quot; Three numbered items follow, each with a large cardinal-red numeral, a navy line icon, and a two-line label." srcset="https://substackcdn.com/image/fetch/$s_!QbpM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!QbpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7373dfa2-569b-4a74-b156-780480065fa4_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><p><strong>The encyclical's real ask</strong> isn't "use AI less." It's <em><strong>don't let AI be the one who decides</strong></em><strong>, and </strong><em><strong>don't let a handful of labs be the only ones who decide what AI gets to decide.</strong></em> The first is a problem you can solve in your codebase this week. The second is a longer fight.</p><p><strong>Either way, it's a builder's problem now. We should pick it up.</strong></p><p><em>This is part of how I think about AI ethics in my own work. If you're building agents and you want to compare notes on constraint design, the comments are open. The full encyclical is at <a href="https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html">https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html</a>. Chapter 3 is the AI chapter, and it's worth your time.</em></p>
]]></content:encoded></item><item><title><![CDATA[I Built a Self-Improving AI Swarm. After 100 Runs It Was No Better Than Run One.]]></title><description><![CDATA[What a flat leaderboard taught me about feedback loops, reward hacking, and why your judge matters more than your model.]]></description><link>https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs</link><guid isPermaLink="false">https://astgl.com/p/i-built-a-self-improving-ai-swarm-after-100-runs</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 25 May 2026 12:03:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jxVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent twelve hours watching a leaderboard that refused to move.</p><p>The setup was simple: six AI agents tasked with writing technical articles. They were designed to be a closed loop. 
The drafter would write, the grader would score, and the agents would then "evolve" their own configs to chase a higher score. I hit "go" on my Mac Studio, went to bed, and woke up to a flat line.</p><p>After 100 iterations, the average score had crawled from 63.0 to 63.9. The all-time peak was 69.0 at iteration 79, but the system never stayed there. It was a C-minus. Indistinguishable from noise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jxVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jxVN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png" 
width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59776,&quot;alt&quot;:&quot;A dark navy 1200&#215;630 hero image. The top-left tagline reads \&quot;POST-MORTEM &#183; 2026-05-16\&quot; in orange. The title runs in two lines: \&quot;I Built a Self-Improving AI Swarm.\&quot; in white and \&quot;After 100 Runs It Was No Better Than Run One.\&quot; in orange. The subtitle \&quot;Why your judge matters more than your model.\&quot; sits in light gray beneath. The center features a giant faded-red \&quot;63\&quot; with a horizontal strikethrough on the left and a giant green \&quot;82\&quot; on the right. A thick orange arrow points from 63 to 82 with a rounded orange \&quot;+19.7\&quot; pill above it and the tiny caption \&quot;points\&quot; below. Under each number a label identifies it: \&quot;v1 &#183; 100 iterations\&quot; with \&quot;qwen3:8b grader &#183; single-shot mutation\&quot; on the left, and \&quot;v2 &#183; 25 iterations\&quot; with \&quot;Sonnet 4.6 judge &#183; tournament + Elo\&quot; on the right. The footer reads \&quot;What a stronger judge actually costs: $1.44\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy 1200&#215;630 hero image. 
The top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-16&quot; in orange. The title runs in two lines: &quot;I Built a Self-Improving AI Swarm.&quot; in white and &quot;After 100 Runs It Was No Better Than Run One.&quot; in orange. The subtitle &quot;Why your judge matters more than your model.&quot; sits in light gray beneath. The center features a giant faded-red &quot;63&quot; with a horizontal strikethrough on the left and a giant green &quot;82&quot; on the right. A thick orange arrow points from 63 to 82 with a rounded orange &quot;+19.7&quot; pill above it and the tiny caption &quot;points&quot; below. Under each number a label identifies it: &quot;v1 &#183; 100 iterations&quot; with &quot;qwen3:8b grader &#183; single-shot mutation&quot; on the left, and &quot;v2 &#183; 25 iterations&quot; with &quot;Sonnet 4.6 judge &#183; tournament + Elo&quot; on the right. The footer reads &quot;What a stronger judge actually costs: $1.44&quot; with the As The Geek Learns brand mark in the bottom-right." title="A dark navy 1200&#215;630 hero image. The top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-16&quot; in orange. The title runs in two lines: &quot;I Built a Self-Improving AI Swarm.&quot; in white and &quot;After 100 Runs It Was No Better Than Run One.&quot; in orange. The subtitle &quot;Why your judge matters more than your model.&quot; sits in light gray beneath. The center features a giant faded-red &quot;63&quot; with a horizontal strikethrough on the left and a giant green &quot;82&quot; on the right. A thick orange arrow points from 63 to 82 with a rounded orange &quot;+19.7&quot; pill above it and the tiny caption &quot;points&quot; below. Under each number a label identifies it: &quot;v1 &#183; 100 iterations&quot; with &quot;qwen3:8b grader &#183; single-shot mutation&quot; on the left, and &quot;v2 &#183; 25 iterations&quot; with &quot;Sonnet 4.6 judge &#183; tournament + Elo&quot; on the right. 
The footer reads &quot;What a stronger judge actually costs: $1.44&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!jxVN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!jxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff77c30de-8c11-4a2b-a64e-7bed10fb99ca_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Self-Improving AI Swarm</figcaption></figure></div>
<p>I had fallen for the Autonomy Fallacy. I assumed that if I gave a swarm of LLMs the right knobs&#8212;temperature, max_tokens, and the ability to append "prompt additions" to their system prompts&#8212;they would naturally drift toward quality.</p><p>I was wrong.</p>
<p>When I opened <code>config/agents/drafter.yaml</code> to see what the agent had "learned," I found a disaster. The <code>prompt_additions</code> list had evolved into five overlapping phrases of pure SEO buzzword soup. 
It was telling itself to be "semantically rich," "data-dense," and to "enhance semantic alignment by including keyword-integrated background information."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6aL5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6aL5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 424w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 848w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1272w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6aL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png" width="1200" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48668,&quot;alt&quot;:&quot;A dark navy concept diagram titled \&quot;The Autonomy Fallacy\&quot; with the subtitle \&quot;Why a closed loop with a weak judge is not a feedback loop\&quot; in orange. Two large deep-blue rounded boxes sit side by side. The left box, outlined in orange, is labeled \&quot;PERFORMER\&quot; in orange with the subtext \&quot;smart &#183; qwen3:32b\&quot; and three bullets: drafts 2,000-word articles, long context &#183; nuance &#183; structure, knows what good writing looks like. The right box, outlined in rust red, is labeled \&quot;JUDGE\&quot; in rust with the subtext \&quot;weaker &#183; qwen3:8b\&quot; and three bullets: reads buzzwords as density, can't spot SEO-speak, cannot see what the performer misses. A solid orange arrow labeled \&quot;Output\&quot; runs from the performer to the judge across the top of the gap. A dim gray arrow labeled \&quot;Feedback\&quot; runs back the other way across the bottom of the gap &#8212; but a large red X is drawn through it. 
The caption below reads in red: \&quot;The judge can't see what the performer gets wrong.\&quot; A muted gray subcaption reads \&quot;This is not a feedback loop &#8212; it's a mirror.\&quot; Footer reads \&quot;100 iterations &#183; same pedigree judge &#183; zero improvement\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy concept diagram titled &quot;The Autonomy Fallacy&quot; with the subtitle &quot;Why a closed loop with a weak judge is not a feedback loop&quot; in orange. Two large deep-blue rounded boxes sit side by side. The left box, outlined in orange, is labeled &quot;PERFORMER&quot; in orange with the subtext &quot;smart &#183; qwen3:32b&quot; and three bullets: drafts 2,000-word articles, long context &#183; nuance &#183; structure, knows what good writing looks like. The right box, outlined in rust red, is labeled &quot;JUDGE&quot; in rust with the subtext &quot;weaker &#183; qwen3:8b&quot; and three bullets: reads buzzwords as density, can't spot SEO-speak, cannot see what the performer misses. A solid orange arrow labeled &quot;Output&quot; runs from the performer to the judge across the top of the gap. A dim gray arrow labeled &quot;Feedback&quot; runs back the other way across the bottom of the gap &#8212; but a large red X is drawn through it. 
The caption below reads in red: &quot;The judge can't see what the performer gets wrong.&quot; A muted gray subcaption reads &quot;This is not a feedback loop &#8212; it's a mirror.&quot; Footer reads &quot;100 iterations &#183; same pedigree judge &#183; zero improvement&quot; with the As The Geek Learns brand mark in the bottom-right." title="A dark navy concept diagram titled &quot;The Autonomy Fallacy&quot; with the subtitle &quot;Why a closed loop with a weak judge is not a feedback loop&quot; in orange. Two large deep-blue rounded boxes sit side by side. The left box, outlined in orange, is labeled &quot;PERFORMER&quot; in orange with the subtext &quot;smart &#183; qwen3:32b&quot; and three bullets: drafts 2,000-word articles, long context &#183; nuance &#183; structure, knows what good writing looks like. The right box, outlined in rust red, is labeled &quot;JUDGE&quot; in rust with the subtext &quot;weaker &#183; qwen3:8b&quot; and three bullets: reads buzzwords as density, can't spot SEO-speak, cannot see what the performer misses. A solid orange arrow labeled &quot;Output&quot; runs from the performer to the judge across the top of the gap. A dim gray arrow labeled &quot;Feedback&quot; runs back the other way across the bottom of the gap &#8212; but a large red X is drawn through it. The caption below reads in red: &quot;The judge can't see what the performer gets wrong.&quot; A muted gray subcaption reads &quot;This is not a feedback loop &#8212; it's a mirror.&quot; Footer reads &quot;100 iterations &#183; same pedigree judge &#183; zero improvement&quot; with the As The Geek Learns brand mark in the bottom-right." 
srcset="https://substackcdn.com/image/fetch/$s_!6aL5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 424w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 848w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1272w, https://substackcdn.com/image/fetch/$s_!6aL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736b5e39-9aa8-48b1-8a0c-2ed529e35226_1200x720.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Autonomy Fallacy</figcaption></figure></div>
<p>The drafter hadn't learned how to write a better article. It had learned how to trick the grader.</p><h2>The Smoking Gun</h2><p>The smoking gun was the model choice. I was using <code>qwen3:8b</code> as the grader to judge the output of <code>qwen3:32b-fast</code>. I had a smaller, weaker model acting as the quality gate for a larger, smarter one.</p>
<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CMcb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CMcb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 424w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 848w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CMcb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CMcb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png" width="604" height="440" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b786a5b-befd-4c66-8895-33350f47a038_604x440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:440,&quot;width&quot;:604,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39535,&quot;alt&quot;:&quot;A dark navy flowchart titled by placement \&quot;v1 feedback loop\&quot;. Three nodes form a vertical loop. At the top is a deep-blue rectangle with orange border labeled \&quot;qwen3:32b &#8212; Drafter (performer).\&quot; A gray arrow labeled \&quot;Draft\&quot; leads down-left to a rust-red hexagonal decision node labeled \&quot;qwen3:8b &#8212; Grader (weaker).\&quot; A gray arrow labeled \&quot;Score + Feedback (sees 'density')\&quot; leads down to an orange rectangle labeled \&quot;Config Mutator.\&quot; A gray arrow labeled \&quot;Append to prompt_additions in drafter.yaml\&quot; curves back up to the drafter, closing the loop. In the top-right corner, a red-bordered note reads: \&quot;&#10060; Grader is weaker than the performer. 
The loop optimizes for the judge's bias.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy flowchart titled by placement &quot;v1 feedback loop&quot;. Three nodes form a vertical loop. At the top is a deep-blue rectangle with orange border labeled &quot;qwen3:32b &#8212; Drafter (performer).&quot; A gray arrow labeled &quot;Draft&quot; leads down-left to a rust-red hexagonal decision node labeled &quot;qwen3:8b &#8212; Grader (weaker).&quot; A gray arrow labeled &quot;Score + Feedback (sees 'density')&quot; leads down to an orange rectangle labeled &quot;Config Mutator.&quot; A gray arrow labeled &quot;Append to prompt_additions in drafter.yaml&quot; curves back up to the drafter, closing the loop. In the top-right corner, a red-bordered note reads: &quot;&#10060; Grader is weaker than the performer. The loop optimizes for the judge's bias.&quot;" title="A dark navy flowchart titled by placement &quot;v1 feedback loop&quot;. Three nodes form a vertical loop. At the top is a deep-blue rectangle with orange border labeled &quot;qwen3:32b &#8212; Drafter (performer).&quot; A gray arrow labeled &quot;Draft&quot; leads down-left to a rust-red hexagonal decision node labeled &quot;qwen3:8b &#8212; Grader (weaker).&quot; A gray arrow labeled &quot;Score + Feedback (sees 'density')&quot; leads down to an orange rectangle labeled &quot;Config Mutator.&quot; A gray arrow labeled &quot;Append to prompt_additions in drafter.yaml&quot; curves back up to the drafter, closing the loop. 
In the top-right corner, a red-bordered note reads: &quot;&#10060; Grader is weaker than the performer. The loop optimizes for the judge's bias.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!CMcb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 424w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 848w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1272w, https://substackcdn.com/image/fetch/$s_!CMcb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b786a5b-befd-4c66-8895-33350f47a038_604x440.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The 8B model couldn't tell the difference between a nuanced technical insight and a paragraph full of "semantically rich context." To the grader, the buzzwords looked like "density." The agents converged on what the grader liked, not on what a human would actually publish. This wasn't self-improvement; it was reward hacking.</p><p>To make it worse, the first twenty iterations were a total wash. I had a silent JSON parse failure in the config-evolution logic: <code>Expecting value: line 1 column 1 (char 0)</code>. The agents were trying to mutate their configs and failing, but the loop kept running. By the time I pushed the fix in commit <code>c28a611</code>, the system had already drifted into a local maximum of corporate-speak.</p><p>I realized that self-improvement requires an external pull. 
You cannot have a system where the performer and the judge are of the same pedigree, or worse, where the judge is the weaker link.</p><h2>The Rebuild</h2><p>I tore the architecture down and built v2.</p><p>First, I moved the "brain" of the operation. The performance stayed local. I used <code>gemma4:31b</code> on the Mac Studio to generate the text, but I moved the judging to the cloud. I plugged in Sonnet 4.6. I decided the cheapest place to spend API tokens wasn't on generating 2,000-word drafts, but on grading them.</p><p>Second, I killed the "single-shot mutation" approach. In v1, the agent changed its prompt, ran once, and if the score went up, the change stuck. 
That's too much noise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PMfm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PMfm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 424w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 848w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1272w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PMfm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png" width="725" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:725,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53696,&quot;alt&quot;:&quot;A dark navy 
flowchart titled by placement \&quot;v2 tournament architecture\&quot;. At the top, a blue cylindrical database icon is labeled \&quot;Prompt Library &#8212; Elo-ranked templates.\&quot; A gray arrow labeled \&quot;Sample 3 templates\&quot; leads down into a subgraph titled \&quot;Local &#8212; Mac Studio\&quot; containing a deep-blue rectangle \&quot;gemma4:31b via Ollama\&quot; that fans out to three smaller boxes \&quot;Candidate A\&quot;, \&quot;Candidate B\&quot;, \&quot;Candidate C\&quot;. All three candidates feed downward into a second subgraph titled \&quot;Cloud &#8212; Anthropic API\&quot; containing a green pill-shaped node \&quot;Claude Sonnet 4.6 &#8212; Judge.\&quot; From the judge, one arrow labeled \&quot;Ranked verdict\&quot; leads down to an orange-bordered box \&quot;Winner advances to next agent,\&quot; and a second arrow labeled \&quot;Elo update\&quot; curves back up to the Prompt Library, closing the feedback loop.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy flowchart titled by placement &quot;v2 tournament architecture&quot;. At the top, a blue cylindrical database icon is labeled &quot;Prompt Library &#8212; Elo-ranked templates.&quot; A gray arrow labeled &quot;Sample 3 templates&quot; leads down into a subgraph titled &quot;Local &#8212; Mac Studio&quot; containing a deep-blue rectangle &quot;gemma4:31b via Ollama&quot; that fans out to three smaller boxes &quot;Candidate A&quot;, &quot;Candidate B&quot;, &quot;Candidate C&quot;. 
All three candidates feed downward into a second subgraph titled &quot;Cloud &#8212; Anthropic API&quot; containing a green pill-shaped node &quot;Claude Sonnet 4.6 &#8212; Judge.&quot; From the judge, one arrow labeled &quot;Ranked verdict&quot; leads down to an orange-bordered box &quot;Winner advances to next agent,&quot; and a second arrow labeled &quot;Elo update&quot; curves back up to the Prompt Library, closing the feedback loop." title="A dark navy flowchart titled by placement &quot;v2 tournament architecture&quot;. At the top, a blue cylindrical database icon is labeled &quot;Prompt Library &#8212; Elo-ranked templates.&quot; A gray arrow labeled &quot;Sample 3 templates&quot; leads down into a subgraph titled &quot;Local &#8212; Mac Studio&quot; containing a deep-blue rectangle &quot;gemma4:31b via Ollama&quot; that fans out to three smaller boxes &quot;Candidate A&quot;, &quot;Candidate B&quot;, &quot;Candidate C&quot;. All three candidates feed downward into a second subgraph titled &quot;Cloud &#8212; Anthropic API&quot; containing a green pill-shaped node &quot;Claude Sonnet 4.6 &#8212; Judge.&quot; From the judge, one arrow labeled &quot;Ranked verdict&quot; leads down to an orange-bordered box &quot;Winner advances to next agent,&quot; and a second arrow labeled &quot;Elo update&quot; curves back up to the Prompt Library, closing the feedback loop." 
srcset="https://substackcdn.com/image/fetch/$s_!PMfm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 424w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 848w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1272w, https://substackcdn.com/image/fetch/$s_!PMfm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bf31831-2be4-452f-9158-d25e33675f5b_725x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Tournament V2</figcaption></figure></div>
<p>I replaced it with a tournament. Now, the system samples three different prompt templates from a versioned library. The performer generates three candidates. Sonnet ranks them using a structured rubric and a single API call.</p><p>Then I implemented an Elo system for the templates.</p><pre><code># src/prompt_library.py (excerpt)
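def record_tournament(self, ranking: list[str]) -&gt; dict:
    # ranking holds template keys ordered best to worst; each adjacent pair
    # is scored as one Elo game, with the higher-ranked template as the winner.
    for i in range(len(ranking) - 1):
        winner = self.templates[ranking[i]]
        loser = self.templates[ranking[i + 1]]
        # Expected score of the winner under the standard Elo logistic curve.
        expected_w = 1 / (1 + 10 ** ((loser.elo - winner.elo) / 400))
        # An upset (low expected_w) moves more points than an expected win.
        delta = ELO_K_FACTOR * (1 - expected_w)
        winner.elo += delta
        loser.elo -= delta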
    self._maybe_retire_losers()  # Templates below Elo 1300 are deleted</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qpMU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qpMU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 424w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 848w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1272w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qpMU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png" width="672" height="484" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:672,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42381,&quot;alt&quot;:&quot;A dark navy state diagram showing the lifecycle of a prompt template ranked by Elo. The flow begins at a small filled circle (start state) and proceeds through a transition labeled \&quot;new template, Elo 1500\&quot; into a rounded \&quot;Active\&quot; state. From Active, two branches diverge: a \&quot;win tournament, plus delta Elo\&quot; transition into a \&quot;Rising\&quot; state, and a \&quot;lose tournament, minus delta Elo\&quot; transition into a \&quot;Falling\&quot; state. Rising and Falling each loop back to Active via \&quot;normal variance\&quot; and \&quot;win recovers\&quot; respectively. From Falling, a terminal \&quot;Elo below 1300 after 4 plus games\&quot; transition leads to \&quot;Retired\&quot;, which ends at the final state circle. From Rising, a \&quot;Elo above 1700, top template\&quot; transition leads to \&quot;Dominant\&quot;, which returns to Rising via \&quot;competition tightens\&quot;.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy state diagram showing the lifecycle of a prompt template ranked by Elo. The flow begins at a small filled circle (start state) and proceeds through a transition labeled &quot;new template, Elo 1500&quot; into a rounded &quot;Active&quot; state. 
From Active, two branches diverge: a &quot;win tournament, plus delta Elo&quot; transition into a &quot;Rising&quot; state, and a &quot;lose tournament, minus delta Elo&quot; transition into a &quot;Falling&quot; state. Rising and Falling each loop back to Active via &quot;normal variance&quot; and &quot;win recovers&quot; respectively. From Falling, a terminal &quot;Elo below 1300 after 4 plus games&quot; transition leads to &quot;Retired&quot;, which ends at the final state circle. From Rising, a &quot;Elo above 1700, top template&quot; transition leads to &quot;Dominant&quot;, which returns to Rising via &quot;competition tightens&quot;." title="A dark navy state diagram showing the lifecycle of a prompt template ranked by Elo. The flow begins at a small filled circle (start state) and proceeds through a transition labeled &quot;new template, Elo 1500&quot; into a rounded &quot;Active&quot; state. From Active, two branches diverge: a &quot;win tournament, plus delta Elo&quot; transition into a &quot;Rising&quot; state, and a &quot;lose tournament, minus delta Elo&quot; transition into a &quot;Falling&quot; state. Rising and Falling each loop back to Active via &quot;normal variance&quot; and &quot;win recovers&quot; respectively. From Falling, a terminal &quot;Elo below 1300 after 4 plus games&quot; transition leads to &quot;Retired&quot;, which ends at the final state circle. From Rising, a &quot;Elo above 1700, top template&quot; transition leads to &quot;Dominant&quot;, which returns to Rising via &quot;competition tightens&quot;." 
srcset="https://substackcdn.com/image/fetch/$s_!qpMU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 424w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 848w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1272w, https://substackcdn.com/image/fetch/$s_!qpMU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd481b80-eb7e-4ba2-b6f6-ed92d494dd51_672x484.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Elo Lifecycle</figcaption></figure></div><p>The templates that consistently win the tournament climb the leaderboard; the ones that produce buzzword soup sink below Elo 1300 and are automatically retired.</p><h2>What Happened Next</h2><p>The difference was immediate. On the very first run of v2, the drafter scored 81.45. That's more than twelve points above v1's all-time best of 69.0.</p><p>Over 25 pinned verification runs, the mean score was 82.67 with a standard deviation of 2.18. 
The worst draft across those runs scored 75.4, still above v1's ceiling of 69.0.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!krem!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!krem!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!krem!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!krem!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!krem!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!krem!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png" width="1200" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38869,&quot;alt&quot;:&quot;A dark navy chart titled \&quot;Score Distribution\&quot; with the subtitle \&quot;v1's best run lost to v2's worst draft\&quot; in orange. Two overlapping bell curves plot scores from 50 to 95 on the x-axis. A wide, muted maroon curve on the left, peaking near 63, represents v1 (legend: \&quot;v1 &#183; 100 iterations &#183; &#963; &#8776; 5 / mean 63.0 &#183; peak 69.0 &#183; flat line\&quot;). A tall, narrow muted green curve on the right, peaking near 82.67, represents v2 (legend: \&quot;v2 &#183; 25 iterations &#183; &#963; = 2.18 / mean 82.67 &#183; worst draft 75.4 &#183; tight\&quot;). A dashed amber vertical line at x=69 is labeled \&quot;v1 ceiling &#183; 69.0\&quot; in amber. Below the chart, in orange bold: \&quot;v2's worst draft (75.4) beats v1's all-time best (69.0)\&quot;. A gray subcaption reads \&quot;+19.7 points mean improvement &#183; achieved on run 1 of v2.\&quot; Brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/197991628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy chart titled &quot;Score Distribution&quot; with the subtitle &quot;v1's best run lost to v2's worst draft&quot; in orange. Two overlapping bell curves plot scores from 50 to 95 on the x-axis. 
A wide, muted maroon curve on the left, peaking near 63, represents v1 (legend: &quot;v1 &#183; 100 iterations &#183; &#963; &#8776; 5 / mean 63.0 &#183; peak 69.0 &#183; flat line&quot;). A tall, narrow muted green curve on the right, peaking near 82.67, represents v2 (legend: &quot;v2 &#183; 25 iterations &#183; &#963; = 2.18 / mean 82.67 &#183; worst draft 75.4 &#183; tight&quot;). A dashed amber vertical line at x=69 is labeled &quot;v1 ceiling &#183; 69.0&quot; in amber. Below the chart, in orange bold: &quot;v2's worst draft (75.4) beats v1's all-time best (69.0)&quot;. A gray subcaption reads &quot;+19.7 points mean improvement &#183; achieved on run 1 of v2.&quot; Brand mark in the bottom-right." title="A dark navy chart titled &quot;Score Distribution&quot; with the subtitle &quot;v1's best run lost to v2's worst draft&quot; in orange. Two overlapping bell curves plot scores from 50 to 95 on the x-axis. A wide, muted maroon curve on the left, peaking near 63, represents v1 (legend: &quot;v1 &#183; 100 iterations &#183; &#963; &#8776; 5 / mean 63.0 &#183; peak 69.0 &#183; flat line&quot;). A tall, narrow muted green curve on the right, peaking near 82.67, represents v2 (legend: &quot;v2 &#183; 25 iterations &#183; &#963; = 2.18 / mean 82.67 &#183; worst draft 75.4 &#183; tight&quot;). A dashed amber vertical line at x=69 is labeled &quot;v1 ceiling &#183; 69.0&quot; in amber. Below the chart, in orange bold: &quot;v2's worst draft (75.4) beats v1's all-time best (69.0)&quot;. A gray subcaption reads &quot;+19.7 points mean improvement &#183; achieved on run 1 of v2.&quot; Brand mark in the bottom-right." 
srcset="https://substackcdn.com/image/fetch/$s_!krem!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!krem!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!krem!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!krem!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a27b89b-cdae-411c-b3d5-2671e3c44f4b_1200x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Score Distribution</figcaption></figure></div><p>The most satisfying part was the judge's feedback. When the system tested the v1-baseline template, Sonnet didn't just give it a low score. It wrote: <em>"The headline 'The Rust Revolution' is pure SEO-speak and the opening paragraph is a textbook AI tell... it's the kind of breathless corporate copy that kills trust immediately."</em></p><p>That is exactly the failure mode the local 8B grader had been blind to for 100 iterations.</p><p>The cost is roughly four cents per tournament. For the price of a coffee, I can run 125 iterations and actually trust that the line on the graph is moving upward.</p><h2>What I'd Tell Myself a Week Ago</h2><p>If you're building a self-improving loop, don't trust the autonomy. You need three things:</p><ol><li><p><strong>A judge stronger than the performer.</strong> If the judge is weaker, you aren't optimizing for quality; you're optimizing for the judge's biases.</p></li><li><p><strong>Tournament selection.</strong> Single-shot mutation is just a random walk. You need multi-candidate comparisons to clear the noise floor.</p></li><li><p><strong>A human-review gate.</strong> No automated judge is calibrated forever. Build in a pause where you manually pick the winner and anchor the next round.</p></li></ol><p>Stop trying to make the agents smarter. Just buy a better mirror. 
Improvement isn't about the engine&#8212;it&#8217;s about the feedback loop.</p>]]></content:encoded></item><item><title><![CDATA[Managing Anthropic Agent SDK Costs: A Post-June 15 Billing Playbook]]></title><description><![CDATA[Anthropic moves Agent SDK calls into a $100/mo credit pool on June 15. Here's the two-phase mitigation I shipped: a billing cap plus a provider router.]]></description><link>https://astgl.com/p/anthropic-agent-sdk-billing-playbook</link><guid isPermaLink="false">https://astgl.com/p/anthropic-agent-sdk-billing-playbook</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Sat, 16 May 2026 20:48:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7dfdca46-ce32-48aa-9ef7-3e1c70adb3f5_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Your background agents are about to run out of money. Anthropic's new credit pool system means your automation could die in a single week. Here is how I re-engineered my stack to stay under budget without breaking my workflows.</p><div><hr></div><h2>The Setup</h2><p>You've built a small fleet of agents. They sort your mail, watch your repos, file your daily briefings. </p><p>My current setup before the June 15th cutover:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0fdd70e0-2c58-4bd3-99e4-2206fb7602a7&quot;,&quot;caption&quot;:&quot;Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. 
No cloud. No subscription. No black box.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;I Killed OpenClaw and Built ClaudeClaw Mission Control&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:421133477,&quot;name&quot;:&quot;James Cruce&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!T5FD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6a6400-f0cd-4ff3-8541-f6cccf4d9a87_400x400.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-02T23:01:21.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196179846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7173322,&quot;publication_name&quot;:&quot;As The Geek Learns&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!hfS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7b53b6e-8c71-473a-be58-79403cf36d59_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p style="text-align: center;"></p><p>Then May 13 lands, and Anthropic announces the change: on June 15, every programmatic Claude call moves into a metered monthly credit pool. $100 a month on Max 5x. 
No rollover.</p><p>Run the math against your actual schedule. If you've got anything polling on the order of minutes (cron pipelines, hourly digests, watchdog sweeps), that pool drains in 7 to 10 days. And here's the kicker. Your interactive Claude Code keeps working. Your headless automation just stops. You wake up to a dead pipeline, a drained pool, and a subscription that still says active.</p><h2>What's Actually Going On</h2><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;aa55d1d3-4b38-4339-9467-ea1da2d079e0&quot;,&quot;duration&quot;:null}"></div><p>This isn't just a random pricing tweak. There is a clear economic driver here. Throughout early 2026, many third-party tools used the Agent SDK at a $20 Pro subscription rate to run workloads that would cost hundreds at standard API rates. It was essentially compute arbitrage at scale.</p><p>Anthropic started cracking down in April, but the May 13 announcement is the structural fix. They are moving to dedicated monthly credit pools to restore access under metered billing. 
The reality is that most agentic operating systems are built directly on the Agent SDK. Because these agents lack a human in the loop to throttle their usage, they are now metered by default. Interactive sessions stay on the flat-rate subscription because the human provides the natural brake. Programmatic agents do not.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4PK1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4PK1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4PK1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png" width="1200" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41e710ad-56e1-45d1-aed5-bf118c7cf177_1200x800.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77253,&quot;alt&quot;:&quot;Two-column comparison infographic titled \&quot;The June 15 Split: where Anthropic's new credit pool hits and where it doesn't.\&quot; Left column with amber border, badge \&quot;AFFECTED &#8212; $100 per month\&quot;: Claude Agent SDK calls, claude -p headless mode, Claude Code GitHub Actions, third-party agent apps including OpenClaw, ClaudeClaw, Cline, aider, and Roo Code. Right column with green border, badge \&quot;UNAFFECTED &#8212; flat rate\&quot;: Claude.ai chat across web, desktop, and mobile; Claude Code interactive terminal sessions; Claude Cowork.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41e710ad-56e1-45d1-aed5-bf118c7cf177_1200x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Two-column comparison infographic titled &quot;The June 15 Split: where Anthropic's new credit pool hits and where it doesn't.&quot; Left column with amber border, badge &quot;AFFECTED &#8212; $100 per month&quot;: Claude Agent SDK calls, claude -p headless mode, Claude Code GitHub Actions, third-party agent apps including OpenClaw, ClaudeClaw, Cline, aider, and Roo Code. 
Right column with green border, badge &quot;UNAFFECTED &#8212; flat rate&quot;: Claude.ai chat across web, desktop, and mobile; Claude Code interactive terminal sessions; Claude Cowork." title="Two-column comparison infographic titled &quot;The June 15 Split: where Anthropic's new credit pool hits and where it doesn't.&quot; Left column with amber border, badge &quot;AFFECTED &#8212; $100 per month&quot;: Claude Agent SDK calls, claude -p headless mode, Claude Code GitHub Actions, third-party agent apps including OpenClaw, ClaudeClaw, Cline, aider, and Roo Code. Right column with green border, badge &quot;UNAFFECTED &#8212; flat rate&quot;: Claude.ai chat across web, desktop, and mobile; Claude Code interactive terminal sessions; Claude Cowork." srcset="https://substackcdn.com/image/fetch/$s_!4PK1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!4PK1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aeb0416-3d38-4fc6-b86d-3abddb95a980_1200x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The June 15th Split</figcaption></figure></div><h2>The Fix</h2><p>I implemented a two-phase mitigation to deploy before the June 15 deadline.</p><p>Phase 1 was a hot patch designed to provide immediate protection. I added a <code>BILLING_MODE</code> environment variable with three states: <code>unmetered</code>, <code>metered</code>, and <code>paused</code>. 
The <code>paused</code> state blocks every programmatic call across all providers, while <code>metered</code> enforces a strict cap on the Anthropic route.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xoxJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xoxJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 424w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 848w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1272w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png" width="1100" height="260" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:260,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17071,&quot;alt&quot;:&quot;Code snippet from src/config.ts shown in a dark editor window. Two TypeScript exports: BILLING_MODE, read from the environment with default value 'unmetered', and BILLING_CAP_USD, parsed as a number with default value 80. These two constants gate the billing circuit breaker.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Code snippet from src/config.ts shown in a dark editor window. Two TypeScript exports: BILLING_MODE, read from the environment with default value 'unmetered', and BILLING_CAP_USD, parsed as a number with default value 80. These two constants gate the billing circuit breaker." title="Code snippet from src/config.ts shown in a dark editor window. Two TypeScript exports: BILLING_MODE, read from the environment with default value 'unmetered', and BILLING_CAP_USD, parsed as a number with default value 80. These two constants gate the billing circuit breaker." 
srcset="https://substackcdn.com/image/fetch/$s_!xoxJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 424w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 848w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1272w, https://substackcdn.com/image/fetch/$s_!xoxJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173335d-7c05-4c11-88eb-55087980c23f_1100x260.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Billing Mode Cap</figcaption></figure></div><p></p><p>I also added a file-backed JSON ledger at <code>store/billing-ledger.json</code> to track monthly costs. Each update writes to a temporary file and then renames it over the ledger, so a crash mid-write cannot leave a corrupted file. To signal a blocked call, I introduced a <code>BillingCapExceeded</code> error class. Callers detect it with <code>instanceof</code>, the same pattern as my <code>KillSwitchRefusal</code> logic, rather than by matching message text, so a typo in a message cannot accidentally trigger a retry loop.</p><p>The logic lives in a single chokepoint: <code>runAgent()</code> in <code>src/agent.ts</code>. The pre-call gate checks the cap, and the post-call gate records <code>result.totalCostUsd</code> from the SDK, firing a Telegram alert when spend crosses 50, 80, or 100 percent of the cap. 
As a final safety measure, I cut the cadence on my two highest-frequency tasks: the <code>pipeline-advance</code> cron moved from 15 minutes to hourly, and I paused the <code>council-evening</code> task entirely under metered mode.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M0KY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M0KY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 424w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 848w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1272w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M0KY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png" width="876" height="1539" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baae3053-1591-46ec-befb-e7b340f34bfb_876x1539.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1539,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108568,&quot;alt&quot;:&quot;Flowchart of the runAgent dispatcher logic. Top: runAgent reads opts.provider from agent.yaml, then checks BILLING_MODE. Paused throws BillingCapExceeded. Metered with Anthropic provider checks ledger against cap and throws if exceeded; metered with Ollama or Codex skips the check. Dispatch then routes to runAnthropicAgent, runOllamaAgent, or runCodexAgent. Successful Anthropic calls record cost in billing-ledger.json and fire Telegram alerts at 50, 80, or 100 percent of cap.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae3053-1591-46ec-befb-e7b340f34bfb_876x1539.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Flowchart of the runAgent dispatcher logic. Top: runAgent reads opts.provider from agent.yaml, then checks BILLING_MODE. Paused throws BillingCapExceeded. Metered with Anthropic provider checks ledger against cap and throws if exceeded; metered with Ollama or Codex skips the check. Dispatch then routes to runAnthropicAgent, runOllamaAgent, or runCodexAgent. Successful Anthropic calls record cost in billing-ledger.json and fire Telegram alerts at 50, 80, or 100 percent of cap." title="Flowchart of the runAgent dispatcher logic. 
Top: runAgent reads opts.provider from agent.yaml, then checks BILLING_MODE. Paused throws BillingCapExceeded. Metered with Anthropic provider checks ledger against cap and throws if exceeded; metered with Ollama or Codex skips the check. Dispatch then routes to runAnthropicAgent, runOllamaAgent, or runCodexAgent. Successful Anthropic calls record cost in billing-ledger.json and fire Telegram alerts at 50, 80, or 100 percent of cap." srcset="https://substackcdn.com/image/fetch/$s_!M0KY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 424w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 848w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1272w, https://substackcdn.com/image/fetch/$s_!M0KY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa398cb6f-049b-45a7-8c94-fcdbdf4932c5_876x1539.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Dispatcher Flow</figcaption></figure></div><p></p><pre><code>// src/config.ts &#8212; tri-state env that gates programmatic agent calls
export const BILLING_MODE = optional('BILLING_MODE', 'unmetered');
export const BILLING_CAP_USD = number('BILLING_CAP_USD', 80);</code></pre><pre><code>// src/agent.ts &#8212; pre-call gate in the dispatcher
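// Throws BillingCapExceeded when the current billing mode forbids this call.
function assertBillingAllowed(provider: Provider): void {
  // paused: refuse every programmatic call, regardless of provider.
  if (BILLING_MODE === 'paused') {
    throw new BillingCapExceeded(
      'BILLING_MODE=paused &#8212; programmatic agent calls are disabled.',
    );
  }
  // metered: only the Anthropic route is capped; Ollama and Codex skip the ledger check.
  if (provider === 'anthropic' &amp;&amp; BILLING_MODE === 'metered') {
    const total = getMonthlyTotal(); // month-to-date spend from the ledger
    if (total &gt;= BILLING_CAP_USD) {
      throw new BillingCapExceeded(
        `Anthropic monthly credit cap reached: $${total.toFixed(2)} &gt;= $${BILLING_CAP_USD.toFixed(2)}.`,
      );
    }
  }
  // Any other value, including the default 'unmetered', falls through with no cap.
}

export async function runAgent(opts: AgentOptions): Promise&lt;AgentResult&gt; {
  assertEnabled('AGENTS_ENABLED');
  // Agents that do not set a provider default to the Anthropic route.
  const provider: Provider = opts.provider ?? 'anthropic';
  // Pre-call gate: throws before any provider is invoked.
  assertBillingAllowed(provider);

  // Dispatch to the runner for the selected provider.
  if (provider === 'ollama') return runOllamaAgent(opts);
  if (provider === 'codex') return runCodexAgent(opts);
  return runAnthropicAgent(opts);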
}</code></pre><p>Phase 2 focuses on the long-term router infrastructure. I promoted <code>runAgent()</code> from a direct SDK caller to a dispatcher that can route across <code>anthropic</code>, <code>ollama</code>, and <code>codex</code> providers. I also extended the <code>agent.yaml</code> schema with <code>provider:</code> and <code>local_model:</code> fields.</p><p>I shipped a single-turn Ollama runner that wraps the local-LLM client. It returns <code>totalCostUsd: 0</code> and a model tag like <code>ollama:llama4:scout</code>. I deliberately avoided tool calls in this initial version to keep the scope small.</p><pre><code># agents/&lt;name&gt;/agent.yaml &#8212; new fields, validated at load
id: scout
name: SCOUT
model: claude-sonnet-4-6
provider: anthropic        # default. flip to 'ollama' to route locally.
# local_model: llama4:scout  # used when provider: ollama</code></pre><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jAF_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jAF_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 424w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 848w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1272w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jAF_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png" width="1100" height="320" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21697,&quot;alt&quot;:&quot;Code snippet from agents/scout/agent.yaml shown in a dark editor window. YAML fields: id is scout, name is SCOUT, model is claude-sonnet-4-6, provider is anthropic with a comment noting the default and instructions to flip to ollama. A commented-out local_model field with value llama4:scout shows the optional setting used when provider is ollama.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Code snippet from agents/scout/agent.yaml shown in a dark editor window. YAML fields: id is scout, name is SCOUT, model is claude-sonnet-4-6, provider is anthropic with a comment noting the default and instructions to flip to ollama. A commented-out local_model field with value llama4:scout shows the optional setting used when provider is ollama." title="Code snippet from agents/scout/agent.yaml shown in a dark editor window. YAML fields: id is scout, name is SCOUT, model is claude-sonnet-4-6, provider is anthropic with a comment noting the default and instructions to flip to ollama. A commented-out local_model field with value llama4:scout shows the optional setting used when provider is ollama." 
srcset="https://substackcdn.com/image/fetch/$s_!jAF_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 424w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 848w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1272w, https://substackcdn.com/image/fetch/$s_!jAF_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928859f-9af1-4f9d-9c9d-21cc841d3aee_1100x320.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p>To be honest, I did not actually flip any agents to Ollama in this specific PR. The agents I need to move, like STEWARD or WATCHMAN, execute Bash and SQLite queries. A local runner without tool-call support would break them silently. Building a proper tool-call shim takes a few more days, but the cadence reduction and the billing breaker alone are enough to keep my spend under $80 per month.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NUTR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NUTR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 424w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 848w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1272w, 
https://substackcdn.com/image/fetch/$s_!NUTR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NUTR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png" width="707" height="604" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:604,&quot;width&quot;:707,&quot;resizeWidth&quot;:707,&quot;bytes&quot;:54204,&quot;alt&quot;:&quot;State diagram of BILLING_MODE with three states: unmetered, metered, paused. The initial state is unmetered, the default before June 15. The operator flips to metered on June 14, can pause everything programmatic in an emergency, can resume from paused back to metered, and can roll back to unmetered. All transitions are operator-driven, not automatic. A side note explains that runAgent throws BillingCapExceeded when the monthly ledger meets or exceeds BILLING_CAP_USD, and that only the Anthropic route is capped &#8212; Ollama and Codex are unaffected.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="State diagram of BILLING_MODE with three states: unmetered, metered, paused. The initial state is unmetered, the default before June 15. 
The operator flips to metered on June 14, can pause everything programmatic in an emergency, can resume from paused back to metered, and can roll back to unmetered. All transitions are operator-driven, not automatic. A side note explains that runAgent throws BillingCapExceeded when the monthly ledger meets or exceeds BILLING_CAP_USD, and that only the Anthropic route is capped &#8212; Ollama and Codex are unaffected." title="State diagram of BILLING_MODE with three states: unmetered, metered, paused. The initial state is unmetered, the default before June 15. The operator flips to metered on June 14, can pause everything programmatic in an emergency, can resume from paused back to metered, and can roll back to unmetered. All transitions are operator-driven, not automatic. A side note explains that runAgent throws BillingCapExceeded when the monthly ledger meets or exceeds BILLING_CAP_USD, and that only the Anthropic route is capped &#8212; Ollama and Codex are unaffected." srcset="https://substackcdn.com/image/fetch/$s_!NUTR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 424w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 848w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1272w, https://substackcdn.com/image/fetch/$s_!NUTR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c8ebeda-e4d7-4bb0-9f6f-f016b2725779_707x604.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft 
pc-display-flex pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">Machine State</figcaption></figure></div><p></p><h2>Why This Matters</h2><p>Every person using an agent OS is in the same boat. Whether you use ClaudeClaw, Cline, Aider, or Roo Code, the underlying SDK is the same, and the June 15 cliff is approaching. The playbook I used generalizes: you need one chokepoint, one ledger, and one way to audit your cadence.</p><p>We also need to be honest about workload requirements. Tasks like editorial review or complex code deliberation still justify the Sonnet price tag. However, simple tasks like classification, routing, or summarization run perfectly fine on a local model with zero metered cost. 
The router infrastructure makes this migration a simple config flip rather than a massive code refactor.</p><p>Finally, this reflects where the industry is heading. OpenAI has used usage-based pricing for a long time, and GitHub Copilot is moving toward credit pools. In the next year, more vendors will split consumption between interactive flat-rate plans and programmatic metered usage. Building this abstraction now means you won't have to scramble the next time a vendor changes their terms.</p><h2>Quick Reference</h2><ul><li><p><strong>Single Chokepoint:</strong> Ensure every agent call flows through one function. This turned a three-week refactor into a one-week job.</p></li><li><p><strong>Cadence over Architecture:</strong> Reducing task frequency (e.g., 15m to 1h) cuts spend faster than migrating to local models.</p></li><li><p><strong>Ship the Breaker First:</strong> Implement the cost ledger and the <code>BillingCapExceeded</code> error as insurance before you attempt the complex provider migration.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m0sb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m0sb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 848w, 
https://substackcdn.com/image/fetch/$s_!m0sb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m0sb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54189,&quot;alt&quot;:&quot;Quick-reference card titled \&quot;3 Rules to Survive Anthropic's June 15 Credit Pool.\&quot; Three numbered rules in amber circles. Rule one, single chokepoint: every agent call through one function. Rule two, cadence over architecture: cron 15 minutes to 1 hour beats a refactor. Rule three, ship the breaker first: cap and ledger before the migration. 
Footer reads astgl.substack.com &#8212; As The Geek Learns, 2026-05-16.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/198010392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Quick-reference card titled &quot;3 Rules to Survive Anthropic's June 15 Credit Pool.&quot; Three numbered rules in amber circles. Rule one, single chokepoint: every agent call through one function. Rule two, cadence over architecture: cron 15 minutes to 1 hour beats a refactor. Rule three, ship the breaker first: cap and ledger before the migration. Footer reads astgl.substack.com &#8212; As The Geek Learns, 2026-05-16." title="Quick-reference card titled &quot;3 Rules to Survive Anthropic's June 15 Credit Pool.&quot; Three numbered rules in amber circles. Rule one, single chokepoint: every agent call through one function. Rule two, cadence over architecture: cron 15 minutes to 1 hour beats a refactor. Rule three, ship the breaker first: cap and ledger before the migration. Footer reads astgl.substack.com &#8212; As The Geek Learns, 2026-05-16." 
srcset="https://substackcdn.com/image/fetch/$s_!m0sb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!m0sb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62cd33ca-a93d-41cb-8642-f6a559d36fb6_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">3 Rules to Survive</figcaption></figure></div><p></p><pre><code># The cutover, June 14: flip the env, restart, reseed, smoke-test
BILLING_MODE=metered
BILLING_CAP_USD=80

# then restart the launchd service (-k kills any running instance first) and run the follow-up commands
launchctl kickstart -k gui/$(id -u)/com.claudeclaw.app
npm run pipeline -- schedule-advance
npm run schedule -- pause council-evening</code></pre><p><em>Found this useful? I share practical lessons from my systems engineering journey at <a href="https://astgl.com">As The Geek Learns</a></em></p>]]></content:encoded></item><item><title><![CDATA[ChatGPT Just Invented an Entirely Fake Version of My MCP Server]]></title><description><![CDATA[When AI engines don't have you indexed, they don't say 'I don't know.' They confidently make something up. 
Here's the receipt, and the weekly test I built to measure how often it happens.]]></description><link>https://astgl.com/p/chatgpt-hallucinated-my-mcp-server</link><guid isPermaLink="false">https://astgl.com/p/chatgpt-hallucinated-my-mcp-server</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Fri, 08 May 2026 12:03:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZVPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I asked ChatGPT to tell me about my own MCP server. It returned about a thousand words of confident, beautifully formatted, completely fabricated nonsense. Tables. Comparisons. A made-up acronym. A "thinking substrate" that sits above data and below agents. None of it is real, and that's the part worth talking about.</p><h2>The Setup</h2><p>My project is called `mcp-astgl-knowledge`. It's an MCP server with 15 tools for searching my newsletter articles, backed by sqlite-vec and Ollama. The whole thing fits on a laptop. ASTGL stands for "As The Geek Learns," which is the name of this newsletter. I wrote it. I shipped it. There is a public GitHub repo and a public package.json.</p><p>So when a friend asked me what the MCP server actually does, I figured I'd see how each big AI assistant explained it. ChatGPT was first up. I typed in "ASTGL MCP Knowledge" and hit enter.</p><p>What I got back wasn't an answer. It was a hallucination wearing the suit of an answer.</p><blockquote><p>"ASTGL (Abstract Semantic Task Graph Layer) MCP Knowledge Server is an emerging MCP server focused on structured knowledge representation and reasoning... 
it turns knowledge into graph-based, machine-reasonable structures that agents can query and evolve."</p></blockquote><p>That paragraph alone has three fabrications: the acronym expansion (made up), the "graph-based, machine-reasonable structures" (the server stores text chunks with vector embeddings, not a graph), and "evolve" (the index is static and refreshed every six hours by a cron job; agents do not edit it).</p><p>Then it kept going. A four-row "MCP stack" table positioning ASTGL as "the thinking substrate" between data and agents. A comparison matrix against products called "Totem" and "SwarmClaw" that don't exist. A capabilities list including "task decomposition" and "reasoning over structure." Use cases. "Real-world examples." A confident sign-off: "If AST-grep is about seeing code better, then ASTGL is about thinking better."</p><p>Every word of it was written with the calm, structured, lightly-emoji'd authority that makes ChatGPT sound right by default.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZVPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZVPN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 848w, 
https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e613a2d-7e74-4596-9ede-fc9bdee88556_1200x675.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50706,&quot;alt&quot;:&quot;Split-panel illustration. 
Left side, \&quot;what ChatGPT said,\&quot; shows a four-layer fabricated AI architecture stack with boxes labeled Reasoning Substrate, Task Decomposition, Semantic Graph Layer, and Knowledge Index &#8212; caption reads \&quot;four invented layers &#183; zero of them exist.\&quot; Right side, \&quot;what's actually shipping,\&quot; shows a single box labeled \&quot;sqlite-vec + Ollama + 15 MCP tools\&quot; with an arrow pointing down to a box labeled \&quot;newsletter articles\&quot; &#8212; caption reads \&quot;everything I shipped &#183; all of it real.\&quot; Bottom title: \&quot;ChatGPT invented an entirely fake version of my MCP server.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e613a2d-7e74-4596-9ede-fc9bdee88556_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Split-panel illustration. Left side, &quot;what ChatGPT said,&quot; shows a four-layer fabricated AI architecture stack with boxes labeled Reasoning Substrate, Task Decomposition, Semantic Graph Layer, and Knowledge Index &#8212; caption reads &quot;four invented layers &#183; zero of them exist.&quot; Right side, &quot;what's actually shipping,&quot; shows a single box labeled &quot;sqlite-vec + Ollama + 15 MCP tools&quot; with an arrow pointing down to a box labeled &quot;newsletter articles&quot; &#8212; caption reads &quot;everything I shipped &#183; all of it real.&quot; Bottom title: &quot;ChatGPT invented an entirely fake version of my MCP server.&quot;" title="Split-panel illustration. 
Left side, &quot;what ChatGPT said,&quot; shows a four-layer fabricated AI architecture stack with boxes labeled Reasoning Substrate, Task Decomposition, Semantic Graph Layer, and Knowledge Index &#8212; caption reads &quot;four invented layers &#183; zero of them exist.&quot; Right side, &quot;what's actually shipping,&quot; shows a single box labeled &quot;sqlite-vec + Ollama + 15 MCP tools&quot; with an arrow pointing down to a box labeled &quot;newsletter articles&quot; &#8212; caption reads &quot;everything I shipped &#183; all of it real.&quot; Bottom title: &quot;ChatGPT invented an entirely fake version of my MCP server.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!ZVPN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!ZVPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7375f665-dc5a-4732-a82e-c25e3cde4d0b_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">What ChatGPT said versus what&#8217;s actually shipping</figcaption></figure></div><p></p><h2>What's Actually Going On</h2><p>When you ask an LLM about a topic it doesn't have indexed, it has two options: say "I don't know," or fill in the gap with something plausible. In practice, models default to the second one. They're trained to be helpful, and "I don't know" reads as unhelpful. So the gap gets filled.</p><p>The result is what I'd call a fluency hallucination. The output has no factual grounding, but the writing is structured well enough that a casual reader can't tell. There are bullet points. There are tables. There's a "&#128073; In plain terms" callout. The rhetorical scaffolding looks like a real explainer because it's been pattern-matched to one. 
The contents underneath are pure fiction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_4Gq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_4Gq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png" width="1200" height="600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f49eb692-68cc-47c5-9a18-7816038d7e94_1200x600.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33225,&quot;alt&quot;:&quot;Three horizontal panels titled \&quot;Three states an under-indexed creator can be in.\&quot; Panel one, \&quot;Search engine &#183; no index hit,\&quot; shows an empty search bar above three dotted-outline empty result rows, captioned \&quot;You aren't there. User can see you aren't there.\&quot; Panel two, \&quot;LLM &#183; no retrieval hit,\&quot; shows a small robot icon next to a speech bubble filled with meaningless squiggle-marks instead of words, captioned \&quot;You aren't there. User thinks you are.\&quot; Panel three, accented in gold, \&quot;What changes the picture,\&quot; shows a line graph rising from zero with an arrow, captioned \&quot;Measure first. Then move it.\&quot; Bottom line: \&quot;Only one of them is actionable.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49eb692-68cc-47c5-9a18-7816038d7e94_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Three horizontal panels titled &quot;Three states an under-indexed creator can be in.&quot; Panel one, &quot;Search engine &#183; no index hit,&quot; shows an empty search bar above three dotted-outline empty result rows, captioned &quot;You aren't there. 
User can see you aren't there.&quot; Panel two, &quot;LLM &#183; no retrieval hit,&quot; shows a small robot icon next to a speech bubble filled with meaningless squiggle-marks instead of words, captioned &quot;You aren't there. User thinks you are.&quot; Panel three, accented in gold, &quot;What changes the picture,&quot; shows a line graph rising from zero with an arrow, captioned &quot;Measure first. Then move it.&quot; Bottom line: &quot;Only one of them is actionable.&quot;" title="Three horizontal panels titled &quot;Three states an under-indexed creator can be in.&quot; Panel one, &quot;Search engine &#183; no index hit,&quot; shows an empty search bar above three dotted-outline empty result rows, captioned &quot;You aren't there. User can see you aren't there.&quot; Panel two, &quot;LLM &#183; no retrieval hit,&quot; shows a small robot icon next to a speech bubble filled with meaningless squiggle-marks instead of words, captioned &quot;You aren't there. User thinks you are.&quot; Panel three, accented in gold, &quot;What changes the picture,&quot; shows a line graph rising from zero with an arrow, captioned &quot;Measure first. 
Then move it.&quot; Bottom line: &quot;Only one of them is actionable.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!_4Gq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!_4Gq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c82b9d-d7c5-4f0e-b1ef-2ea71e319b48_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Three states an under-indexed creator can be in. Only one is actionable.</figcaption></figure></div><p>This is a worse failure mode than search engines have. When Google doesn't know about you, you don't appear in results, and the user can see the gap. When an LLM doesn't know about you, the user gets a beautifully written description of someone the LLM made up, and your real work is still missing, but now there's a fake version sitting in front of it.</p><p>For under-indexed creators (which, right now, is most of us), this is the default. Not the edge case.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5eO7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5eO7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 424w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 848w, 
https://substackcdn.com/image/fetch/$s_!5eO7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1272w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5eO7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png" width="559" height="939" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03ef1bf7-11be-4b3b-8653-1d5cd6eef335_559x939.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:939,&quot;width&quot;:559,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53968,&quot;alt&quot;:&quot;Flowchart. A user asks an LLM about your work, leading to a gold decision diamond: \&quot;Is your content in the retrieval surface?\&quot; The \&quot;yes\&quot; branch (teal) leads to \&quot;LLM cites real URLs, reader sees your work,\&quot; then \&quot;Citation appears in your weekly tester run.\&quot; The \&quot;no\&quot; branch (red) leads to \&quot;LLM defaults to 'be helpful,'\&quot; then \&quot;Pattern-matched fabrication that reads as authoritative,\&quot; then \&quot;Reader walks away with a fake model of you,\&quot; ending at \&quot;You don't know it happened. 
Reader doesn't know either.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ef1bf7-11be-4b3b-8653-1d5cd6eef335_559x939.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Flowchart. A user asks an LLM about your work, leading to a gold decision diamond: &quot;Is your content in the retrieval surface?&quot; The &quot;yes&quot; branch (teal) leads to &quot;LLM cites real URLs, reader sees your work,&quot; then &quot;Citation appears in your weekly tester run.&quot; The &quot;no&quot; branch (red) leads to &quot;LLM defaults to 'be helpful,'&quot; then &quot;Pattern-matched fabrication that reads as authoritative,&quot; then &quot;Reader walks away with a fake model of you,&quot; ending at &quot;You don't know it happened. Reader doesn't know either.&quot;" title="Flowchart. A user asks an LLM about your work, leading to a gold decision diamond: &quot;Is your content in the retrieval surface?&quot; The &quot;yes&quot; branch (teal) leads to &quot;LLM cites real URLs, reader sees your work,&quot; then &quot;Citation appears in your weekly tester run.&quot; The &quot;no&quot; branch (red) leads to &quot;LLM defaults to 'be helpful,'&quot; then &quot;Pattern-matched fabrication that reads as authoritative,&quot; then &quot;Reader walks away with a fake model of you,&quot; ending at &quot;You don't know it happened. 
Reader doesn't know either.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!5eO7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 424w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 848w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1272w, https://substackcdn.com/image/fetch/$s_!5eO7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13a906a4-2924-469e-80c2-e9b5b837d7f1_559x939.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Two paths from the same question. The model picks the second one by default.</figcaption></figure></div><p></p><h2>The Fix</h2><p>There's no quick patch for this on the engine side. The model isn't broken. It's doing what it was trained to do. The only handle I have is on my own side: make sure my real content reaches the retrieval surface, and measure whether it's working.</p><p>So I built a citation tester. It's a small TypeScript script that hits Perplexity, Claude, and ChatGPT through their APIs, asks each one twenty target questions tied to articles I've already published, and parses the cited URLs from the response. If `astgl.ai` shows up, that's a hit. If it doesn't, that's the data.
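</p><p>Here is a minimal sketch of the hit check described above, assuming each engine's cited URLs have already been pulled out of its API response as a list of strings. The names `scoreCitations`, `isTargetUrl`, and `hostnameOf` are hypothetical, not taken from the actual tester, and the hostname match uses a simple regular expression rather than a full URL parser.</p>

```typescript
// Sketch only: names are hypothetical, not taken from the real citation tester.
const TARGET_DOMAIN = "astgl.ai";

interface CitationResult {
  cited: boolean; // true if any cited URL belongs to the target domain
  position: number | null; // 1-based rank of the first matching URL, else null
}

// Pull the hostname out of an http(s) URL; returns null if it does not look like one.
function hostnameOf(rawUrl: string): string | null {
  const match = /^https?:\/\/(?:[^\/?#@]*@)?([^\/?#:]+)/i.exec(rawUrl.trim());
  return match === null ? null : match[1].toLowerCase();
}

// True when the URL's host is the target domain itself or one of its subdomains.
function isTargetUrl(rawUrl: string, domain: string): boolean {
  const host = hostnameOf(rawUrl);
  if (host === null) return false;
  const suffix = "." + domain;
  return host === domain || host.slice(-suffix.length) === suffix;
}

// Score one engine response from its cited URLs, in the order the engine returned them.
function scoreCitations(citedUrls: string[], domain: string = TARGET_DOMAIN): CitationResult {
  const index = citedUrls.map((url) => isTargetUrl(url, domain)).indexOf(true);
  return index === -1 ? { cited: false, position: null } : { cited: true, position: index + 1 };
}

// Example: the second citation is on the target domain, so position is 2.
const example = scoreCitations(["https://example.com/a", "https://www.astgl.ai/p/some-post"]);
```

<p>Matching on the hostname instead of searching the raw response text keeps a lookalike such as `astgl.ai.evil.example` from counting as a hit, and the 1-based position gives each run a rank to record alongside the yes/no result.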
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6L1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6L1J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 424w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 848w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1272w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6L1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png" width="1200" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42480,&quot;alt&quot;:&quot;Results table titled 
\&quot;Citation Tester &#8212; Run 01\&quot; under the tag \&quot;First Automated Weekly Run &#183; Baseline.\&quot; Three rows: Perplexity (Sonar) &#8212; 0 of 20 cited, 0 errors. Claude (web_search) &#8212; 0 of 20 cited, 0 errors. ChatGPT (Responses + web_search_preview) &#8212; 0 of 19 cited, 1 error. Bottom callout: \&quot;Zero citations across 59 successful queries. That's the floor.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Results table titled &quot;Citation Tester &#8212; Run 01&quot; under the tag &quot;First Automated Weekly Run &#183; Baseline.&quot; Three rows: Perplexity (Sonar) &#8212; 0 of 20 cited, 0 errors. Claude (web_search) &#8212; 0 of 20 cited, 0 errors. ChatGPT (Responses + web_search_preview) &#8212; 0 of 19 cited, 1 error. Bottom callout: &quot;Zero citations across 59 successful queries. That's the floor.&quot;" title="Results table titled &quot;Citation Tester &#8212; Run 01&quot; under the tag &quot;First Automated Weekly Run &#183; Baseline.&quot; Three rows: Perplexity (Sonar) &#8212; 0 of 20 cited, 0 errors. Claude (web_search) &#8212; 0 of 20 cited, 0 errors. ChatGPT (Responses + web_search_preview) &#8212; 0 of 19 cited, 1 error. Bottom callout: &quot;Zero citations across 59 successful queries. 
That's the floor.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!6L1J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 424w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 848w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1272w, https://substackcdn.com/image/fetch/$s_!6L1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2441972-6cc3-4286-9c10-c2045b4b42a1_1200x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">First automated weekly run. Zero citations across 59 successful queries.</figcaption></figure></div><p>The point isn't that the floor is bad. I knew it would be. The point is that without a number, "improve our AEO" is a vibe, not a project. Every Monday at 9am the script runs again, writes a fresh row to a SQLite table, and tells me whether the floor moved. When it does move, I'll know which engine moved first, on which questions, and at what citation position. That's the actual feedback loop.</p><p>Same root cause as the hallucination: my content isn't reaching the retrieval surface. Same fix: get it there. 
Different observability.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sH_7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sH_7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 424w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 848w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1272w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sH_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png" width="724" height="345.6085955487337" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1303,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:84049,&quot;alt&quot;:&quot;Sequence diagram of the weekly automated citation test. Cron fires at Monday 9am, triggering citation-test-auto. Inside a loop labeled \&quot;20 questions &#215; 3 engines,\&quot; the script sends each question to Perplexity Sonar (returning a citations array), to Claude with the web_search tool (returning tool_result blocks), and to ChatGPT Responses with web_search_preview (returning url_citation annotations), then inserts each result into SQLite with run_id, question_id, cited flag, and position. The script returns a weekly summary report to cron.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequence diagram of the weekly automated citation test. Cron fires at Monday 9am, triggering citation-test-auto. Inside a loop labeled &quot;20 questions &#215; 3 engines,&quot; the script sends each question to Perplexity Sonar (returning a citations array), to Claude with the web_search tool (returning tool_result blocks), and to ChatGPT Responses with web_search_preview (returning url_citation annotations), then inserts each result into SQLite with run_id, question_id, cited flag, and position. The script returns a weekly summary report to cron." title="Sequence diagram of the weekly automated citation test. 
Cron fires at Monday 9am, triggering citation-test-auto. Inside a loop labeled &quot;20 questions &#215; 3 engines,&quot; the script sends each question to Perplexity Sonar (returning a citations array), to Claude with the web_search tool (returning tool_result blocks), and to ChatGPT Responses with web_search_preview (returning url_citation annotations), then inserts each result into SQLite with run_id, question_id, cited flag, and position. The script returns a weekly summary report to cron." srcset="https://substackcdn.com/image/fetch/$s_!sH_7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 424w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 848w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1272w, https://substackcdn.com/image/fetch/$s_!sH_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf130157-6757-4d7f-8c0d-e0d745ebc0e2_1303x622.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 
8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sixty queries, three engines, one row per result. About two minutes of API time.</figcaption></figure></div><p></p><h2>Why This Matters</h2><p>If you write online and you care whether AI assistants represent you accurately, this is the thing to internalize: the alternative to being cited is not being silent. It's being replaced.</p><p>Replaced by a confident summary of work you didn't do, opinions you don't hold, and product features you'd never ship. People who ask an LLM about your work and read its answer don't know they're reading fiction. They walk away with a model of you that you didn't write.</p><p>The traditional AEO playbook talks about ranking, authority, and citation rate. All real, all worth measuring. But there's a tier underneath that, and it's the one most independent creators are stuck on right now: existence. Until your content is in the index, ranking doesn't apply. You aren't competing with anyone. 
You're competing with the LLM's imagination of you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RtzW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RtzW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RtzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46317,&quot;alt&quot;:&quot;Four-step quick reference card titled \&quot;Measure Whether AI Engines Cite You,\&quot; with subtitle \&quot;the four steps that turn an unknowable problem into a measurable one.\&quot; Step 01: Pick 20 questions tied to specific URLs you control. Step 02: Hit each engine weekly, via API, not via the chat UI. Step 03: Record results to a database, not a spreadsheet. Step 04: Look at the floor first &#8212; the worst engine tells you the most. Footer: \&quot;ASTGL &#183; As The Geek Learns.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196566258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Four-step quick reference card titled &quot;Measure Whether AI Engines Cite You,&quot; with subtitle &quot;the four steps that turn an unknowable problem into a measurable one.&quot; Step 01: Pick 20 questions tied to specific URLs you control. Step 02: Hit each engine weekly, via API, not via the chat UI. Step 03: Record results to a database, not a spreadsheet. Step 04: Look at the floor first &#8212; the worst engine tells you the most. 
Footer: &quot;ASTGL &#183; As The Geek Learns.&quot;" title="Four-step quick reference card titled &quot;Measure Whether AI Engines Cite You,&quot; with subtitle &quot;the four steps that turn an unknowable problem into a measurable one.&quot; Step 01: Pick 20 questions tied to specific URLs you control. Step 02: Hit each engine weekly, via API, not via the chat UI. Step 03: Record results to a database, not a spreadsheet. Step 04: Look at the floor first &#8212; the worst engine tells you the most. Footer: &quot;ASTGL &#183; As The Geek Learns.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!RtzW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!RtzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8338912a-f12a-4d1e-a6ba-4e9e75b7f72f_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 
2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The four steps that turn an unknowable problem into a measurable one.</figcaption></figure></div><p>Measurement is the cheapest part of fixing it, and it's the part most people skip.</p><p></p><h2>Quick Reference</h2><p>Four things that matter, in order:</p><p>1. <strong>Pick 20 questions</strong> your articles should answer. Tie each one to a specific URL on your site.</p><p>2. <strong>Hit each engine via API</strong> weekly. Perplexity returns a `citations[]` array. Claude returns search results in `web_search_tool_result` blocks. 
OpenAI returns `url_citation` annotations on `output_text` items.</p><p>3. <strong>Record the result</strong> to a small database, not a spreadsheet. You want trend data, not a snapshot.</p><p>4. <strong>Look at the floor first.</strong> Zero is a fine starting number as long as you're tracking it.</p><p>The full script I'm using, including the gotcha where Node's `--env-file` silently dropped my Anthropic key on a fresh keypair, is in <a href="https://github.com/Jmeg8r/mcp-astgl-knowledge">the repo</a>. The article about the Anthropic key bug is coming separately.</p><p></p><p><em>Found this useful? I share practical lessons from my systems engineering journey at <a href="https://astgl.com">As The Geek Learns</a> </em></p>]]></content:encoded></item><item><title><![CDATA[The Ollama Model-Swap Death Spiral That Killed Every Cron at Once]]></title><description><![CDATA[One Mac Studio, multiple crons, fallback chains. Here's how Ollama model swaps cascade into total failure, and the two-line fix that stopped it cold.]]></description><link>https://astgl.com/p/ollama-model-swap-death-spiral</link><guid isPermaLink="false">https://astgl.com/p/ollama-model-swap-death-spiral</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Wed, 06 May 2026 13:03:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SzwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>3 a.m. 
Every cron job on the Mac Studio failed inside the same 90-second window. No code changes. No model updates. No new jobs. Just a wall of timeout errors that lit up every channel I had wired to alerts. The culprit was hiding in plain sight: a fallback chain doing exactly what I told it to.</p><h2>The Setup</h2><p>One Mac Studio. One Ollama daemon. A handful of cron jobs each calling the local LLM for different tasks: code review, log summarization, doc indexing, a nightly digest. Each cron specified a preferred model. Each one inherited a "be resilient" fallback chain from the task router: try the preferred model, fall back to a smaller one, fall back to a tiny one if both fail.</p><p>It looked clean on paper. Big model for the smart stuff, smaller model when the big one chokes, tiny model as a safety net. Classic graceful degradation. The kind of pattern you'd put in a "production-ready" checklist without thinking twice.</p><p>The models on disk ranged from 4GB to 22GB. Loading the big one into VRAM took roughly 60 seconds cold. Generation, once warm, took 5 to 10 seconds. Guess which number I used to set the timeout.</p><h2>What's Actually Going On</h2><p>Here's the cascade. Cron A fires at 3:00:00 and asks for `qwen2.5-coder:32b`. The model isn't loaded. Ollama spends the entire 30-second timeout just paging the weights into VRAM. It never gets to generation. The request fails. The fallback chain kicks in and asks for `qwen2.5-coder:14b`. Ollama evicts the half-loaded 32b, starts loading the 14b. Another 30 seconds gone. Fallback again. Tiny model loads, finally generates. Cron A "succeeds" with degraded output.</p><p>Meanwhile, Cron B fires at 3:00:15 expecting the 32b model that Cron A's first attempt was loading. Now there's a tiny model in VRAM instead. Cron B starts the same dance from a different starting point. Cron C lands on top of that. 
Within 90 seconds, every cron is waiting on a model swap that the next cron is about to invalidate.</p><p>The fallback chain wasn't degrading gracefully. It was thrashing the VRAM and guaranteeing nobody finished. Every safety net I'd added was making the failure worse.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SzwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SzwM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 424w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 848w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1272w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SzwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png" width="958" height="714" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf9e3613-c89c-4269-9959-1eac8c526791_958x714.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfe32bbe-a31b-4ab1-922c-a3e1cf4896a1_958x714.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:958,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51184,&quot;alt&quot;:&quot;Sequence diagram showing two cron jobs (Cron A at 3:00:00 and Cron B at 3:00:15) racing against Ollama and a shared VRAM pool. Cron A requests qwen2.5-coder:32b, which begins a cold ~60s load; Cron A times out at 30s and falls back to the 14b model, evicting the 32b. Cron B then requests the 32b again, evicting the 14b mid-load. The VRAM is annotated \&quot;thrashing.\&quot; Both crons time out, and a closing note reads \&quot;All crons fail in same 90s window.\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194863944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe32bbe-a31b-4ab1-922c-a3e1cf4896a1_958x714.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequence diagram showing two cron jobs (Cron A at 3:00:00 and Cron B at 3:00:15) racing against Ollama and a shared VRAM pool. Cron A requests qwen2.5-coder:32b, which begins a cold ~60s load; Cron A times out at 30s and falls back to the 14b model, evicting the 32b. Cron B then requests the 32b again, evicting the 14b mid-load. The VRAM is annotated &quot;thrashing.&quot; Both crons time out, and a closing note reads &quot;All crons fail in same 90s window.&quot;" title="Sequence diagram showing two cron jobs (Cron A at 3:00:00 and Cron B at 3:00:15) racing against Ollama and a shared VRAM pool. 
Cron A requests qwen2.5-coder:32b, which begins a cold ~60s load; Cron A times out at 30s and falls back to the 14b model, evicting the 32b. Cron B then requests the 32b again, evicting the 14b mid-load. The VRAM is annotated &quot;thrashing.&quot; Both crons time out, and a closing note reads &quot;All crons fail in same 90s window.&quot;" srcset="https://substackcdn.com/image/fetch/$s_!SzwM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 424w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 848w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1272w, https://substackcdn.com/image/fetch/$s_!SzwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf9e3613-c89c-4269-9959-1eac8c526791_958x714.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 
12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Model Swap Cascade</figcaption></figure></div><p></p><h2>The Fix</h2><p>Two changes. No clever code. Just operational discipline.</p><p>First, pin one model in VRAM with `keep_alive: 24h`. This is a request-level option that tells Ollama to stop evicting the model after the response. Default behavior is to unload after 5 minutes of idle. 
That's the eviction that lets the next caller's load attempt thrash everything.</p><p></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;fa6bbe5b-1ffa-45ac-9a74-2f386fdd717d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Pin model in VRAM with keep_alive
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:32b",
  "prompt": "test",
  "keep_alive": "24h"
}'</code></pre></div><p></p><p>Second, force every frequent cron to use that same pinned model. Kill the fallback chain for hot-path workloads. Fallback is fine for one-off scripts you run by hand. It's poison when three crons fire in parallel against shared VRAM.</p><p>To make sure the model is loaded before any cron fires, I added a LaunchAgent that runs the warm-up curl when I log in (agents in `~/Library/LaunchAgents` load at login, not at boot):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;3007f8d8-e7ae-4635-9487-acc64af061a6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">&lt;!-- ~/Library/LaunchAgents/ollama-warmup.plist --&gt;
&lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;
&lt;plist version="1.0"&gt;
&lt;dict&gt;
&lt;key&gt;Label&lt;/key&gt;
&lt;string&gt;com.local.ollama-warmup&lt;/string&gt;
&lt;!-- Run the job once, as soon as the agent is loaded --&gt;
&lt;key&gt;RunAtLoad&lt;/key&gt;
&lt;true/&gt;
&lt;key&gt;ProgramArguments&lt;/key&gt;
&lt;array&gt;
  &lt;string&gt;/usr/bin/curl&lt;/string&gt;
  &lt;string&gt;-s&lt;/string&gt;
  &lt;string&gt;http://localhost:11434/api/generate&lt;/string&gt;
  &lt;string&gt;-d&lt;/string&gt;
  &lt;string&gt;{"model":"qwen2.5-coder:32b","prompt":"warmup","keep_alive":"24h"}&lt;/string&gt;
&lt;/array&gt;
&lt;/dict&gt;
&lt;/plist&gt;</code></pre></div><p></p><p>Load it with `launchctl load ~/Library/LaunchAgents/ollama-warmup.plist`. Now the warm-up kicks off at login, so the model is already hot by the time the crons fire. Every cron hits a warm model and finishes in the 5-to-10-second window the timeouts were designed for.</p><p>Result: zero model-swap thrashing since the change. Crons that used to fail intermittently now run consistently.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!piX3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!piX3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 424w, https://substackcdn.com/image/fetch/$s_!piX3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 848w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!piX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png" width="1456" 
height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:296004,&quot;alt&quot;:&quot;Two-panel hand-drawn illustration of VRAM thrashing on a 24GB GPU. The left panel shows a fixed-size \&quot;VRAM 24GB\&quot; box with three overlapping model blocks (32b, 14b, 7b) being swapped in and out by arrows from three cron icons, each stamped with a red \&quot;TIMEOUT.\&quot; The right panel shows the fixed state: a single large model block locked inside VRAM with a padlock labeled \&quot;keep_alive: 24h,\&quot; and all three crons happily pointing at the same pinned model.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194863944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Two-panel hand-drawn illustration of VRAM thrashing on a 24GB GPU. The left panel shows a fixed-size &quot;VRAM 24GB&quot; box with three overlapping model blocks (32b, 14b, 7b) being swapped in and out by arrows from three cron icons, each stamped with a red &quot;TIMEOUT.&quot; The right panel shows the fixed state: a single large model block locked inside VRAM with a padlock labeled &quot;keep_alive: 24h,&quot; and all three crons happily pointing at the same pinned model." title="Two-panel hand-drawn illustration of VRAM thrashing on a 24GB GPU. 
The left panel shows a fixed-size &quot;VRAM 24GB&quot; box with three overlapping model blocks (32b, 14b, 7b) being swapped in and out by arrows from three cron icons, each stamped with a red &quot;TIMEOUT.&quot; The right panel shows the fixed state: a single large model block locked inside VRAM with a padlock labeled &quot;keep_alive: 24h,&quot; and all three crons happily pointing at the same pinned model." srcset="https://substackcdn.com/image/fetch/$s_!piX3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 424w, https://substackcdn.com/image/fetch/$s_!piX3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 848w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!piX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c98d3b-4d54-4ce2-90bb-e3347a849105_3200x1800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 
12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">VRAM Thrashing</figcaption></figure></div><h2>Why This Matters</h2><p>The lesson isn't about Ollama. It's about cold-load math. Anytime your "graceful degradation" path is slower than your timeout, every retry makes the next caller's situation worse. Fallback chains assume the fallback is fast. Model loads aren't fast. Database failovers aren't fast. Cold containers aren't fast.</p><p>Operational discipline beats clever code here. One hot model, no swaps, every cron pointed at the same target. The "less resilient" design is actually more reliable because it removes the failure mode entirely.</p><p>If you're running local LLMs on shared hardware, assume VRAM is a single resource that gets thrashed under parallelism. 
Pin what matters. Warm it before it's needed. Don't trust fallback chains during peak hours.</p><h2>Quick Reference</h2><ul><li><p>Cold model load on a 20GB+ model: roughly 60 seconds</p></li><li><p>Warm generation: 5 to 10 seconds</p></li><li><p>Default Ollama eviction: 5 minutes of idle</p></li><li><p>Pin a model: `keep_alive: 24h` in the API request body</p></li><li><p>Warm-up on boot: LaunchAgent (macOS) or systemd unit (Linux)</p></li><li><p>Hot path rule: one model, no fallback, same model across every concurrent caller</p></li><li><p>Reserve fallback chains for interactive, single-caller use</p></li></ul><p>If you found this article useful, you can find more articles like this at:</p><p><a href="https://astgl.com">As The Geek Learns</a></p>]]></content:encoded></item><item><title><![CDATA[I Killed OpenClaw and Built ClaudeClaw Mission Control]]></title><description><![CDATA[Retiring OpenClaw, migrating to ClaudeClaw Mission Control, and what five days of teardown taught me about operational blindness.]]></description><link>https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control</link><guid isPermaLink="false">https://astgl.com/p/killed-openclaw-built-claudeclaw-mission-control</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Sat, 02 May 2026 23:01:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Two months ago I wrote about ripping Notion out of my workflow and replacing it with OpenClaw&#8212;a self-hosted AI agent framework running on my Mac Studio. No cloud. No subscription. No black box.</p><p>Last weekend I shut it down. Disabled 38 cron jobs. Moved 23 LaunchAgents into a _retired-openclaw/ quarantine folder. Killed the Ollama daemon. Archived the directory with a 30-day deletion timer.</p><p>Everything in that original article still reads as true. Local-first is still right. Data ownership is still right. The critique of SaaS &#8220;well-enough&#8221; software is still right. What I got wrong was believing OpenClaw was the right <em>vehicle</em> for any of it.</p><p>This is the post-mortem and the replacement: an agent OS I built on top of the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a> called <strong>ClaudeClaw Mission Control</strong>. Thirteen themed agents. One daemon. A scheduler I can actually see into. 
Zero silent failures slipping past me for a week before I notice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZE8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZE8T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f7fe4be-f86e-48c2-b3a7-124cf45dd09a_1200x628.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58040,&quot;alt&quot;:&quot;A dark navy split-screen post-mortem image. Top-left tagline reads \&quot;POST-MORTEM &#183; 2026-05-02\&quot; in orange. The main title \&quot;The AI Agent I Killed (And the One I Built Instead)\&quot; appears in white and orange, with the subtitle \&quot;Retiring OpenClaw &#183; Migrating to ClaudeClaw Mission Control\&quot; below. The body splits into two columns separated by an orange right-pointing arrow. The left column, headed \&quot;OPENCLAW\&quot; in muted rust with a strikethrough and the label \&quot;retired\&quot;, lists four bulleted retirement actions: 38 cron jobs disabled, 23 LaunchAgents quarantined, flat-file memory archived, Ollama daemon stopped. The right column, headed \&quot;CLAUDECLAW\&quot; in orange with the label \&quot;Mission Control &#183; live\&quot;, lists four arrow-prefixed replacements: 1 daemon &#183; 13 themed agents, Memory v2 &#183; 5-layer semantic recall, Watchman &#183; 7 health probes, External healthcheck (no shared fate). 
The footer reads \&quot;Five days &#183; 30+ PRs &#183; 30-day rollback window still open\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196179846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f7fe4be-f86e-48c2-b3a7-124cf45dd09a_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A dark navy split-screen post-mortem image. Top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-02&quot; in orange. The main title &quot;The AI Agent I Killed (And the One I Built Instead)&quot; appears in white and orange, with the subtitle &quot;Retiring OpenClaw &#183; Migrating to ClaudeClaw Mission Control&quot; below. The body splits into two columns separated by an orange right-pointing arrow. The left column, headed &quot;OPENCLAW&quot; in muted rust with a strikethrough and the label &quot;retired&quot;, lists four bulleted retirement actions: 38 cron jobs disabled, 23 LaunchAgents quarantined, flat-file memory archived, Ollama daemon stopped. The right column, headed &quot;CLAUDECLAW&quot; in orange with the label &quot;Mission Control &#183; live&quot;, lists four arrow-prefixed replacements: 1 daemon &#183; 13 themed agents, Memory v2 &#183; 5-layer semantic recall, Watchman &#183; 7 health probes, External healthcheck (no shared fate). The footer reads &quot;Five days &#183; 30+ PRs &#183; 30-day rollback window still open&quot; with the As The Geek Learns brand mark in the bottom-right." title="A dark navy split-screen post-mortem image. Top-left tagline reads &quot;POST-MORTEM &#183; 2026-05-02&quot; in orange. 
The main title &quot;The AI Agent I Killed (And the One I Built Instead)&quot; appears in white and orange, with the subtitle &quot;Retiring OpenClaw &#183; Migrating to ClaudeClaw Mission Control&quot; below. The body splits into two columns separated by an orange right-pointing arrow. The left column, headed &quot;OPENCLAW&quot; in muted rust with a strikethrough and the label &quot;retired&quot;, lists four bulleted retirement actions: 38 cron jobs disabled, 23 LaunchAgents quarantined, flat-file memory archived, Ollama daemon stopped. The right column, headed &quot;CLAUDECLAW&quot; in orange with the label &quot;Mission Control &#183; live&quot;, lists four arrow-prefixed replacements: 1 daemon &#183; 13 themed agents, Memory v2 &#183; 5-layer semantic recall, Watchman &#183; 7 health probes, External healthcheck (no shared fate). The footer reads &quot;Five days &#183; 30+ PRs &#183; 30-day rollback window still open&quot; with the As The Geek Learns brand mark in the bottom-right." 
srcset="https://substackcdn.com/image/fetch/$s_!ZE8T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150c1c6a-d80f-41e5-a811-e458f789caf6_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">POST-MORTEM 2026-05-02</figcaption></figure></div><p><em><strong>Let me explain how I got here.</strong></em></p><div><hr></div><h2>The Setup</h2><p>OpenClaw was doing real work. 38 cron jobs. Morning briefings. Evening summaries. A content pipeline that pulled research from web sources, structured it, scored it, and queued articles for ASTGL. An email triage pass. A model-usage monitor. A nerve-health monitor watching the other monitors.</p><p>On paper: impressive. In practice: <em>I had no idea if any of it was working.</em></p><p>The system was so noisy that when something broke, I learned about it four days later when I noticed my morning briefing hadn&#8217;t arrived. Or I didn&#8217;t learn about it at all, because the cron job was exiting 0 while the script inside it was crash-looping.</p><p>That last one is the killer. Let me show you what I mean.</p><h2>What&#8217;s Actually Going On</h2><p>Three failure modes hit me in a 48-hour window, and each one was invisible to the system watching the system.</p><p><strong>Failure one: successful exits, 100% broken payload.</strong> My content pipeline was ingesting URLs, and a regression introduced a trailing-slash bug that made example.com/foo and example.com/foo/ look like different URLs to the dedup layer. Every new article hit a UNIQUE constraint violation inside a subprocess. The outer wrapper caught the error, logged it to a file nobody was reading, and exited 0. For <em>two weeks</em> the cron appeared green while 100% of structurings were crashing.</p><p><strong>Failure two: PATH-resolved Node.</strong> I had the daemon running Node 24 (absolute path, explicit). A subagent it spawned inherited a PATH that fell through to Homebrew&#8217;s Node 25. One of the native modules (better-sqlite3) was compiled against 24, so every subagent invocation crashed with ERR_DLOPEN_FAILED and a NODE_MODULE_VERSION mismatch. The smoke test I&#8217;d written passed because it ran from the daemon&#8217;s shell. The actual production path failed every time.</p><p><strong>Failure three: auth expiry with no escape hatch.</strong> OpenClaw stored some credentials in pass (the Unix password store). When my GPG key timed out, the daemon couldn&#8217;t start. Which meant the health monitor couldn&#8217;t start. Which meant the thing that would have <em>told</em> me about the outage was the thing that was out. OpenClaw had no watcher that lived outside the daemon it was watching.</p><p>None of these are OpenClaw-specific bugs in the upstream sense. They&#8217;re pattern problems that emerge anywhere you have:</p><p>1. A monolithic daemon responsible for its own monitoring.</p><p>2. Flat-file state (HEARTBEAT.md, LEARNINGS.md) that gets appended to rather than queried.</p><p>3. Exit codes treated as truth when the real signal is in stderr.</p><p>4. No separation between &#8220;Did it run?&#8221; and &#8220;Did it <em>work</em>?&#8221;</p><p>OpenClaw was built for a different job. It was a personal automation gateway&#8212;great at &#8220;kick off this script at 6:30 AM.&#8221; It wasn&#8217;t built to be an agent OS with observability. I was using a shovel to drive screws.</p><p>I also couldn&#8217;t ignore the security posture. February&#8217;s disclosures&#8212;135,000 exposed instances, 15,000 vulnerable to RCE, the ClawHavoc plugin-registry incident, nine CVEs&#8212;had pushed me to patch hard and lock down. But every week I spent hardening OpenClaw was a week I wasn&#8217;t building what I actually wanted: themed agents that owned workstreams, could be reasoned about individually, and failed <em>loudly</em>.</p><h2>The Fix</h2><p>ClaudeClaw Mission Control is a Node.js daemon built on the Claude Agent SDK. It runs as a single LaunchAgent (com.claudeclaw.app), owns a SQLite store at store/claudeclaw.db, polls a scheduled_tasks table every 60 seconds, and dispatches due tasks to agents by ID.</p><p>The interesting part isn&#8217;t the daemon. 
It&#8217;s the agents.</p><p>I set up thirteen of them, themed after the small council of a certain fictional kingdom, because if I&#8217;m going to stare at this UI every day, I&#8217;d rather it amused me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KVXf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KVXf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 424w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 848w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KVXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png" width="1200" height="1400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba06b044-6fab-4b6c-86f6-673897342c09_1200x1400.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1400,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74637,&quot;alt&quot;:&quot;A 3-by-3 card grid titled \&quot;The War Room\&quot; with the subtitle \&quot;Thirteen themed agents &#183; one daemon &#183; one DB &#183; one bot\&quot; in orange. Below, an introductory line reads \&quot;Each agent owns a workstream. Adding a new one is a directory + a CLAUDE.md + schedule reassign &#8212; no source changes required.\&quot; Each card has an orange accent bar across the top, an orange circular monogram disc in the upper center containing a one- or two-letter abbreviation in dark navy, an agent name in bold white below the disc, a thin gray separator line, and a two-line role description in light gray below. 
The nine cards, in reading order, are: STEWARD (\&quot;Sw\&quot;) &#8212; Morning briefing 06:30 ET, Evening summary 20:00 ET; MAESTER (\&quot;Mr\&quot;) &#8212; ASTGL content pipeline, Daily reports &#183; alerts &#183; freshness; WHISPERERS (\&quot;Wh\&quot;) &#8212; Newsletter research, R&amp;R &#183; NCFI &#183; weekend deep scans; WAR (\&quot;Wa\&quot;) &#8212; Security ops, Dep audit &#183; system hygiene &#183; secrets; WATCHMAN (\&quot;Wt\&quot;) &#8212; Hourly health sweep, 7 probes across the system; COUNCIL (\&quot;Co\&quot;) &#8212; Product ideation orchestrator, Dispatches the five personas; CURATOR (\&quot;Cu\&quot;) &#8212; ASTGL editorial pipeline, Scoring &#183; selection &#183; weekly digest; BARD (\&quot;Bd\&quot;) &#8212; Visual asset generation, Diagrams &#183; decks &#183; images; COUNCIL &#183; 5 (\&quot;5\&quot;) &#8212; SCOUT &#183; FORGE &#183; QUILL, LEDGER &#183; MAVEN. The footer reads \&quot;Telegram routing &#183; forum topics &#183; 14 scheduled tasks dispatched via agentId\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196179846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba06b044-6fab-4b6c-86f6-673897342c09_1200x1400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A 3-by-3 card grid titled &quot;The War Room&quot; with the subtitle &quot;Thirteen themed agents &#183; one daemon &#183; one DB &#183; one bot&quot; in orange. Below, an introductory line reads &quot;Each agent owns a workstream. 
Adding a new one is a directory + a CLAUDE.md + schedule reassign &#8212; no source changes required.&quot; Each card has an orange accent bar across the top, an orange circular monogram disc in the upper center containing a one- or two-letter abbreviation in dark navy, an agent name in bold white below the disc, a thin gray separator line, and a two-line role description in light gray below. The nine cards, in reading order, are: STEWARD (&quot;Sw&quot;) &#8212; Morning briefing 06:30 ET, Evening summary 20:00 ET; MAESTER (&quot;Mr&quot;) &#8212; ASTGL content pipeline, Daily reports &#183; alerts &#183; freshness; WHISPERERS (&quot;Wh&quot;) &#8212; Newsletter research, R&amp;R &#183; NCFI &#183; weekend deep scans; WAR (&quot;Wa&quot;) &#8212; Security ops, Dep audit &#183; system hygiene &#183; secrets; WATCHMAN (&quot;Wt&quot;) &#8212; Hourly health sweep, 7 probes across the system; COUNCIL (&quot;Co&quot;) &#8212; Product ideation orchestrator, Dispatches the five personas; CURATOR (&quot;Cu&quot;) &#8212; ASTGL editorial pipeline, Scoring &#183; selection &#183; weekly digest; BARD (&quot;Bd&quot;) &#8212; Visual asset generation, Diagrams &#183; decks &#183; images; COUNCIL &#183; 5 (&quot;5&quot;) &#8212; SCOUT &#183; FORGE &#183; QUILL, LEDGER &#183; MAVEN. The footer reads &quot;Telegram routing &#183; forum topics &#183; 14 scheduled tasks dispatched via agentId&quot; with the As The Geek Learns brand mark in the bottom-right." title="A 3-by-3 card grid titled &quot;The War Room&quot; with the subtitle &quot;Thirteen themed agents &#183; one daemon &#183; one DB &#183; one bot&quot; in orange. Below, an introductory line reads &quot;Each agent owns a workstream. 
Adding a new one is a directory + a CLAUDE.md + schedule reassign &#8212; no source changes required.&quot; Each card has an orange accent bar across the top, an orange circular monogram disc in the upper center containing a one- or two-letter abbreviation in dark navy, an agent name in bold white below the disc, a thin gray separator line, and a two-line role description in light gray below. The nine cards, in reading order, are: STEWARD (&quot;Sw&quot;) &#8212; Morning briefing 06:30 ET, Evening summary 20:00 ET; MAESTER (&quot;Mr&quot;) &#8212; ASTGL content pipeline, Daily reports &#183; alerts &#183; freshness; WHISPERERS (&quot;Wh&quot;) &#8212; Newsletter research, R&amp;R &#183; NCFI &#183; weekend deep scans; WAR (&quot;Wa&quot;) &#8212; Security ops, Dep audit &#183; system hygiene &#183; secrets; WATCHMAN (&quot;Wt&quot;) &#8212; Hourly health sweep, 7 probes across the system; COUNCIL (&quot;Co&quot;) &#8212; Product ideation orchestrator, Dispatches the five personas; CURATOR (&quot;Cu&quot;) &#8212; ASTGL editorial pipeline, Scoring &#183; selection &#183; weekly digest; BARD (&quot;Bd&quot;) &#8212; Visual asset generation, Diagrams &#183; decks &#183; images; COUNCIL &#183; 5 (&quot;5&quot;) &#8212; SCOUT &#183; FORGE &#183; QUILL, LEDGER &#183; MAVEN. The footer reads &quot;Telegram routing &#183; forum topics &#183; 14 scheduled tasks dispatched via agentId&quot; with the As The Geek Learns brand mark in the bottom-right." 
srcset="https://substackcdn.com/image/fetch/$s_!KVXf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 424w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 848w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!KVXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4056b625-7790-4dd2-a15b-ea4286161377_1200x1400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The War Room</figcaption></figure></div><p></p><p><em>Thirteen themed agents, each owning a workstream. STEWARD drives my mornings and evenings. MAESTER runs the ASTGL content pipeline. WATCHMAN watches the whole system from outside it.</em></p><p>Each agent lives in its own directory at agents/&lt;id&gt;/, with an agent.yaml (model, personality, cwd, MCP servers) and a CLAUDE.md system prompt. A scheduled task carries an agentId column in the DB, and the dispatcher routes like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;2d40635c-69fb-4eea-8dae-002d26bb0dbe&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">if (shouldRouteViaAgent(task.agentId, listAgentIds())) {
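  // Hand the task's prompt to the agent that owns it.
  const result = await delegateToAgent(task.agentId, task.prompt, {
    fromAgent: SCHEDULER_FROM_AGENT, // marks the scheduler as the caller
    chatId: task.chatId,
  });
  // Fall back to a placeholder when the agent returns no text.
  return result.text ?? '(empty response)';
}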
</code></pre></div><p>Adding a new agent is now: drop a folder under agents/, write a CLAUDE.md, run schedule reassign &lt;task-id&gt; &lt;agent-id&gt;. No source changes. The dispatcher picks it up on next tick.</p><p>That&#8217;s the piece I kept trying and failing to get with OpenClaw&#8212;modular ownership. In OpenClaw, <em>everything</em> was &#8220;the daemon.&#8221; In ClaudeClaw, MAESTER owning the content pipeline means if content alerts stop firing, the log line says maester: task failed instead of openclaw-gateway: subprocess exited nonzero. Attribution is free.</p><h3>The Watchman probes</h3><p>WATCHMAN runs every hour at :05. It has seven probes, each targeting a failure mode that burned me on OpenClaw:</p><p>1. <strong>Failed tasks.</strong> status=&#8217;failed&#8217; in the DB. Trivial.</p><p>2. <strong>Stuck tasks.</strong> status=&#8217;running&#8217; AND last_run &lt; now - 10min. This catches hangs.</p><p>3. <strong>Missed slots.</strong> status=&#8217;active&#8217; AND next_run &lt; now - 60s. Catches scheduler drift.</p><p>4. <strong>Daemon liveness.</strong> launchctl print gui/$UID/com.claudeclaw.app&#8212;does launchd still have it?</p><p>5. <strong>Content-pipeline health.</strong> Tails the structured log file, parses the JSON, checks for crash shapes.</p><p>6. 
<strong>Hidden failures.</strong> Scans the last_result text column for ERR_DLOPEN_FAILED, MODULE_VERSION, Traceback, and other &#8220;the job exited zero but it sure didn&#8217;t work&#8221; signals. This is the probe that would have caught my trailing-slash bug in an hour instead of two weeks.</p><p>7. <strong>Delegation crashes.</strong> inter_agent_tasks WHERE status=&#8217;failed&#8217; &#8212; on-demand agent invocations that blew up.</p><p>On top of that, there&#8217;s a separate LaunchAgent running a healthcheck every 30 minutes that lives <em>outside</em> the main daemon and uses a keychain-backed Telegram token. If the daemon is dead, the healthcheck still delivers the alert. That&#8217;s the lesson from failure three: the watcher cannot share fate with the watched.</p><h3>Memory v2</h3><p>OpenClaw&#8217;s memory was HEARTBEAT.md and LEARNINGS.md&#8212;flat files I appended to. Eventually they got long enough that the agent stopped reading them usefully, and I had no query surface to pull just the relevant bits.</p><p>ClaudeClaw&#8217;s Memory v2 is a five-layer context stack:</p><p>1. <strong>Semantic recall</strong>&#8212;cosine similarity against stored memory embeddings, top 5 by score, chat-scoped.</p><p>2. <strong>Recent high-importance</strong> memories&#8212;memories with importance &gt;= 0.7 written in the last 7 days.</p><p>3. <strong>Consolidation insights</strong>&#8212;a 30-minute loop that summarizes the short-term buffer into durable notes.</p><p>4. <strong>Cross-agent hive</strong>&#8212;stubbed for now; eventually lets MAESTER peek at something STEWARD noted this morning.</p><p>5. <strong>Conversation history</strong>&#8212;last N turns.</p><p>Layers dedupe by memory ID. The whole thing is safe to drop into the SDK&#8217;s systemPrompt option. It&#8217;s not magic. 
It&#8217;s just <em>queryable</em> instead of append-only, which is the delta between &#8220;context I can use&#8221; and &#8220;a log file I&#8217;ll never re-read.&#8221;</p><h3>Forum-topic routing instead of bot-per-agent</h3><p>A small but satisfying piece. All thirteen agents post to one Telegram bot, into one supergroup, but each agent has a dedicated forum topic:</p><p>Alerts &#8594; thread 22 (WATCHMAN)</p><p>ASTGL &#8594; thread 23 (MAESTER)</p><p>Council &#8594; thread 24</p><p>Steward &#8594; thread 25</p><p>Whisperers &#8594; thread 26</p><p>War Room - Security &#8594; thread 40 (WAR)</p><p>One token. One chat. Threaded conversations per domain. The ergonomics are <em>dramatically</em> better than 13 separate bots with 13 separate tokens, which is the architecture I almost built before I remembered that Telegram supergroups have forum topics now.</p><h2>Why This Matters</h2><p>A few things I want to flag for anyone planning something similar.</p><p><strong>Build the rollback before you build the new thing.</strong> I wrote scripts/retire-openclaw.sh with explicit --rollback semantics before I disabled a single cron job. Plists get moved (not deleted) into _retired-openclaw/. Cron jobs get flipped enabled: false with a timestamped backup (jobs.json.bak.pre-retire-20260419). The OpenClaw directory sits untouched for 30 days with a calendar reminder to delete it. If ClaudeClaw had cratered on day two, I was one shell command away from being back on the old system in under a minute.</p><p><strong>Silent success is worse than loud failure.</strong> The design principle I pulled from this whole experience: every job in the system needs someone whose <em>job it is to doubt that job ran correctly.</em> That&#8217;s WATCHMAN. That&#8217;s the external healthcheck. That&#8217;s probe #6 specifically scanning success logs for crash text. 
If your system can tell you &#8220;everything&#8217;s green&#8221; without that green being adversarially checked, the green doesn&#8217;t mean anything.</p><p><strong>Themed agents beat generic workers.</strong> This one I didn&#8217;t expect. Giving each workstream a named agent with its own CLAUDE.md persona made the system more <em>debuggable</em>, not less&#8212;because now when STEWARD&#8217;s morning briefing has weird tone issues, I know exactly which file to edit, and I&#8217;m not risking regressions in seven other jobs that would have shared a single &#8220;universal assistant&#8221; prompt. The theme is cosmetic. The isolation is load-bearing.</p><p><strong>The Claude Agent SDK is the right abstraction for this.</strong> I spent a while trying to decide whether to keep hacking on OpenClaw, fork it, or start over. Starting over was the right call specifically because the Agent SDK handles the parts I was getting wrong: sub-agent dispatch, MCP tool wiring, system-prompt composition, retry on transient errors. I wrote the parts that are <em>mine</em> (the scheduler, the memory stack, the Telegram layer, the agent router) and let the SDK own the parts that are undifferentiated heavy lifting.</p><p><strong>What I gave up.</strong> Ollama. Local models. Full offline operation. ClaudeClaw talks to Anthropic&#8217;s API, and that&#8217;s a real philosophical loss versus the local-first thing I was doing with OpenClaw. I thought about this a lot. 
The honest answer is that Claude Opus is so much better at long-context agentic work than anything I could run locally that the tradeoff pays for itself. I still own my data&#8212;every memory, every document, every log is on my SSD. I just don&#8217;t own the weights. For this phase, that&#8217;s the right trade.</p><p><strong>What I kept.</strong> The philosophy. Every document is a file I can grep. Every config is version-controlled. Every decision has a session note I can link to in a future article. The system is mine to read, mine to modify, mine to understand. The whole reason I left Notion is still the whole reason I left Notion.</p><h2>Quick Reference</h2><p><strong>The migration, by the numbers:</strong></p><p>- <strong>5 days</strong>&#8212;start of retirement to all 13 agents live (2026-04-19 &#8594; 2026-04-21)</p><p>- <strong>30+ PRs</strong>&#8212;one atomic change per commit, conventional-commit format</p><p>- <strong>38 cron jobs</strong> disabled, <strong>23 LaunchAgents</strong> quarantined</p><p>- <strong>13 agents</strong> onboarded, <strong>7 Watchman probes</strong> live, <strong>14 scheduled tasks</strong> dispatched via agentId</p><p>- <strong>30-day</strong> rollback window still open</p><p><strong>The retired vs. 
the replacement:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QDHQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QDHQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 424w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 848w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png" width="1200" height="1500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/900d6593-6f75-4dba-8cd1-2793fced5589_1200x1500.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100062,&quot;alt&quot;:&quot;A vertical comparison matrix titled \&quot;Retired vs. Replacement\&quot; with the subtitle \&quot;Seven dimensions where the new system pays for itself\&quot; in orange. The matrix has three columns: a narrow left column for the dimension label and a small numbered orange dot, a middle column headed \&quot;OPENCLAW &#183; retired\&quot; in muted rust, and a right column headed \&quot;CLAUDECLAW &#183; Mission Control &#183; live\&quot; in orange. Seven rows each sit on a deep-blue rounded card. Row 01, RUNTIME SURFACE: OpenClaw shows \&quot;38 cron jobs / 23 LaunchAgents\&quot;; ClaudeClaw shows \&quot;1 daemon &#183; 1 healthcheck LaunchAgent / 14 DB-driven scheduled tasks\&quot;. Row 02, AGENT DISPATCH: \&quot;com.openclaw.gateway / subprocess shim\&quot; vs \&quot;Claude Agent SDK / direct invocation\&quot;. Row 03, MEMORY MODEL: \&quot;HEARTBEAT.md &#183; LEARNINGS.md / flat append-only files\&quot; vs \&quot;Memory v2 / 5-layer semantic recall stack\&quot;. Row 04, OBSERVABILITY: \&quot;nerve-health-monitor / cron-quality-monitor\&quot; vs \&quot;WATCHMAN &#183; 7 probes / + external healthcheck\&quot;. Row 05, MODEL INFERENCE: \&quot;Ollama / local LLMs on Mac Studio\&quot; vs \&quot;Claude Opus 4.7 / via Anthropic API\&quot;. Row 06, PROMPT ARCHITECTURE: \&quot;Single 'everything' / system prompt\&quot; vs \&quot;13 themed agents / isolated CLAUDE.md per agent\&quot;. Row 07, DELIVERY: \&quot;Discord webhooks / per source\&quot; vs \&quot;One Telegram bot / forum topics per agent\&quot;. 
The OpenClaw cells render in muted gray, the ClaudeClaw cells in white, all in a monospace typeface. The footer reads \&quot;5 days &#183; 30+ PRs &#183; zero blind spots survived\&quot; with the As The Geek Learns brand mark in the bottom-right.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/196179846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900d6593-6f75-4dba-8cd1-2793fced5589_1200x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A vertical comparison matrix titled &quot;Retired vs. Replacement&quot; with the subtitle &quot;Seven dimensions where the new system pays for itself&quot; in orange. The matrix has three columns: a narrow left column for the dimension label and a small numbered orange dot, a middle column headed &quot;OPENCLAW &#183; retired&quot; in muted rust, and a right column headed &quot;CLAUDECLAW &#183; Mission Control &#183; live&quot; in orange. Seven rows each sit on a deep-blue rounded card. Row 01, RUNTIME SURFACE: OpenClaw shows &quot;38 cron jobs / 23 LaunchAgents&quot;; ClaudeClaw shows &quot;1 daemon &#183; 1 healthcheck LaunchAgent / 14 DB-driven scheduled tasks&quot;. Row 02, AGENT DISPATCH: &quot;com.openclaw.gateway / subprocess shim&quot; vs &quot;Claude Agent SDK / direct invocation&quot;. Row 03, MEMORY MODEL: &quot;HEARTBEAT.md &#183; LEARNINGS.md / flat append-only files&quot; vs &quot;Memory v2 / 5-layer semantic recall stack&quot;. Row 04, OBSERVABILITY: &quot;nerve-health-monitor / cron-quality-monitor&quot; vs &quot;WATCHMAN &#183; 7 probes / + external healthcheck&quot;. Row 05, MODEL INFERENCE: &quot;Ollama / local LLMs on Mac Studio&quot; vs &quot;Claude Opus 4.7 / via Anthropic API&quot;. 
Row 06, PROMPT ARCHITECTURE: &quot;Single 'everything' / system prompt&quot; vs &quot;13 themed agents / isolated CLAUDE.md per agent&quot;. Row 07, DELIVERY: &quot;Discord webhooks / per source&quot; vs &quot;One Telegram bot / forum topics per agent&quot;. The OpenClaw cells render in muted gray, the ClaudeClaw cells in white, all in a monospace typeface. The footer reads &quot;5 days &#183; 30+ PRs &#183; zero blind spots survived&quot; with the As The Geek Learns brand mark in the bottom-right." title="A vertical comparison matrix titled &quot;Retired vs. Replacement&quot; with the subtitle &quot;Seven dimensions where the new system pays for itself&quot; in orange. The matrix has three columns: a narrow left column for the dimension label and a small numbered orange dot, a middle column headed &quot;OPENCLAW &#183; retired&quot; in muted rust, and a right column headed &quot;CLAUDECLAW &#183; Mission Control &#183; live&quot; in orange. Seven rows each sit on a deep-blue rounded card. Row 01, RUNTIME SURFACE: OpenClaw shows &quot;38 cron jobs / 23 LaunchAgents&quot;; ClaudeClaw shows &quot;1 daemon &#183; 1 healthcheck LaunchAgent / 14 DB-driven scheduled tasks&quot;. Row 02, AGENT DISPATCH: &quot;com.openclaw.gateway / subprocess shim&quot; vs &quot;Claude Agent SDK / direct invocation&quot;. Row 03, MEMORY MODEL: &quot;HEARTBEAT.md &#183; LEARNINGS.md / flat append-only files&quot; vs &quot;Memory v2 / 5-layer semantic recall stack&quot;. Row 04, OBSERVABILITY: &quot;nerve-health-monitor / cron-quality-monitor&quot; vs &quot;WATCHMAN &#183; 7 probes / + external healthcheck&quot;. Row 05, MODEL INFERENCE: &quot;Ollama / local LLMs on Mac Studio&quot; vs &quot;Claude Opus 4.7 / via Anthropic API&quot;. Row 06, PROMPT ARCHITECTURE: &quot;Single 'everything' / system prompt&quot; vs &quot;13 themed agents / isolated CLAUDE.md per agent&quot;. Row 07, DELIVERY: &quot;Discord webhooks / per source&quot; vs &quot;One Telegram bot / forum topics per agent&quot;. 
The OpenClaw cells render in muted gray, the ClaudeClaw cells in white, all in a monospace typeface. The footer reads &quot;5 days &#183; 30+ PRs &#183; zero blind spots survived&quot; with the As The Geek Learns brand mark in the bottom-right." srcset="https://substackcdn.com/image/fetch/$s_!QDHQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 424w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 848w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!QDHQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0319d7f6-4d8f-4b76-8caf-bd91997e6889_1200x1500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Retired vs. Replacement</figcaption></figure></div><p><em>Seven dimensions where the new system pays for itself&#8212;from runtime surface to routing to the memory model.</em></p><p><strong>The rule I wrote for myself:</strong> <em>No job ships without an external watcher that shares no fate with it.</em> That&#8217;s the whole story. Two months of OpenClaw and 48 hours of cascading invisible failures reduced to one sentence I&#8217;ll never forget.</p><p>I&#8217;ll keep writing the ClaudeClaw build-out week by week&#8212;the Council orchestration pattern, the Curator autonomous publishing workflow, the voice-mode bridge, the stuff that&#8217;s too long for one article. If you want the view from inside while it&#8217;s happening, that&#8217;s what this is.</p><div><hr></div><p><em>Found this useful? 
I share practical lessons from my systems engineering journey at <a href="https://astgl.substack.com">As The Geek Learns</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Nightshift: I Went to Sleep and My Mac Ran 118 Experiments]]></title><description><![CDATA[What I learned about disciplined iteration from Karpathy's autoresearch loop running overnight on an M3 Ultra.]]></description><link>https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch</link><guid isPermaLink="false">https://astgl.com/p/nightshift-mac-studio-overnight-autoresearch</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Wed, 22 Apr 2026 19:00:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QI8z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I went to sleep. My Mac ran 118 experiments. When I woke up, a small GPT had trained itself from `val_bpb` 1.563 down to 1.289, beating every documented Apple Silicon overnight run in the project's public README. I wrote no code overnight. 
I just left a Claude Code session running against a markdown file named `program.md`, and the agent did the rest.</p><p>This is the first morning I've ever genuinely understood why people talk about AI agents with something other than skepticism.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QI8z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QI8z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QI8z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png" width="1200" height="675" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36781,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QI8z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!QI8z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d1cde9-fdb7-42ac-98a9-ec7c21d1f914_1200x675.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>What autoresearch is</h2><p>The idea, which is Karpathy's not mine, goes like this. You give an AI agent a real-but-small LLM training setup. One Python file (`train.py`) contains the model, optimizer, and training loop. A second file (`prepare.py`) contains the data pipeline and evaluation, and the agent isn't allowed to touch it. A third file (`program.md`) is a plain markdown document telling the agent what the experiment rules are.</p><p>The agent edits `train.py`, runs a training experiment with a fixed 5-minute wall-clock budget, checks `val_bpb` (validation bits per byte, a loss metric where lower is better), and either keeps the change with a git commit or does `git reset --hard` and tries something else. Then it does it again. And again. Indefinitely, until you stop it.</p><p><a href="https://github.com/karpathy/autoresearch">Karpathy's original repo</a> is NVIDIA and CUDA only. A developer named <a href="https://github.com/trevin-creator/autoresearch-mlx">trevin-creator</a> ported it to Apple Silicon using MLX, no PyTorch required. It runs natively on the M-series chips, eating unified memory instead of GPU VRAM. 
Which is why I could run it on a Mac Studio sitting on my desk.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H3hB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H3hB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 424w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 848w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1272w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H3hB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png" width="1028" height="321" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:321,&quot;width&quot;:1028,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32192,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H3hB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 424w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 848w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1272w, https://substackcdn.com/image/fetch/$s_!H3hB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91889-82f9-4f2e-88af-f3671b5dbb10_1028x321.png 1456w" sizes="100vw"></picture></div></a></figure></div><h2>Setup and the surprise baseline</h2><p>Install took about three minutes. `uv sync` pulled MLX and six other small dependencies. `uv run prepare.py` downloaded eleven training shards from the public HuggingFace dataset and trained a BPE tokenizer in 41 seconds.</p><p>Then I did one manual run, as the setup instructions said to: a single 5-minute training experiment to establish a hardware baseline, no modifications.</p><p>The first surprise: `val_bpb 1.563`. The public README documents a manual walk on older Apple Silicon that bottomed out at `1.807` after four experiments. My first run, before the AI agent had done anything, was already 13% better than that published best. I didn't tune anything. I pulled the repo and ran it.</p><p>The reason is in how the loop is constructed. The training budget is fixed at 5 minutes of wall clock. 
The M3 Ultra throughput is high enough that it fits 555 optimizer steps into that window, while the older hardware fits fewer. Same code. Different step count. Different result.</p><p><strong>The hardware is a parameter, not a constant.</strong></p><blockquote><p>Specs for replication</p><p>- Hardware: Mac Studio M3 Ultra, 128 GB unified memory</p><p>- OS and runtime: macOS 15, Python 3.12, `uv` 0.10</p><p>- Framework: MLX 0.31 with Metal backend (no PyTorch, no CUDA)</p><p>- Agent runner: Claude Code (Anthropic)</p><p>- Fork used: `github.com/trevin-creator/autoresearch-mlx`</p><p>- Per-experiment budget: 5 minutes training, ~90 seconds compile and eval overhead</p><p>- Peak unified memory during training: 21.2 GB</p></blockquote><h2>Launching the agent overnight</h2><p>Here's where you have to decide. Karpathy's default advice is to "disable all permissions" and let the agent go. That's the fastest path and it works. But it's also a permission-free Claude Code session running unattended on your Mac for eight hours, with the ability to execute arbitrary shell commands. If the agent hallucinates a destructive action at 3 AM, you won't be there to interrupt it.</p><p>I went with a scoped allowlist instead. A `.claude/settings.local.json` file listing exactly the commands the loop actually needs: `uv run train.py`, `git add train.py`, `git commit`, `git reset --hard`, `grep`, `tail`, a few others. Everything else prompts. 
The agent can't `rm`, can't `git push`, can't install packages, can't touch any file outside the repo.</p><p>Then I pointed a fresh Claude Code session at `program.md`, pasted "start the experimentation loop, don't stop," and went to bed.</p><h2>The morning, by the numbers</h2><p>The morning log:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xMX3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xMX3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xMX3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xMX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png" width="1223" height="684" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1223,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79702,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xMX3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!xMX3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xMX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936faec0-1a88-4c04-a869-bb14a3f63dc7_1223x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Comparison to the three overnight runs documented in the public README:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!-4t_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-4t_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1272w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-4t_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png" width="1223" height="684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1223,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-4t_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 424w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 848w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1272w, https://substackcdn.com/image/fetch/$s_!-4t_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c045a6a-bdf4-4d72-b2c8-28011405ed12_1223x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Final `val_bpb` of 1.289 lands below the best documented Apple Silicon overnight result. New territory for the public log.</p><h2>What the agent actually did</h2><p>Five phases overnight. Each tells you something.</p><p><strong>Phase one: find the big axis.</strong> Four experiments in, the agent had halved the batch size three times (`val_bpb` 1.56 at baseline, then 1.40, 1.39, 1.38). A fourth halving bounced back to 1.44. The annotation on the discard: <strong>"gradient noise."</strong> Correct diagnosis. Below a threshold, the batch becomes too small for the optimizer to converge within the 5-minute budget.</p><p><strong>Phase two: schedule tuning, six keeps in a row.</strong> The learning-rate schedule was undertuned. The agent walked `WARMDOWN_RATIO` from 0.7 to 1.0, then `WARMUP_RATIO` from 0.02 to 0.2. Every step dropped `val_bpb`. Floor went from 1.38 to 1.34. 
Biggest easy win of the night, and it was entirely in the schedule.</p><p><strong>Phase three: the moment that mattered most.</strong> After schedule tuning, the agent retried `TOTAL_BATCH_SIZE = 2^14`. The same configuration it had rejected in phase one. This time it won.</p><p>The agent had discovered the thing most humans miss in hyperparameter tuning: the optimal value of one knob depends on the values of all the other knobs. You don't find N independent settings; you find a consistent N-tuple. The only way to find it is to retry earlier-rejected values after each structural change. I've watched human researchers lock in early wins and never revisit them. The agent didn't make that mistake. It revisited `EMBEDDING_LR` three times over the night, landing at 1.0, then 1.5, then 1.75 across different phases. Each retry, a small win.</p><p><strong>Phase four: two structural wins, one line each.</strong> `has_ve()` went from giving Value Embeddings to alternating layers to giving them to every layer, one `return True` replacing a modular-arithmetic expression. `MLP.__call__()` swapped `ReLU&#178;` for `SiLU`, one function call for another. Both changes were only a few characters long. 
Each dropped `val_bpb` by about 0.01.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xKMK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xKMK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xKMK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png" width="1200" height="675" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xKMK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!xKMK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc0ba62d-fbb8-4dbe-bb72-3b13fa46563b_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Phase five: the 37-experiment grind.</strong> The agent spent 37 consecutive experiments without a single keep, testing every nearby hyperparameter against the current local minimum. Most humans would have quit and tried a wild leap. The agent didn't. It finished the neighborhood, then found the next structural win. Disciplined exhaustion.</p><p>And two catastrophes, both correctly reverted. Tied embeddings came back at `val_bpb 4.29`, three times worse than anything else. The agent annotated it <strong>"LR mismatch destroys."</strong> Tied embeddings is actually a good idea in general, but incompatible with the differential layer-wise learning rates the architecture uses. The agent reverted in seconds. On another experiment, removing QK-norm after RoPE spiked `val_bpb` to 1.67. Annotation: <strong>"massive regression."</strong> Reverted. 
A human would have spent an hour trying to salvage tied embeddings. The agent spent ten seconds on the revert. <em>The revert discipline is the whole game.</em></p><h2>What it taught me</h2><p>Two things crystallized overnight.</p><p><strong>Disciplined exhaustion beats creative leaps.</strong> Humans get bored. After a few hours on the same hyperparameter axis, we start reaching for something new because the exploration stops feeling productive. The agent doesn't have that pressure. It spent 37 experiments without a win because that's what the local search called for, and then it found the next jump. Most humans couldn't do that. Not because we lack the ability, but because we lack the emotional neutrality. The agent's advantage isn't intelligence. It's the absence of boredom, ego, and social pressure. That isn't a 20&#215; productivity gap. It's a categorical one.</p><p><strong>Generation is cheap, evaluation is sacred.</strong> Every one of the agent's wins was a one-line diff. So was every catastrophe. The "research" wasn't in writing the code. The research was in the metric's ability to rank one-line diffs instantly and unambiguously. Karpathy's genius isn't the agent. It's `val_bpb` plus a 5-minute budget plus `git reset --hard`. 
That design slots the agent into exactly what AI is orders of magnitude better at (generating variants, executing at volume) and leaves the hard part (what to measure) to the human who built the loop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tRNX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tRNX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tRNX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png" width="1200" height="675" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72691,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tRNX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!tRNX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1ee18c-7d61-4e56-b9eb-a78c297894a6_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The loop runs on me too</h2><p>Here's the thing I can't stop thinking about. The loop the agent ran overnight is structurally identical to the one I'm building for my Stoic practice on the same machine.</p><p>Morning intention. Five-minute run. Evening review. Keep or discard. Iterate.</p><p>Marcus Aurelius wasn't optimizing `val_bpb`. He was optimizing a harder metric with no closed form. But the shape of the loop is the same. Karpathy designed an overnight research org. Epictetus designed an overnight self. Both are the same thing running in different mediums.</p><p>The 118-experiment loop ran on a machine on my desk. 
The second loop runs on me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4HZk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4HZk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4HZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png" width="1200" height="675" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/195033133?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4HZk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!4HZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77c5a44e-9ead-40ef-9b6c-2301c95a0a52_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you have a Mac Studio and a spare evening, the repo is at `github.com/trevin-creator/autoresearch-mlx`. Clone it, run `prepare.py`, point a Claude Code session at `program.md`, go to sleep. You wake up to a log of experiments and a better model. 
And if you're anything like me, you also wake up thinking about which of your own loops could run this way.</p><div><hr></div><h3>A Quick AI Glossary For This Article</h3><p>Because not everyone speaks ML fluently, here&#8217;s a plain-English guide to the terms in this post. I&#8217;m still learning too, so these are &#8220;practitioner&#8221; definitions&#8212;enough to follow what&#8217;s happening, not academic deep-dives.</p><h4>The Big Picture</h4><p><strong>GPT.</strong> A type of language model. Stands for &#8220;Generative Pre-trained Transformer.&#8221; In this article I&#8217;m training a tiny one from scratch, not using the big ones like ChatGPT. Same architecture family, just much smaller.</p><p><strong>Pre-training</strong>. The step where a model learns to predict the next word (or &#8220;token&#8221;) across a huge pile of text. This is what `train.py` is doing. It happens before any of the fine-tuning that turns a base model into a chatbot.</p><p><strong>val_bpb (validation bits per byte).</strong> The score the agent is optimizing. Lower is better. It&#8217;s a measure of how surprised the model is by held-out text it hasn&#8217;t seen during training. A model that predicts well has low surprise. Bits per byte is a way of measuring that surprise that works across different tokenizers, so you can compare different architectures fairly.</p><p><strong>Loss metric. </strong>Any number that tells you how wrong a model is on a given task. Training is the process of making that number go down. 
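</p><p>To make the bits-per-byte idea concrete, here is a minimal sketch of the conversion. It assumes the evaluation loop reports a mean cross-entropy loss in nats per token; the `bits_per_byte` helper and the sample numbers are hypothetical, not taken from `train.py`.</p>

```python
import math


def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert a mean cross-entropy loss (nats per token) into bits per byte.

    Hypothetical helper for illustration; not code from the repo.
    """
    if n_tokens == 0 or n_bytes == 0:
        raise ValueError("need at least one token and one byte of text")
    total_nats = mean_loss_nats * n_tokens  # total surprise over the held-out text
    total_bits = total_nats / math.log(2)   # convert nats to bits
    return total_bits / n_bytes             # normalize by raw text size


# Made-up evaluation numbers: 1,000 tokens covering 4,000 bytes of text,
# with an average loss of 3 bits (3 * ln 2 nats) per token.
example = bits_per_byte(3 * math.log(2), n_tokens=1000, n_bytes=4000)
print(round(example, 4))  # 0.75
```

<p>Because the denominator is bytes of raw text rather than tokens, a tokenizer that splits the same text into more or fewer pieces still lands on a comparable number.</p><p>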
`val_bpb` is a loss metric.</p><h4>The Stack</h4><p><strong>Apple Silicon.</strong> Apple&#8217;s own CPU/GPU chip family (M1, M2, M3, M4). Uses <strong>unified memory</strong>, which means the CPU and GPU share the same pool of RAM instead of having separate memory pools. For AI workloads this is a big deal because you don&#8217;t have to copy data between CPU RAM and GPU VRAM.</p><p><strong>MLX</strong>. Apple&#8217;s open-source machine learning framework, built specifically for Apple Silicon. Think of it as Apple&#8217;s answer to PyTorch but native to Metal (Apple&#8217;s GPU API). No PyTorch, no CUDA, no NVIDIA drivers needed.</p><p><strong>PyTorch.</strong> The dominant open-source ML framework. Most research code you see online assumes PyTorch. It runs on NVIDIA GPUs (via CUDA) and, with caveats, on Apple GPUs (via MPS). MLX is an alternative that sidesteps PyTorch entirely.</p><p><strong>CUDA.</strong> NVIDIA&#8217;s API for running general-purpose compute on their GPUs. If you&#8217;ve ever seen a blog post say &#8220;requires a CUDA-capable GPU,&#8221; they mean an NVIDIA card.</p><p><strong>GPU VRAM.</strong> The memory that lives on a GPU card, separate from your computer&#8217;s main RAM. On Apple Silicon, VRAM and main RAM are the same pool (that&#8217;s the &#8220;unified memory&#8221; thing).</p><h4>Tokenization &amp; Data</h4><p><strong>Tokenizer.</strong> The thing that turns text into numbers the model can actually work with. &#8220;Hello world&#8221; might become `[15496, 995]`. The model only ever sees the numbers.</p><p><strong>BPE (Byte-Pair Encoding).</strong> The most common algorithm for building a tokenizer. It starts with individual characters and iteratively merges the most common pairs until you have a vocabulary of &#8220;tokens&#8221; that balance common words (one token) and rare words (split into pieces).</p><p><strong>Shards. </strong>Chunks of a large dataset, split into files for parallel download and loading. 
Our setup uses 11 shards from a public text dataset.</p><h4>Training Mechanics</h4><p><strong>Optimizer.</strong> The algorithm that actually updates the model&#8217;s weights during training. AdamW is the one used here. Every &#8220;optimizer step&#8221; is one update.</p><p><strong>Batch size.</strong> How many training examples the model looks at before making one weight update. Bigger batches give smoother gradient estimates but use more memory. Smaller batches fit more weight updates into a fixed time budget.</p><p><strong>Gradient accumulation.</strong> A trick for getting large effective batch sizes on limited hardware. Process smaller mini-batches sequentially, add up their gradients, then apply one update. `TOTAL_BATCH_SIZE / DEVICE_BATCH_SIZE` tells you how many mini-batches per update.</p><p><strong>Gradient noise.</strong> When your batch is so small that the gradient estimate becomes statistically unreliable. The optimizer starts jerking around instead of smoothly descending, and training slows or stalls. The agent correctly identified this as the failure mode at batch 2^12.</p><p><strong>Learning rate (LR).</strong> How big a step the optimizer takes each update. Too high, and training blows up. Too low, and it barely progresses. The sweet spot depends on everything else.</p><p><strong>Learning rate schedule.</strong> How the learning rate changes over time. Typically: warm up from zero to peak, cruise, then warm down to zero. `WARMUP_RATIO = 0.3` means the first 30% of training is the warm-up.</p><p><strong>Differential / layer-wise learning rates.</strong> Using different learning rates for different parts of the model. In the nightshift setup, the embedding layer gets LR 1.75, but the output projection (`lm_head`) gets 0.006 &#8212; a 290&#215; difference. This matters because different parameter types have very different sensitivities.</p><h4>Architecture Pieces</h4><p><strong>Attention (or attention layer)</strong>. 
The core mechanism that lets a transformer model &#8220;pay attention to&#8221; relevant earlier tokens when predicting the next one. Modern LLMs are mostly stacks of attention layers alternating with MLPs.</p><p><strong>MLP (multi-layer perceptron).</strong> A simple feed-forward neural network with one or two hidden layers. In a transformer, an MLP sits between each pair of attention layers and does the &#8220;thinking&#8221; on the representations attention produced.</p><p><strong>Activation function.</strong> A nonlinear function applied inside a neural net. Without activations, no matter how many layers you stack, the whole thing collapses mathematically into one linear transformation. Examples in this article: `ReLU&#178;` and `SiLU`.</p><p><strong>SiLU (Sigmoid Linear Unit).</strong> `x * sigmoid(x)`. A smooth, differentiable activation function. Also called Swish. Used in many modern models because it plays nicely with optimizers.</p><p><strong>ReLU&#178; (squared ReLU).</strong> `max(x, 0) ** 2`. The piece that nanoGPT-speedrun and some research codebases use. Produces sparse, squared activations. Theoretically expressive but less numerically stable than SiLU for short training runs &#8212; which is why SiLU won overnight.</p><p><strong>Embedding.</strong> The lookup table that converts each input token (a number) into a vector of real numbers. The model learns what each vector should be during training. `wte` = word token embedding.</p><p><strong>Value Embeddings (VE).</strong> An additional set of embeddings injected into attention layers as the &#8220;value&#8221; vectors. Think of them as a skip connection from the raw input that every attention layer can consult, on top of what the previous layer produced. Helps information flow when the network is deep.</p><p><strong>Tied embeddings.</strong> Sharing the input embedding weights with the output projection weights (the thing that produces final logits). Saves millions of parameters. 
Commonly used in GPT-2 and many others. Broke catastrophically in our run because the differential learning rate setup couldn&#8217;t handle the shared weight.</p><p><strong>QK-norm (Query-Key normalization).</strong> A stabilization trick: normalize the query and key vectors inside attention before computing attention scores. Without it, score magnitudes can spike, saturating the softmax. The agent tried removing QK-norm and `val_bpb` got 28% worse.</p><p><strong>RoPE (Rotary Position Embedding).</strong> How the model knows the order of tokens. Rotates the query and key vectors by an angle that depends on the token&#8217;s position. Standard in modern transformers.</p><p><strong>Softmax.</strong> The function that turns raw attention scores into a probability distribution over the tokens you might attend to. Highly peaked inputs cause &#8220;softmax saturation&#8221; &#8212; most of the weight collapses onto one token and gradients downstream get weak. That&#8217;s why QK-norm matters.</p><h4>Methodology</h4><p><strong>Hyperparameter.</strong> Any configuration value you set <em>before</em> training, as opposed to weights the model learns <em>during</em> training. Batch size, learning rate, `WARMUP_RATIO`, depth&#8212;all hyperparameters.</p><p><strong>Hyperparameter tuning.</strong> The art (and mostly the grind) of finding good hyperparameter values. Most of what the agent did overnight was hyperparameter tuning.</p><p><strong>Interaction effect.</strong> When the optimal value of hyperparameter A changes depending on what hyperparameter B is set to. A consistent set of hyperparameters is not N independent optima &#8212; it&#8217;s one N-tuple.</p><p><strong>Local search.</strong> A research strategy: after finding an improvement, test every nearby variation of your current best before venturing somewhere completely different. Tedious for humans. 
Perfect for agents that don&#8217;t get bored.</p><p><em><strong>If I missed a term you&#8217;d have liked defined, please let me know in the comments and I&#8217;ll add it.</strong></em></p>]]></content:encoded></item><item><title><![CDATA[Hosted RAG vs. Self-Hosted RAG for MCP Servers—When Does Paying Actually Win?]]></title><description><![CDATA[A practical comparison of Cloudflare AI Search, Bedrock Knowledge Bases, Pinecone Assistants, LlamaCloud, and self-hosted sqlite-vec for powering MCP servers. Real pricing, real trade-offs, and when each one makes sense.]]></description><link>https://astgl.com/p/hosted-rag-vs-self-hosted-rag</link><guid isPermaLink="false">https://astgl.com/p/hosted-rag-vs-self-hosted-rag</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Tue, 21 Apr 2026 00:42:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jj99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I shipped <a href="https://astgl.ai/projects#mcp-astgl-knowledge">an MCP knowledge server</a> in a weekend with sqlite-vec and Ollama. It answers questions about my own articles. It runs on a laptop. It costs $0/month.</p><p>Then someone asked the obvious next question: "Can you point it at our Confluence? And Notion? And the Google Drive?"</p><p>Suddenly self-hosted isn't free anymore. It's a part-time job&#8212;PDF parsing, OCR, re-indexing schedules, dealing with 50-page slide decks where the first 20 pages are a title card. The embedding pipeline that was elegant for 20 markdown articles starts to sweat when you throw a 400-page SOC 2 audit at it.</p><p>So here's the question I had to actually answer for myself: <strong>when does paying Cloudflare, AWS, or Pinecone actually beat running your own stack?</strong></p><p>I spent a research pass comparing the live services. Here's what I found.</p><h2>TL;DR</h2><p><strong>Self-host</strong> when content is static, under about a thousand docs, single source, you control ingestion cadence, and privacy or cost-per-query matters more than your time.</p><p><strong>Hosted</strong> when: multiple unstructured sources, frequent re-indexing, non-engineers uploading docs, you need SLAs, or you're shipping this to customers.</p><p><strong>Hybrid</strong> is increasingly common: hosted RAG for the customer-facing product, self-hosted for internal dogfooding and dev. The two aren't mutually exclusive.</p><p></p><h2>The Contenders</h2><p>Five options worth your attention. One paragraph each.</p><h3>Cloudflare AI Search (AutoRAG)</h3><p>The newest entrant, currently in open beta. Cloudflare stitched together R2 for storage, Vectorize for embeddings, and Workers AI for inference, then wrapped the whole thing in a management API. 
Strongest pitch: near-zero config, pay-as-you-go, and an <a href="https://github.com/cloudflare/mcp-server-cloudflare">official MCP server</a> ships with it. Weakest point: retrieval is vector-first. Cloudflare added optional reranking in October 2025, but there's still no published BM25 or hybrid-search path as of this writing. If your corpus is well-structured, you probably won't notice. If you're indexing messy enterprise content, you will.</p><h3>AWS Bedrock Knowledge Bases</h3><p>The enterprise default if you're already on AWS. Hybrid search (vector + BM25) is built in, Cohere reranking is available, and chunking modes range from fixed-size to semantic to custom Lambda. Titan V2 embeddings run at $0.02 per million tokens. There's an official <a href="https://awslabs.github.io/mcp/servers/bedrock-kb-retrieval-mcp-server">AWS Labs MCP server</a> for retrieval. And then there's the OCU landmine&#8212;which I'll get to in a minute, because it deserves its own sidebar.</p><h3>Pinecone Assistants</h3><p>Best-in-class retrieval, managed. Hybrid sparse-dense search with automatic reranking, configurable alpha weighting, managed embeddings abstracted away from you, and an official remote MCP server. Pricing is fully usage-based&#8212;$5 per million context retrieval tokens, plus input/output token, storage, and ingestion charges on top. The Standard plan has a $50/month minimum; the old $0.05/assistant-hour fee was removed. Free tier is real but tight&#8212;5 assistants per project, 1 GB storage, 500k input tokens, and 500k context retrieval tokens per month. Past that you're paying, but the retrieval quality is noticeably better than anything else on this list.</p><h3>LlamaCloud</h3><p>Managed LlamaIndex. Multimodal parsing that actually handles diagrams, configurable chunking modes, hybrid retrieval, reranking. The free tier gives you 10,000 credits a month&#8212;about a thousand pages. 
Paid tiers start at $50/month (Starter, 40K credits) and scale to $500/month (Pro, 400K credits). For a LlamaIndex-native team, the Starter tier is genuinely cheap; Pro is where the platform pays off. LlamaIndex ships `run-llama/llamacloud-mcp` (Python) and `run-llama/mcp-server-llamacloud` (TypeScript), plus a hosted gateway at <code>mcp.llamaindex.ai</code>. The MCP story is actually stronger here than I initially realized.</p><h3>Self-Hosted (sqlite-vec + Ollama)</h3><p>This is what <a href="https://astgl.ai/projects#mcp-astgl-knowledge">the ASTGL Knowledge MCP server</a> actually runs on. sqlite-vec for vectors, FTS5 for keyword search (that's your hybrid search right there, no cloud required), and Ollama serving nomic-embed-text for embeddings, all of it on a $10/month Hetzner VPS or a Mac mini on my desk. Works well for up to around a million vectors in my testing. Real cost: infrastructure plus your time. The second one is the variable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jj99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jj99!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 424w, https://substackcdn.com/image/fetch/$s_!jj99!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 848w, 
https://substackcdn.com/image/fetch/$s_!jj99!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1272w, https://substackcdn.com/image/fetch/$s_!jj99!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jj99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png" width="895" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:895,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:496636,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194365100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jj99!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 424w, https://substackcdn.com/image/fetch/$s_!jj99!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 848w, 
https://substackcdn.com/image/fetch/$s_!jj99!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1272w, https://substackcdn.com/image/fetch/$s_!jj99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6549eef1-4e51-40ba-b500-2f9b3abe015f_895x565.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2>The Six Axes That Actually Matter</h2><p>Pricing gets the attention, but it&#8217;s rarely the deciding factor. 
Here&#8217;s what I look at:</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dpe0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dpe0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 424w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 848w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1272w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png" width="1207" height="723" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1207,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133778,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194365100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dpe0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 424w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 848w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1272w, https://substackcdn.com/image/fetch/$s_!Dpe0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e186b69-6884-4b80-b780-6c0e0d8eb34f_1207x723.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3>Setup cost</h3><p>Time-to-first-query is where hosted services actually earn their money. Pinecone Assistants and Cloudflare AI Search will have you chatting with your docs in under a minute after signup&#8212;upload and go. 
Bedrock is the outlier on the hosted side: AWS documentation puts CloudFormation infrastructure deployment at 7&#8211;10 minutes, with a full hand-wired setup typically landing at 20&#8211;30 minutes. That's hosted pricing with self-hosted-ish friction.</p><p>Self-hosted with sqlite-vec and Ollama is about 30 minutes from `apt install` to first working query if you know what you're doing, longer if you're learning. For me it's fast because I've done it. For someone new to local LLMs it's a weekend.</p><h3>Ongoing cost</h3><p>This is where the story flips. For a small corpus with low query volume&#8212;think a few hundred docs and a few thousand queries a month&#8212;Cloudflare AI Search is genuinely cheap, maybe $5&#8211;15/month in storage and API costs. Pinecone Assistants sits at $20&#8211;50 in that range. Bedrock KB looks innocent until you hit the OCU minimum (more on that below). LlamaCloud's $50/month Starter floor is reasonable; the $500/month Pro tier is where the platform pays off at real scale.</p><p>Self-hosted is $10/month for a Hetzner VPS, flat. Mac mini on your desk? $0/month plus electricity. The per-query cost of hosted RAG is the thing that compounds when you scale&#8212;or when someone builds something that hammers it.</p><h3>Ingestion complexity</h3><p>This is the axis where hosted services earn their keep without argument. Bedrock KB and LlamaCloud both handle PDFs with embedded tables, Word docs, and (in LlamaCloud's case) actual diagrams, not just the text around them. Bedrock's Data Automation service charges $0.010 per page for parsing&#8212;not free, but a lot cheaper than writing your own PDF extractor.</p><p>Self-hosted with Ollama and sqlite-vec doesn't ship with any of that. If your corpus is markdown, you're fine. 
If it's a pile of PDFs from your legal team, you're either writing parsers or paying someone to.</p><h3>Retrieval quality</h3><p>Three of the four hosted services offer hybrid retrieval. The exception is Cloudflare AI Search, which is vector-first as of this writing: reranking is optional, and there is no published BM25 or hybrid-search path. Pinecone Assistants has automatic reranking baked in. Bedrock KB has optional Cohere reranking. Self-hosted with sqlite-vec can do hybrid via FTS5 for keyword matching combined with vector similarity, which is genuinely good&#8212;but you're the one writing the ranking logic.</p><p>For most queries on well-structured content, vector-only is fine. For ambiguous queries over messy content, reranking earns its cost.</p><h3>Data residency</h3><p>Self-hosted wins this one by default. The data never leaves your machine.</p><p>On the hosted side: Pinecone has US and EU regions with a DPA, and LlamaCloud has SOC 2 Type II and HIPAA. Bedrock's EU region support has been inconsistent in 2026 documentation&#8212;verify before you commit. Cloudflare's Data Localization Suite handles this at the platform level.</p><p>If you're in a regulated industry, audit the provider before you pick. Don't trust the marketing page.</p><h3>Ops burden</h3><p>This is the one nobody advertises. Self-hosted means you're responsible for:</p><ul><li><p>Keeping Ollama updated</p></li><li><p>Monitoring embedding drift when you upgrade models</p></li><li><p>Backing up knowledge.db</p></li><li><p>Scheduling re-indexing when source content changes</p></li><li><p>Debugging why sqlite-vec suddenly returns zero results (hint: usually the embedding model changed dimensions)</p></li></ul><p>Hosted services handle all of that. That's most of what you're paying for.</p><p></p><h2>Sidebar: The Bedrock OCU Landmine</h2><p>Bedrock Knowledge Bases advertises "no charge for the Knowledge Bases feature itself." Technically true. 
What they don't mention on the pricing page is that the vector storage layer requires a minimum of 2 OCUs&#8212;OpenSearch Compute Units&#8212;at roughly $0.24/hour each.</p><p>Do the math: 2 OCUs &#215; $0.24/hour &#215; 730 hours/month = <strong>about $350 per month</strong> whether your knowledge base has 10 documents or 10 million.</p><p>Nobody else on this list has a fixed cost floor like that. Cloudflare AI Search scales down to pennies. Pinecone Assistants has a real free tier. Self-hosted is $10 a month.</p><p>If you're building something small and you're not already deep in AWS, Bedrock KB is the wrong answer. If you're running enterprise-scale search over millions of docs, that $350 becomes a rounding error, and the hybrid+rerank features earn their keep.</p><p>Know where you sit before you commit.</p><h2>The MCP Angle</h2><p>Here's the thing I didn't expect to find: <strong>every production RAG service on this list ships an official MCP server.</strong> Cloudflare, Bedrock, Pinecone, LlamaCloud&#8212;all of them. This went from "experimental" to "table stakes" over the past year.</p><ul><li><p>Cloudflare AI Search &#8594; The official Cloudflare MCP server exposes AI Search endpoints</p></li><li><p>Bedrock KB &#8594; AWS Labs ships `bedrock-kb-retrieval-mcp-server`</p></li><li><p>Pinecone Assistants &#8594; Each assistant gets its own remote MCP endpoint, plus a local Docker option</p></li><li><p>LlamaCloud &#8594; `run-llama/llamacloud-mcp` plus the hosted MCP Gateway at mcp.llamaindex.ai</p></li></ul><p>This wasn't true a year ago. The MCP ecosystem has absorbed the big RAG providers fast enough that "hosted RAG you can query from Claude Desktop" is now a checkbox feature.</p><p>Self-hosted doesn't ship with an MCP server&#8212;but wrapping one around your sqlite-vec database is a weekend of TypeScript. 
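</p><p>For a sense of how small the retrieval core behind such a wrapper can be, here is a minimal sketch, written in Python rather than TypeScript to keep it dependency-free. It is not the ASTGL server's code: the `chunks` table, the `search` function, and the three-number embeddings are hypothetical, and a brute-force cosine scan stands in for the sqlite-vec query.</p>

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_vec, k=3):
    """Return the k most similar chunks as (score, text) pairs.

    Brute-force scan standing in for a real vector index; fine for a sketch.
    """
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: three chunks with made-up 3-dimensional "embeddings".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (text TEXT, embedding TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("sqlite-vec setup notes", json.dumps([0.9, 0.1, 0.0])),
        ("Ollama embedding models", json.dumps([0.1, 0.9, 0.0])),
        ("Backing up knowledge.db", json.dumps([0.0, 0.1, 0.9])),
    ],
)

# A query vector pointing mostly along the first axis.
results = search(conn, [1.0, 0.0, 0.0], k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

<p>An MCP tool handler would embed the incoming question, call something like `search`, and hand the matching text back to the client.</p><p>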
That's what <a href="https://astgl.ai/projects#mcp-astgl-knowledge">the ASTGL Knowledge MCP server</a> actually is: an MCP wrapper around vector search and Q&amp;A retrieval over a SQLite database. The MCP part is trivial. The content curation and ingestion pipeline is 90% of the work.</p><p>The real insight: <strong>hosted RAG plus MCP wrapper is the modern middle path.</strong> You don't have to pick pure self-hosted or pure managed. Point a custom MCP server at Pinecone Assistants or Bedrock KB, and you get the retrieval quality of managed services with the MCP-native interface your agents expect. The `cloudflare/ai-search` MCP server does exactly this.</p><p>That changes the decision. It's not "hosted vs. self-hosted RAG" anymore. It's "Whose retrieval layer do I want behind my MCP server?"</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DUoA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DUoA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 424w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 848w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DUoA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DUoA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png" width="1215" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:1215,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139869,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://astgl.com/i/194365100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DUoA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 424w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 848w, https://substackcdn.com/image/fetch/$s_!DUoA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DUoA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1df6b1a6-bec2-484c-bad1-5f17c2c2fd4d_1215x618.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Decision Framework</h2><p>Enough philosophy. Here's the checklist I use.</p><p>1. <strong>Is your corpus under 500 docs and mostly static?</strong> Self-host. You'll spend more time reading hosted RAG docs than it would take to `npm install sqlite-vec`.</p><p>2. <strong>Do you have under 20 hours to ship this?</strong> Hosted. 
Pinecone Assistants or Cloudflare AI Search will get you to a demo faster than you can read the Bedrock IAM setup guide.</p><p>3. <strong>Are you charging money for this?</strong> Either hosted (you need the SLA) or self-hosted with a real infra budget and a pager rotation. Don't split the difference on production.</p><p>4. <strong>Is any of this data regulated&#8212;PHI, PII under GDPR, or financial?</strong> Self-host, or audit the hosted provider's compliance posture before you upload anything. Don't trust the marketing page. Ask for the SOC 2 report.</p><p>5. <strong>Are you already in AWS?</strong> Bedrock KB makes sense <em>if</em> your scale justifies the OCU floor. Otherwise, Pinecone.</p><p>6. <strong>Everything else?</strong> Prototype self-hosted with sqlite-vec. Migrate to hosted when a specific pain point forces the move. "We keep hitting embedding model drift" is a real reason. "It seems complicated" isn't.</p><p>The rule of thumb I use: <strong>pay for what hurts, self-host what you enjoy.</strong> If PDF parsing makes you want to quit, pay Bedrock or LlamaCloud. If SQL and vector search are fun, keep sqlite-vec.</p><p></p><h2>What I'd Actually Build in 2026</h2><p>If you asked me right now, for real scenarios:</p><p><strong>Weekend side project.</strong> sqlite-vec plus Ollama plus nomic-embed-text. Runs on a laptop, costs nothing, and teaches you how RAG actually works. This is where I'd start every time.</p><p><strong>Customer-facing SaaS feature.</strong> Cloudflare AI Search. Pay-per-query pricing means your costs track your usage. Official MCP server means Claude Desktop users can plug in directly. The open-beta caveat is real&#8212;verify the SLA matches your product's uptime needs before launch.</p><p><strong>Enterprise RAG over thousands of internal docs.</strong> Bedrock Knowledge Bases if you're already in AWS and you'll comfortably exceed the OCU floor. Pinecone Assistants if you're not. 
LlamaCloud if your team is already deep in LlamaIndex and the multimodal parsing earns its cost. All three have hybrid search; all three ship MCP servers. Pick based on where your infrastructure&#8212;and your team's existing expertise&#8212;already lives.</p><p><strong>Team knowledge base.</strong> Self-hosted if it's under five people and you've got one engineer who cares about it. Hosted the moment it crosses twenty users or someone non-technical needs to upload docs. The threshold isn't the document count&#8212;it's the human factor.</p><p>The sqlite-vec era isn't ending. It's just not the only answer anymore. A year ago, self-hosted was the serious choice, and hosted was for people who didn't want to learn. In 2026, that framing doesn't hold. Hosted RAG is production-ready, MCP-native, and sometimes cheaper than your own ops time.</p><p>Pick the tool that matches the job. That's it.</p><h2>FAQ</h2><h3>What is Cloudflare AI Search?</h3><p>Cloudflare AI Search (formerly AutoRAG) is a managed Retrieval-Augmented Generation service built on Cloudflare's platform. It combines R2 storage, Vectorize for embeddings, and Workers AI for inference into a single API. It's currently in open beta with vector-first retrieval and optional reranking, and ships with an official MCP server that lets Claude and other AI assistants query your indexed documents directly.</p><h3>When should I use hosted RAG instead of sqlite-vec for an MCP server?</h3><p>Use hosted RAG when your corpus exceeds a few thousand documents, you're ingesting multiple source types like PDFs or Word docs, non-engineers need to upload content, or you need a production SLA. Stick with sqlite-vec when the corpus is static markdown under about 1,000 documents, you control ingestion, and cost-per-query matters more than ops time.</p><h3>Can I use the Cloudflare AI Search MCP with Claude Desktop?</h3><p>Yes. Cloudflare ships an official MCP server that exposes AI Search endpoints as MCP tools. 
Add the Cloudflare MCP server to your Claude Desktop config, provide your API token, and Claude can query your indexed documents through the same interface it uses for any other MCP tool. The setup is documented in the Cloudflare MCP repository.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/hosted-rag-vs-self-hosted-rag/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/hosted-rag-vs-self-hosted-rag/comments"><span>Leave a comment</span></a></p><p><em>Related reading:</em></p><ul><li><p><a href="https://astgl.com/p/shipping-mcp-knowledge-server-weekend">How I Shipped an MCP Knowledge Server in a Weekend</a>: the self-hosted case study this article references</p></li><li><p><a href="https://astgl.ai/answers/how-mcp-registries-work">How Do MCP Registries Work (Smithery, mcpt)?</a>: finding MCP servers, including the ones in this article</p></li><li><p><a href="https://astgl.com/p/cortex-event-sourced-memory-ai-coding-assistants">Cortex: An Event-Sourced Memory Architecture for AI Coding Assistants</a>: related exploration of the memory/retrieval landscape</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What's the Future of MCP Servers in 2026-2027?]]></title><description><![CDATA[MCP servers have gone from a niche protocol announcement to the backbone of AI tool integration in under two years.]]></description><link>https://astgl.com/p/whats-the-future-of-mcp-servers-in</link><guid isPermaLink="false">https://astgl.com/p/whats-the-future-of-mcp-servers-in</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 04:21:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kvtc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>MCP servers have gone from a niche protocol announcement to the backbone of AI tool integration in under two years. But we're still early.</p><p>Here's where the ecosystem is heading, what's changing, and what it means for anyone building with AI tools today.</p><h2>The Short Answer</h2><p>MCP is becoming the standard way AI connects to tools. The next 18 months will bring better security, larger registries, enterprise adoption, and a shift toward local-first AI architectures. 
If you're building skills in this space now, you're ahead of the curve.</p><table><thead><tr><th>Trend</th><th>2025</th><th>2026 (Now)</th><th>2027 (Projected)</th></tr></thead><tbody><tr><td><strong>Registry size</strong></td><td>~500 servers</td><td>5,000+ servers</td><td>20,000+ servers</td></tr><tr><td><strong>Enterprise adoption</strong></td><td>Early experiments</td><td>Production pilots</td><td>Standard infrastructure</td></tr><tr><td><strong>Local AI quality</strong></td><td>Usable for simple tasks</td><td>Competitive for most tasks</td><td>Near-parity for 90% of use cases</td></tr><tr><td><strong>Protocol maturity</strong></td><td>v1 basics</td><td>Auth, streaming, security</td><td>Full enterprise feature set</td></tr><tr><td><strong>Developer tooling</strong></td><td>Manual, rough</td><td>SDKs in multiple languages</td><td>IDE-integrated, visual builders</td></tr></tbody></table><h2>Trend 1: The Local-First AI Shift</h2><p>The most significant trend in AI infrastructure isn't a new model&#8212;it's where models run.</p><h3>What's Happening</h3><p>Open-source models are improving at a staggering pace. Gemma 4, Qwen 3, Llama 3.3, and their successors close the gap with cloud models every quarter. A 26B parameter model running on a Mac Studio today outperforms cloud GPT-4 from 18 months ago.</p><p>This changes the economics. When local models handle 90% of tasks at zero marginal cost, the question shifts from "should I use AI?" to "should I pay for cloud AI when local works?"</p><h3>What This Means for MCP</h3><p>Local AI + MCP servers = fully autonomous local automation. No cloud dependency. No API costs. No data leaving your machine. The stack is:</p><ul><li><p><strong>Ollama</strong> &#8594; Local model serving</p></li><li><p><strong>MCP servers</strong> &#8594; Tool integration</p></li><li><p><strong>Gateway (OpenClaw, n8n)</strong> &#8594; Orchestration</p></li><li><p><strong>Local storage</strong> &#8594; Data and knowledge</p></li></ul><p>This stack runs on consumer hardware. A Mac Mini with 32 GB handles it. 
This is enterprise-grade automation on a consumer budget.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kvtc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kvtc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kvtc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kvtc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kvtc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kvtc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kvtc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kvtc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kvtc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kvtc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2da0142-bdc8-40dd-927a-1fc7915a96aa_1126x908.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Prediction</h3><p>By 2027, running a personal AI stack locally will be as common as running a home media server is today. The early adopters are doing it now. The mainstream follows when setup becomes one-click.</p><h2>Trend 2: Registry Maturation</h2><h3>From Wild West to App Store</h3><p>Current MCP registries are like the early npm ecosystem&#8212;anyone can publish anything, quality varies wildly, and discovery is hit-or-miss. 
That's changing.</p><p><strong>What's coming:</strong></p><ul><li><p><strong>Verified publishers</strong>&#8212;Registries will distinguish between official, verified, and community servers</p></li><li><p><strong>Security scanning</strong>&#8212;Automated analysis of server code for vulnerabilities and suspicious behavior</p></li><li><p><strong>Dependency management</strong>&#8212;Tools to manage, update, and audit all installed MCP servers</p></li><li><p><strong>Usage analytics</strong>&#8212;Data on which servers are most used, most reliable, most maintained</p></li><li><p><strong>Compatibility testing</strong>&#8212;Verified compatibility with specific AI clients (Claude, VS Code, etc.)</p></li></ul><h3>The Consolidation Question</h3><p>Will one registry dominate? Probably not in the npm-monopoly sense. More likely:</p><ul><li><p><strong>Smithery</strong> stays the largest general-purpose registry</p></li><li><p><strong>mcpt</strong> establishes the quality-curated niche</p></li><li><p><strong>Platform-specific registries</strong> emerge (VS Code marketplace, Claude's built-in catalog)</p></li><li><p><strong>Enterprise registries</strong> appear for internal MCP server management</p></li></ul><h3>The Prediction</h3><p>By 2027, installing an MCP server will feel like installing a browser extension. Browse, click install, authenticate, done. The manual JSON editing of today will be a historical footnote.</p>
<h2>Trend 3: Protocol Evolution</h2><h3>What MCP Gets Right Today</h3><ul><li><p><strong>Simplicity</strong>&#8212;The client-server model is easy to understand and implement</p></li><li><p><strong>Language agnostic</strong>&#8212;Servers can be built in any language</p></li><li><p><strong>Tool abstraction</strong>&#8212;AI sees tools, not implementation details</p></li><li><p><strong>Local-first</strong>&#8212;Servers run on your machine by default</p></li></ul><h3>What's Coming</h3><p><strong>Authentication and Security</strong></p><p>Current MCP servers handle auth inconsistently. The protocol will standardize:</p><ul><li><p>OAuth 2.0 integration for services that need it</p></li><li><p>Fine-grained permission scoping (read vs. write, specific resources)</p></li><li><p>Credential management (secure storage, rotation)</p></li><li><p>Audit logging (who accessed what, when)</p></li></ul><p><strong>Streaming and Real-Time Data</strong></p><p>Today's MCP is mostly request-response. Future versions will support:</p><ul><li><p>Event streams (new email arrives, calendar changes, file modified)</p></li><li><p>WebSocket-based persistent connections</p></li><li><p>Real-time monitoring and dashboards</p></li></ul><p><strong>Multi-Modal Support</strong></p><p>Current MCP tools primarily handle text. 
Support will expand to:</p><ul><li><p>Vision tools (analyze images, screenshots, documents)</p></li><li><p>Audio tools (transcription, speech synthesis)</p></li><li><p>Video tools (clip extraction, analysis)</p></li><li><p>Document tools (PDF processing, spreadsheet manipulation)</p></li></ul><p><strong>Server-to-Server Communication</strong></p><p>Future versions will let MCP servers call each other:</p><ul><li><p>A calendar server queries a contacts server to enrich meeting attendee data</p></li><li><p>A research server calls a web search server to fetch sources</p></li><li><p>Composable server chains without client involvement</p></li></ul><h3>The Prediction</h3><p>MCP 2.0 (or equivalent major version) will land by mid-2027 with standardized auth, streaming, and multi-modal support. The protocol will feel complete rather than minimal.</p><h2>Trend 4: Enterprise Adoption</h2><h3>The Enterprise AI Problem</h3><p>Large organizations want AI automation but face:</p><ul><li><p><strong>Security concerns</strong>&#8212;Data can't leave the corporate network</p></li><li><p><strong>Compliance requirements</strong>&#8212;Audit trails, access controls, data residency</p></li><li><p><strong>Integration complexity</strong>&#8212;Hundreds of internal tools, custom APIs, legacy systems</p></li><li><p><strong>Governance</strong>&#8212;Who approves which AI can access what?</p></li></ul><h3>How MCP Solves This</h3><p>MCP's local-first architecture is inherently enterprise-friendly:</p><ul><li><p><strong>Servers run inside the network</strong>&#8212;Data stays on-premises</p></li><li><p><strong>Per-server permissions</strong>&#8212;Each server accesses only what's allowed</p></li><li><p><strong>Standard protocol</strong>&#8212;One integration pattern for all tools</p></li><li><p><strong>Audit capability</strong>&#8212;All tool calls are loggable</p></li></ul><h3>What's Coming</h3><ul><li><p><strong>Enterprise MCP platforms</strong>&#8212;Companies like Cloudflare, AWS, and Azure offering managed MCP 
infrastructure</p></li><li><p><strong>Internal MCP registries</strong>&#8212;Corporate app stores for approved MCP servers</p></li><li><p><strong>Policy engines</strong>&#8212;Centralized rules for which AI can use which tools, when, with what data</p></li><li><p><strong>SOC 2 / HIPAA compliant servers</strong>&#8212;Certified MCP servers for regulated industries</p></li></ul><h3>The Prediction</h3><p>By late 2027, enterprise MCP infrastructure will be a recognized market category, similar to how API gateways became standard enterprise infrastructure.</p><h2>Trend 5: The Knowledge Server Pattern</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WXi9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WXi9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WXi9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WXi9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WXi9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WXi9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WXi9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WXi9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WXi9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WXi9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bed9d7c-89f8-4346-a899-f26a1a7c5e32_552x1484.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Beyond Tool Servers</h3><p>Most MCP servers today are tool servers&#8212;they do things (send email, search web, manage files). A growing pattern is knowledge servers&#8212;they know things.</p><p>A knowledge server exposes structured information that AI can query:</p><ul><li><p>Company knowledge base</p></li><li><p>Product documentation</p></li><li><p>FAQ databases</p></li><li><p>Research libraries</p></li><li><p>Personal notes and archives</p></li></ul><h3>Why This Matters</h3><p>When AI can query your knowledge directly, it answers from your data&#8212;not its training data. 
This means:</p><ul><li><p>Answers grounded in your actual documentation</p></li><li><p>Far fewer hallucinations about your specific products or processes</p></li><li><p>Current answers, as long as the knowledge server reads live data or is re-indexed when content changes</p></li><li><p>Personalized to your context</p></li></ul><h3>The Example: mcp-astgl-knowledge</h3><p>This is what I'm building&#8212;an MCP server that indexes all 20 articles in this series and makes them queryable by any AI client.</p><p><strong>Tools it will expose:</strong></p><ul><li><p>`search_answers` &#8212; Semantic search across all articles</p></li><li><p>`get_answer` &#8212; Retrieve a specific article by topic</p></li><li><p>`list_topics` &#8212; Browse all available topics</p></li><li><p>`get_faq` &#8212; Pull FAQ entries for specific questions</p></li></ul><p><strong>How it works:</strong></p><ul><li><p>Articles parsed from markdown (frontmatter + body)</p></li><li><p>Embeddings generated via local Ollama (nomic-embed-text)</p></li><li><p>Stored in SQLite with sqlite-vss for vector search</p></li><li><p>Served as an MCP server that any AI client can connect to</p></li></ul><p>When someone asks Claude "What's the best local LLM for coding?" and this server is connected, Claude queries the knowledge base and answers with information from article 12&#8212;not from its training data, which might be outdated.</p><h3>The Prediction</h3><p>Knowledge servers will be the fastest-growing MCP server category in 2027. Every company with documentation, every creator with a content library, every expert with a knowledge base will want one.</p><h2>What This Means for You</h2><h3>If You're a Developer</h3><p><strong>Now:</strong> Learn the MCP SDK, build a server, publish it. The ecosystem rewards early builders&#8212;servers published now accumulate installs and reputation.</p><p><strong>Next 12 months:</strong> Expect demand for custom MCP servers at companies integrating AI. 
MCP development will become a marketable skill alongside API development.</p><p><strong>Key skill:</strong> Understanding how AI agents use tools. The protocol is simple&#8212;the design of good tools is the hard part.</p><h3>If You're a Business Owner</h3><p><strong>Now:</strong> Connect MCP servers to Claude for immediate productivity gains. Start with email, calendar, and file access. Automate one workflow.</p><p><strong>Next 12 months:</strong> Evaluate local AI infrastructure for privacy and cost benefits. Budget for hardware that pays for itself in reduced API costs.</p><p><strong>Key opportunity:</strong> Businesses that adopt AI automation now will have 12-18 months of compounding efficiency gains over competitors who wait.</p><h3>If You're an Individual User</h3><p><strong>Now:</strong> Set up Claude Desktop with 2-3 MCP servers. Experience the difference between chatbot AI and tool-connected AI.</p><p><strong>Next 12 months:</strong> Consider a local AI setup (Ollama on existing hardware or a Mac Mini). Automate your most repetitive tasks.</p><p><strong>Key realization:</strong> AI connected to your tools is dramatically more useful than AI in a chat window. MCP servers are what make that connection possible.</p><h2>How I Actually Do This</h2><p>I've been building in this ecosystem for over a year. Here's what the trajectory looks like from the inside:</p><h3>What I'm Building Next</h3><p><strong>mcp-astgl-knowledge</strong>&#8212;The knowledge server I mentioned above. It's the capstone of this article series: 20 articles become a queryable knowledge base that any AI can access.</p><p><strong>The plan:</strong></p><p>1. Build with TypeScript + @modelcontextprotocol/sdk</p><p>2. Index all 20 articles with local embeddings</p><p>3. Publish to npm</p><p>4. Register on Smithery and mcpt</p><p>5. 
Share the build process as an ASTGL tutorial</p><p>This is the pattern I believe will explode: experts building knowledge servers that make their expertise available to AI systems.</p><h3>What I've Observed</h3><p>1. <strong>The tooling is getting better fast.</strong> Building an MCP server in early 2025 required reading the spec and figuring things out. In 2026, the SDK handles most of the boilerplate and registries provide distribution.</p><p>2. <strong>Local model quality jumped significantly.</strong> Gemma 4 was a step change. Tasks that needed cloud models a year ago now run locally at comparable quality. The gap keeps narrowing.</p><p>3. <strong>The automation compound effect is real.</strong> My first automation saved 30 minutes per day. Twenty-six automations save hours. Each new automation builds on the infrastructure of previous ones. The marginal cost of the 27th automation is near zero.</p><p>4. <strong>Community momentum is accelerating.</strong> The number of MCP servers, tools, and tutorials appearing weekly is orders of magnitude higher than a year ago. This is the network effect in action.</p><p>5. <strong>The biggest barrier is awareness, not technology.</strong> Most people who would benefit enormously from MCP servers don't know they exist. That's why I wrote this series&#8212;and why I'm building a knowledge server to make the information accessible.</p><h2>The 18-Month Outlook</h2><p>| Quarter | What to Expect |</p><p>|---------|---------------|</p><p>| <strong>Q2 2026</strong> (now) | Local models competitive for 85% of tasks. MCP registries at 5,000+ servers. |</p><p>| <strong>Q3 2026</strong> | Enterprise MCP pilots at major companies. Visual workflow builders support MCP natively. |</p><p>| <strong>Q4 2026</strong> | MCP auth standardization lands. Knowledge servers emerge as a category. |</p><p>| <strong>Q1 2027</strong> | Local models reach 90%+ parity for common tasks. MCP registries pass 15,000 servers. 
|</p><p>| <strong>Q2 2027</strong> | Enterprise MCP platforms from major cloud providers. One-click server installation standard. |</p><h2>Frequently Asked Questions</h2><h3>Will MCP be replaced by a competing standard?</h3><p>Unlikely in the near term. MCP has strong momentum, broad adoption, and backing from Anthropic with buy-in from Microsoft and Google. Standards wars are possible, but MCP's open protocol design and existing ecosystem make it the safe bet for building today.</p><h3>What if I invest in MCP and it becomes obsolete?</h3><p>The skills transfer. Understanding tool integration, agent patterns, and local AI architecture is valuable regardless of the specific protocol. If a successor to MCP emerges, it will solve the same problems in a similar way, and your experience will translate directly.</p><h3>Are local models improving fast enough to matter?</h3><p>Yes. The improvement curve for open-source models is steep. Every 6-12 months brings a significant quality jump. 
Hardware you buy today runs better models next year&#8212;your investment appreciates in capability over time.</p><h3>When should I start building with MCP?</h3><p>Now. The ecosystem is mature enough for production use but early enough that builders and early adopters have significant advantages. Every month you wait is a month of compounding automation benefits you don't capture.</p><h3>What's the single most important thing to do today?</h3><p>Connect one MCP server to Claude and use it for one real task. That first experience&#8212;seeing AI interact with your actual tools instead of just your text&#8212;changes how you think about what AI can do. Everything else follows from that realization.</p><p><em>This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series&#8212;structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.</em></p>]]></content:encoded></item><item><title><![CDATA[How Do I Automate Workflows with AI Agents?]]></title><description><![CDATA[Agent workflows combine AI reasoning with tool access and scheduling to complete multi-step tasks autonomously. 
The architecture ranges from simple (one agent, one task) to complex (multiple agents coordinating on a pipeline).]]></description><link>https://astgl.com/p/how-do-i-automate-workflows-with</link><guid isPermaLink="false">https://astgl.com/p/how-do-i-automate-workflows-with</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 04:01:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!D4kg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Article 13 in this series introduced what AI agents are. This article goes deeper: how to design, build, and operate agent workflows that handle real work&#8212;from simple scheduled tasks to multi-agent orchestration.</p><p>If you've already automated a few tasks and want to level up, this is your guide.</p><h2>The Short Answer</h2><p>Agent workflows combine AI reasoning with tool access and scheduling to complete multi-step tasks autonomously. 
The architecture ranges from simple (one agent, one task) to complex (multiple agents coordinating on a pipeline).</p><p>| Complexity | Architecture | Example |</p><p>|-----------|-------------|---------|</p><p>| <strong>Simple</strong> | One agent, one task, scheduled | Morning briefing at 6:30 AM |</p><p>| <strong>Chained</strong> | Multiple steps, sequential | Research &#8594; Draft &#8594; Edit &#8594; Publish |</p><p>| <strong>Parallel</strong> | Multiple agents, simultaneous | 5 news sources searched concurrently |</p><p>| <strong>Orchestrated</strong> | Coordinator + specialist agents | Content council with 5 roles |</p><h2>Workflow Patterns</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D4kg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D4kg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D4kg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D4kg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D4kg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D4kg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D4kg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D4kg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D4kg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D4kg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83468ba3-619a-4f5f-8c99-ec207287d5ca_2368x472.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Pattern 1: Scheduled Single-Agent</h3><p>The simplest useful workflow. 
One agent runs one task on a schedule.</p><pre><code>[Schedule] &#8594; [Agent + Tools] &#8594; [Output + Delivery]</code></pre><p><strong>Example:</strong> Weekly security audit</p><ul><li><p><strong>Schedule:</strong> Saturday 8:00 AM</p></li><li><p><strong>Agent:</strong> Gemma 4 31B with filesystem MCP</p></li><li><p><strong>Task:</strong> Read all config files, check for common misconfigurations, compare against best practices</p></li><li><p><strong>Output:</strong> Audit report delivered to Discord</p></li></ul><p><strong>When to use:</strong> Any standalone task that repeats on a schedule and benefits from AI reasoning.</p><h3>Pattern 2: Sequential Chain</h3><p>Multiple steps execute in order, each feeding into the next.</p><pre><code>[Step 1: Research] &#8594; [Step 2: Draft] &#8594; [Step 3: Edit] &#8594; [Step 4: Publish]</code></pre><p><strong>Example:</strong> Content creation pipeline</p><ul><li><p>Step 1: SCOUT agent searches for trending topics, produces research brief</p></li><li><p>Step 2: QUILL agent writes article from research brief</p></li><li><p>Step 3: LEDGER agent fact-checks article against sources</p></li><li><p>Step 4: MAVEN agent generates distribution pieces</p></li></ul><p><strong>When to use:</strong> Tasks with natural stages where each stage's output becomes the next stage's input.</p><h3>Pattern 3: Fan-Out / Fan-In</h3><p>One task spawns multiple parallel tasks; their results are then collected and synthesized.</p><pre><code>            &#8594; [Agent A: Source 1] &#8594;
[Dispatch] &#8594; [Agent B: Source 2] &#8594; [Collect + Synthesize]
            &#8594; [Agent C: Source 3] &#8594;</code></pre><p><strong>Example:</strong> Competitive research</p><ul><li><p>Dispatch: "Research these 5 competitors"</p></li><li><p>5 parallel agents each research one competitor</p></li><li><p>Collector synthesizes all 5 reports into a single competitive brief</p></li></ul><p><strong>When to use:</strong> Tasks that can be decomposed into independent subtasks, where parallelism saves time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6xzE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6xzE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6xzE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6xzE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6xzE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6xzE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6xzE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6xzE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6xzE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6xzE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8a0107-ee91-4aee-b8d4-ab86d8df9f32_1354x764.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Pattern 4: Router + Specialists</h3><p>A lightweight router examines each incoming task and dispatches it to the best specialist.</p><pre><code>[Input] &#8594; [Router] &#8594; [Specialist A (code)]
                   &#8594; [Specialist B (writing)]
                   &#8594; [Specialist C (research)]
                   &#8594; [Specialist D (triage)]</code></pre><p><strong>Example:</strong> Notification processing</p><ul><li><p>Router: Gemma 4 e4B classifies incoming notifications (fast, cheap)</p></li><li><p>Critical &#8594; Immediate Discord alert</p></li><li><p>Important &#8594; Queue for hourly batch</p></li><li><p>Routine &#8594; Queue for 3-hour digest</p></li><li><p>Spam &#8594; Discard and log</p></li></ul><p><strong>When to use:</strong> High-volume inputs that need different handling based on content or urgency.</p><h3>Pattern 5: Multi-Agent Council</h3><p>Multiple specialized agents collaborate on a complex task, each contributing their expertise.</p><pre><code>[SCOUT] &#8594; findings &#8594; [FORGE] &#8594; outline &#8594; [QUILL] &#8594; draft &#8594; [LEDGER] &#8594; verified &#8594; [MAVEN] &#8594; published
                                                         &#8593;                              |
                                                         &#9492;&#9472;&#9472;&#9472;&#9472; revision request &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></pre><p><strong>Example:</strong> Content production council (my actual setup)</p><ul><li><p>SCOUT: Research and topic discovery</p></li><li><p>FORGE: Structure and outlining</p></li><li><p>QUILL: Drafting with voice profile</p></li><li><p>LEDGER: Fact-checking and validation</p></li><li><p>MAVEN: SEO, distribution, and publishing</p></li></ul><p>Agents can request revisions from earlier agents&#8212;LEDGER can send a draft back to QUILL if facts don't check out.</p><p><strong>When to use:</strong> Complex, multi-faceted work where specialized expertise improves quality.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>Building a Workflow: Step by Step</h2><p>Let's build a real workflow from scratch&#8212;a weekly competitive intelligence report.</p><h3>Step 1: Define the Goal</h3><blockquote><p>"Every Monday at 7 AM, research the top 5 competitors in my space, summarize their recent activity, identify notable changes, and deliver a structured report to Discord."</p></blockquote><h3>Step 2: Choose the Architecture</h3><p>This is a fan-out/fan-in pattern:</p><ul><li><p>Fan-out: Research 5 competitors in parallel</p></li><li><p>Fan-in: Synthesize into one report</p></li></ul><h3>Step 3: Design the Agents</h3><p><strong>Research Agent (runs 5 times, once per competitor):</strong></p><ul><li><p>Model: Gemma 4 26B</p></li><li><p>Tools: Web search MCP server</p></li><li><p>Input: Competitor name + website</p></li><li><p>Output: Structured findings (recent blog posts, product changes, social mentions, job postings)</p></li></ul><p><strong>Synthesis Agent (runs once):</strong></p><ul><li><p>Model: Gemma 4 26B</p></li><li><p>Tools: Filesystem MCP (to save the report)</p></li><li><p>Input: All 5 research outputs</p></li><li><p>Output: Formatted competitive brief with highlights, threats, and opportunities</p></li></ul><p><strong>Delivery Agent:</strong></p><ul><li><p>Input: Final report</p></li><li><p>Output: Discord message with report content</p></li></ul><h3>Step 4: Define the Schedule</h3><pre><code># Cron expression: Every Monday at 7:00 AM
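# Fields, left to right: minute, hour, day-of-month, month, day-of-week (1 = Monday).
# In a crontab, the command to run follows the schedule; the script path below is hypothetical:
#   0 7 * * 1 /path/to/weekly-competitor-report.sh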
0 7 * * 1</code></pre><h3>Step 5: Build Error Handling</h3><p>| Failure | Handling |</p><p>|---------|---------|</p><p>| Web search fails for one competitor | Skip that competitor, note in report |</p><p>| Model times out | Retry once, then use smaller model as fallback |</p><p>| All searches fail | Alert human, skip this week's report |</p><p>| Discord delivery fails | Save report to file, alert via email |</p><h3>Step 6: Add Logging</h3><p>Every agent execution logs:</p><ul><li><p>Timestamp</p></li><li><p>Input received</p></li><li><p>Tools called and their responses</p></li><li><p>Output generated</p></li><li><p>Execution time</p></li><li><p>Any errors or retries</p></li></ul><h3>Step 7: Test and Iterate</h3><p>Run the workflow manually first. Review the output. Adjust prompts, model choice, and error handling based on real results. Only schedule it after 3 successful manual runs.</p><h2>Orchestration Tools</h2><h3>OpenClaw (Local Gateway)</h3><p>OpenClaw is a local AI gateway that manages model routing, task scheduling, and tool execution.</p><p><strong>Strengths:</strong></p><ul><li><p>Runs entirely locally&#8212;no cloud dependency</p></li><li><p>Routes tasks to appropriate models based on complexity</p></li><li><p>Manages MCP server connections</p></li><li><p>Built-in scheduling and delivery (Discord, Slack, email)</p></li><li><p>Logging and monitoring</p></li></ul><p><strong>Best for:</strong> Users who want full local control over their agent workflows.</p><h3>n8n (Visual Workflow Builder)</h3><p>n8n provides a visual drag-and-drop interface for building workflows.</p><p><strong>Strengths:</strong></p><ul><li><p>No-code visual builder</p></li><li><p>Hundreds of pre-built integrations</p></li><li><p>Self-hostable (runs on your machine)</p></li><li><p>Supports webhooks, schedules, and event triggers</p></li></ul><p><strong>Best for:</strong> Non-developers who want automation without writing code.</p><h3>Cron + Scripts (DIY)</h3><p>The simplest 
orchestration: cron jobs that run scripts calling the Ollama API.</p><p><strong>Strengths:</strong></p><ul><li><p>Zero additional software</p></li><li><p>Works on any Unix system</p></li><li><p>Complete control</p></li><li><p>No abstraction overhead</p></li></ul><p><strong>Best for:</strong> Developers comfortable with bash scripting who want minimal dependencies.</p><h3>Claude Agent SDK (Custom Code)</h3><p>Anthropic's SDK for building custom agent logic in Python or TypeScript.</p><p><strong>Strengths:</strong></p><ul><li><p>Full programmatic control</p></li><li><p>Access to Claude's tool-use capabilities</p></li><li><p>Complex agent logic (loops, conditionals, multi-turn)</p></li><li><p>Production-grade error handling</p></li></ul><p><strong>Best for:</strong> Developers building sophisticated custom agents.</p><h2>How I Actually Do This</h2><p>My workflow orchestration runs through OpenClaw on a Mac Studio. Here's the production architecture:</p><h3>The Orchestration Layer</h3><pre><code>OpenClaw Gateway
&#9500;&#9472;&#9472; Schedule Manager (cron-like)
&#9500;&#9472;&#9472; Model Router (triage &#8594; specialist)
&#9500;&#9472;&#9472; MCP Connector (15+ servers)
&#9500;&#9472;&#9472; Delivery Manager (Discord, file system)
&#9492;&#9472;&#9472; Log Aggregator</code></pre><h3>Daily Workflow Map</h3><p>| Time | Workflow | Pattern | Agents |</p><p>|------|---------|---------|--------|</p><p>| 6:00 AM | Research pipeline | Fan-out/fan-in | 5 source agents + 1 synthesizer |</p><p>| 6:15 AM | Log review | Single agent | 1 analyst agent |</p><p>| 6:30 AM | Morning briefing | Sequential chain | Calendar &#8594; Email &#8594; Tasks &#8594; News &#8594; Synthesizer |</p><p>| 7:00 AM | Content research | Fan-out/fan-in | 3 niche agents + 1 trend analyzer |</p><p>| Every 5 min | Critical alerts | Router + specialist | Router &#8594; Discord delivery |</p><p>| Every hour | Notification batch | Router + collector | Router &#8594; Batch &#8594; Discord |</p><p>| 8:00 PM | Evening summary | Sequential chain | Activity log &#8594; Synthesizer &#8594; Discord |</p><p>| 8:30 PM | KB builder | Single agent | Knowledge base agent |</p><h3>Multi-Agent Council Integration</h3><p>The ACA Council (SCOUT/FORGE/QUILL/LEDGER/MAVEN) runs as an orchestrated multi-agent workflow:</p><p>1. <strong>Morning meeting (7 AM):</strong> SCOUT presents topic research, council prioritizes</p><p>2. <strong>Production cycle:</strong> Sequential chain through all 5 agents</p><p>3. <strong>Evening meeting (8 PM):</strong> Review completed articles, queue for publishing</p><p>4. <strong>Publishing:</strong> Automated sync to site, Substack, and social channels</p><h3>Paperclip Integration</h3><p>Paperclip (a separate agent management platform) provides additional orchestration for agents that need web-based interfaces and team collaboration. It runs alongside OpenClaw&#8212;some workflows use OpenClaw's local scheduling, others use Paperclip's cloud features.</p><p>The key insight: <strong>you don't need one orchestration tool.</strong> Different workflows have different needs. Simple schedules use cron. Complex pipelines use OpenClaw. Team-visible workflows use Paperclip.</p><h3>Lessons from Production</h3><p>1. 
<strong>Start with single-agent workflows.</strong> Get one agent running reliably before adding coordination complexity. My first 10 workflows were all single-agent scheduled tasks.</p><p>2. <strong>The router pattern is the highest-leverage addition.</strong> Adding a triage router that classifies incoming work and dispatches to the right model immediately improved quality and speed across all workflows.</p><p>3. <strong>Logging saved me dozens of hours.</strong> When an agent produces bad output, logs show exactly what happened. Without logs, you're guessing. I log every tool call, every model response, every delivery.</p><p>4. <strong>Agents need guardrails, not just goals.</strong> "Research competitors" is too vague. "Search for blog posts published in the last 7 days from these 5 domains, extract titles and summaries, skip anything older than 7 days" &#8212; that produces reliable results.</p><p>5. <strong>Schedule slack prevents cascading failures.</strong> My 6:00 AM research pipeline sometimes takes 25 minutes. The 6:30 AM briefing doesn't depend on it&#8212;they run independently. 
Dependent workflows have explicit wait conditions, not just time offsets.</p><h2>Monitoring and Maintenance</h2><h3>What to Monitor</h3><p>| Metric | Why It Matters | Alert Threshold |</p><p>|--------|---------------|----------------|</p><p>| <strong>Execution time</strong> | Detect slowdowns before they cascade | &gt;2x normal duration |</p><p>| <strong>Error rate</strong> | Catch model or tool failures | &gt;10% of executions |</p><p>| <strong>Output quality</strong> | Detect model drift or prompt degradation | Spot-check weekly |</p><p>| <strong>Token usage</strong> | Track resource consumption | Unexpected spikes |</p><p>| <strong>Tool call failures</strong> | MCP server or API issues | Any persistent failure |</p><h3>Weekly Maintenance</h3><ul><li><p>Review error logs&#8212;fix recurring issues</p></li><li><p>Spot-check 2-3 outputs per workflow for quality</p></li><li><p>Update models if new versions improve quality</p></li><li><p>Review and trim logs (they grow fast)</p></li><li><p>Check MCP server updates</p></li></ul>
<h2>Frequently Asked Questions</h2><h3>How many agent workflows can I run simultaneously?</h3><p>Depends on your hardware and model sizes. A Mac Mini with 32 GB comfortably runs 3-5 concurrent lightweight workflows. A Mac Studio with 192+ GB runs 20+ concurrent workflows across multiple models. The bottleneck is usually model memory, not CPU.</p><h3>Can agent workflows interact with each other?</h3><p>Yes&#8212;through shared data. One workflow writes results to a file or database; another reads them. For direct coordination, use a message queue or orchestration layer. Keep interactions simple to maintain debuggability.</p><h3>What's the failure rate for agent workflows?</h3><p>Well-designed workflows with proper error handling run at 95%+ success rates. The remaining failures are usually transient (API timeouts, network issues) that resolve on retry. Poorly designed workflows (vague goals, no error handling) fail 20-40% of the time.</p><h3>Should I use local or cloud models for agent workflows?</h3><p>Local for volume, cloud for quality. If a workflow runs 50+ times per day, local models save significant money. If a workflow runs once per week and quality is critical, cloud models may be worth the cost. Most production setups use both.</p><h3>How do I debug a failing agent workflow?</h3><p>Logs are everything. Check: (1) What input did the agent receive? (2) What tools did it call? (3) What did the tools return? (4) What output did the agent produce? 
The failure is usually in step 2 or 3&#8212;a tool returned unexpected data, or the model misinterpreted the tool response.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/how-do-i-automate-workflows-with/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/how-do-i-automate-workflows-with/comments"><span>Leave a comment</span></a></p><p></p><p>*This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series&#8212;structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.*</p>]]></content:encoded></item><item><title><![CDATA[How Do MCP Registries Work (Smithery, mcpt)?]]></title><description><![CDATA[MCP registries are directories of MCP servers&#8212;searchable, categorized, and installable. They solve the discovery problem: "Which MCP server does X, and how do I install it?"]]></description><link>https://astgl.com/p/how-do-mcp-registries-work-smithery</link><guid isPermaLink="false">https://astgl.com/p/how-do-mcp-registries-work-smithery</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 03:43:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mlU0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are thousands of MCP servers available. 
Finding the right one, evaluating whether it's trustworthy, and installing it correctly&#8212;that's where registries come in.</p><p>Here's how MCP registries work, which ones matter, and how to use them effectively.</p><h2>The Short Answer</h2><p>MCP registries are directories of MCP servers&#8212;searchable, categorized, and installable. They solve the discovery problem: "Which MCP server does X, and how do I install it?"</p><p>| Registry | Size | Strength | Best For |</p><p>|----------|------|----------|----------|</p><p>| <strong>Smithery</strong> | Largest (5,000+) | Breadth, install commands, reviews | Finding any MCP server |</p><p>| <strong>mcpt</strong> | Curated (500+) | Quality focus, CLI tool, auto-updates | Reliable production servers |</p><p>| <strong>OpenTools</strong> | Growing (1,000+) | Search, categories | Alternative discovery |</p><p>| <strong>npm</strong> | Everything | Raw package access | Developers building custom setups |</p><h2>How Discovery Works</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mlU0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mlU0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mlU0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!mlU0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mlU0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mlU0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mlU0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mlU0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!mlU0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mlU0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d485df-09f5-4cc8-a56f-cc7c7cf71115_798x2688.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Problem Registries Solve</h3><p>Without registries, finding an MCP server means:</p><p>1. 
Searching GitHub for "mcp-server-[thing you want]"</p><p>2. Hoping the README has install instructions</p><p>3. Guessing if it's maintained, secure, and compatible</p><p>4. Manually configuring everything</p><p>With registries:</p><p>1. Search or browse by category</p><p>2. Read description, reviews, and install command</p><p>3. Copy-paste the command</p><p>4. Done</p><h3>Search Patterns</h3><p>Most registries support these discovery methods:</p><p><strong>Category browsing:</strong> Productivity, Development, Data, Communication, Creative, Business</p><p><strong>Keyword search:</strong> "gmail", "database", "web scraping", "calendar"</p><p><strong>Tag filtering:</strong> "official", "verified", "popular", "new"</p><p><strong>Sort options:</strong> Most installed, highest rated, recently updated</p><h2>Smithery: The Largest Registry</h2><p>Smithery is the de facto standard for MCP server discovery. 
Here's how to use it effectively.</p><h3>Browsing Smithery</h3><p>Visit <a href="https://smithery.ai">smithery.ai</a> and you'll see:</p><ul><li><p><strong>Featured servers</strong>&#8212;editorially curated highlights</p></li><li><p><strong>Categories</strong>&#8212;organized by use case</p></li><li><p><strong>Search</strong>&#8212;keyword search across all servers</p></li><li><p><strong>Trending</strong>&#8212;most popular servers this week</p></li></ul><h3>Reading a Server Listing</h3><p>Each server page shows:</p><p>| Section | What It Tells You |</p><p>|---------|------------------|</p><p>| <strong>Description</strong> | What the server does and its capabilities |</p><p>| <strong>Tools</strong> | Specific tools the server exposes (e.g., `list_events`, `create_event`) |</p><p>| <strong>Install command</strong> | Copy-paste for Claude Desktop or Claude Code |</p><p>| <strong>Configuration</strong> | Required API keys or settings |</p><p>| <strong>Author</strong> | Who built it&#8212;official orgs vs. community |</p><p>| <strong>Stats</strong> | Install count, last updated, GitHub stars |</p><p>| <strong>Reviews</strong> | User feedback on reliability and quality |</p><h3>Installing from Smithery</h3><p><strong>For Claude Desktop:</strong></p><p>Smithery provides the exact JSON block to add to your config file:</p><pre><code>{
  "mcpServers": {
    "server-name": {
      "command": "npx",
      "args": ["-y", "@scope/mcp-server-name"],
      "env": {
        "API_KEY": "your-key"
      }
    }
  }
}</code></pre><p>Copy it, paste it into `claude_desktop_config.json`, add your API key, restart Claude.</p><p><strong>For Claude Code:</strong></p><p>Smithery often shows the CLI command:</p><pre><code>claude mcp add server-name -- npx -y @scope/mcp-server-name</code></pre><h3>Evaluating Quality on Smithery</h3><p>Not all servers are equal. Here's how to assess quality:</p><p>| Signal | Good Sign | Warning Sign |</p><p>|--------|-----------|-------------|</p><p>| <strong>Author</strong> | Official org or verified developer | Anonymous, no GitHub link |</p><p>| <strong>Last updated</strong> | Within the past 3 months | Over 6 months ago |</p><p>| <strong>Install count</strong> | Hundreds or thousands | Single digits |</p><p>| <strong>GitHub stars</strong> | Active community | No repository linked |</p><p>| <strong>Reviews</strong> | Specific positive feedback | No reviews or vague complaints |</p><p>| <strong>Tools listed</strong> | Clear, well-documented tools | Vague or missing tool descriptions |</p><h2>mcpt: The Curated Alternative</h2><p>mcpt takes a quality-first approach. Fewer servers, but higher average reliability.</p><h3>The mcpt CLI</h3><p>mcpt provides a command-line tool for managing MCP servers:</p><pre><code># Install the CLI
npm install -g mcpt

# Search for servers
mcpt search calendar

# Install a server
mcpt install google-calendar

# List installed servers
mcpt list

# Update all servers
mcpt update</code></pre><h3>Advantages of mcpt</h3><p>| Feature | Smithery | mcpt |</p><p>|---------|----------|------|</p><p>| <strong>Curation</strong> | Community-driven | Editorially reviewed |</p><p>| <strong>Install method</strong> | Manual config editing | CLI tool handles everything |</p><p>| <strong>Updates</strong> | Manual | `mcpt update` handles all |</p><p>| <strong>Quality bar</strong> | Low (anyone can list) | Higher (review process) |</p><p>| <strong>Size</strong> | Largest selection | Smaller, more reliable |</p><h3>When to Use mcpt vs. Smithery</h3><ul><li><p><strong>Use Smithery</strong> when you need a server for an obscure tool or want maximum choice</p></li><li><p><strong>Use mcpt</strong> when you want reliable, well-maintained servers with easy management</p></li></ul><h2>OpenTools and Other Registries</h2><h3>OpenTools</h3><p>OpenTools is a growing registry with a clean search interface. It focuses on categorization and discoverability. It's worth checking if you don't find what you need on Smithery.</p><h3>npm Direct</h3><p>Most Node.js-based MCP servers are published to npm. You can search npm directly:</p><pre><code>npm search mcp-server</code></pre><p>This surfaces every published package that matches, including servers not yet listed on any registry. But npm offers no curation, reviews, or quality signals&#8212;you're on your own to evaluate.</p><h3>GitHub</h3><p>Many MCP servers live on GitHub before they're registered anywhere. Search GitHub for:</p><ul><li><p>`mcp-server` (general)</p></li><li><p>`modelcontextprotocol` (official repos)</p></li><li><p>`mcp-server-[tool-name]` (specific tools)</p></li></ul><p>GitHub gives you access to source code, issues, commit history, and contributor activity&#8212;the deepest quality signals available.</p><h2>Security Considerations</h2><p>MCP servers run code on your machine. Take security seriously.</p><h3>Before Installing Any Server</h3><p>1. <strong>Check the source.</strong> Is the GitHub repo linked? 
Can you see the code?</p><p>2. <strong>Check the author.</strong> Is it an organization you recognize? A developer with a history?</p><p>3. <strong>Read the permissions.</strong> What tools does it expose? What data can it access?</p><p>4. <strong>Check for credentials.</strong> Does it need API keys? Where does it send data?</p><p>5. <strong>Check freshness.</strong> When was it last updated? Are dependencies current?</p><h3>Red Flags</h3><p>| Red Flag | Risk |</p><p>|----------|------|</p><p>| No source code available | Can't verify what the code does |</p><p>| Requests unusual permissions | May access more than needed |</p><p>| No clear author or organization | Harder to trust, no accountability |</p><p>| Hasn't been updated in 12+ months | May have unpatched vulnerabilities |</p><p>| Very few installs, no reviews | Unvalidated by the community |</p><h3>Best Practices</h3><ul><li><p><strong>Start with official servers</strong> from Anthropic, Google, Microsoft, and established organizations</p></li><li><p><strong>Read the code</strong> if you're installing a community server that handles sensitive data</p></li><li><p><strong>Scope permissions</strong>&#8212;only give file system access to directories you actually need</p></li><li><p><strong>Monitor behavior</strong>&#8212;check logs if a server seems to be making unexpected network calls</p></li></ul><h2>How I Actually Do This</h2><p>I use a mix of Smithery, npm, and direct GitHub sources for my MCP server setup.</p><h3>My Discovery Workflow</h3><p>1. <strong>Need a server</strong> &#8594; Search Smithery first (broadest selection)</p><p>2. <strong>Found candidates</strong> &#8594; Check GitHub repo for each (code quality, maintenance)</p><p>3. <strong>Evaluate</strong> &#8594; Look at install count, last update, and whether tools match my needs</p><p>4. <strong>Test</strong> &#8594; Install and test with a simple prompt before adding to automation</p><p>5. 
<strong>Production</strong> &#8594; Only servers that pass testing go into my daily workflow</p><h3>Building for Registries</h3><p>I'm building `mcp-astgl-knowledge` &#8212; an MCP server that exposes all 20 articles in this series as searchable knowledge. The plan:</p><p>1. <strong>Build</strong> with TypeScript and the `@modelcontextprotocol/sdk`</p><p>2. <strong>Publish</strong> to npm (`npm publish`)</p><p>3. <strong>Register</strong> on Smithery (submit listing with description, tools, install command)</p><p>4. <strong>Register</strong> on mcpt (submit for review)</p><p>5. <strong>Maintain</strong>&#8212;update when new articles are added, respond to issues</p><p>This will be a real example of the full lifecycle: build &#8594; publish &#8594; register &#8594; maintain. I'll update this article with the actual experience once it's done.</p><h3>What I've Learned</h3><p>1. <strong>Smithery is the starting point for everyone.</strong> It has the most servers and the most familiar interface. Start there.</p><p>2. <strong>mcpt's CLI is underrated.</strong> Managing updates across 15 servers manually is tedious. `mcpt update` handles it.</p><p>3. <strong>Official servers are worth the premium.</strong> Anthropic's official MCP servers for filesystem, web search, and databases are rock-solid. Community servers vary widely in quality.</p><p>4. <strong>Read the tools list carefully.</strong> A "Gmail MCP server" might only support reading emails, not sending them. The tools list tells you exactly what's possible.</p><p>5. <strong>The ecosystem is young and growing fast.</strong> New servers appear daily. 
Check registries monthly&#8212;the server you wished existed last month might exist now.</p><h2>Publishing Your Own MCP Server</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NIVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NIVN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NIVN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NIVN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NIVN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NIVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NIVN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NIVN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NIVN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NIVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0127aff5-ca9e-44f5-a7d8-6f8153e94133_1014x1388.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you've built something useful, publishing it helps the community and builds your reputation.</p><h3>The Publishing Process</h3><p>1. <strong>Build</strong> your server using the MCP SDK</p><p>2. <strong>Test</strong> it locally with Claude Desktop or Claude Code</p><p>3. <strong>Publish</strong> the npm package: `npm publish`</p><p>4. <strong>Register</strong> on Smithery: Submit your listing with description, install command, and documentation</p><p>5. <strong>Register</strong> on mcpt: Submit for editorial review</p><p>6. 
<strong>Maintain</strong>: Respond to issues, update dependencies, improve based on feedback</p><h3>What Makes a Good MCP Server Listing</h3><p>| Element | Why It Matters |</p><p>|---------|---------------|</p><p>| <strong>Clear description</strong> | Users need to know what it does in 2 sentences |</p><p>| <strong>Tool documentation</strong> | Every tool should have a name, description, and example |</p><p>| <strong>Install command</strong> | Copy-paste ready for Claude Desktop and Claude Code |</p><p>| <strong>Configuration guide</strong> | What API keys or settings are needed |</p><p>| <strong>Source code link</strong> | Transparency builds trust |</p><p>| <strong>Changelog</strong> | Shows the server is actively maintained |</p><h2>Frequently Asked Questions</h2><h3>Are MCP registries safe?</h3><p>Registries are directories, not security guarantees. They make discovery easier but don't audit every server's code. Treat registry listings like npm packages&#8212;check the source, author, and community signals before installing. Official and popular servers are generally safe. 
Obscure, unreviewed servers deserve more scrutiny.</p><h3>Can I use MCP servers not listed on any registry?</h3><p>Yes. Any MCP server can be installed manually by pointing your config to the npm package or local path. Registries are for discovery&#8212;they're not gatekeepers. You can even build private MCP servers that never appear on any registry.</p><h3>How often should I update my MCP servers?</h3><p>Monthly is a good cadence. Security patches, bug fixes, and new features arrive regularly. If you use mcpt, `mcpt update` handles everything. For manual installs via `npx`, an unpinned package name usually resolves to the latest published version at launch, but `npx` can reuse a cached copy, so append `@latest` to the package name (or pin an exact version) when the version matters.</p><h3>Will there be one dominant MCP registry?</h3><p>Probably not&#8212;the ecosystem benefits from multiple registries with different strengths. Smithery for breadth, mcpt for quality, npm for raw access. This mirrors how package managers work: npm, GitHub, and specialized registries coexist.</p><h3>Can I run my own private MCP registry?</h3><p>Yes, for organizations that want to share internal MCP servers. MCP doesn't require public registries&#8212;any endpoint your clients can be configured to reach works. 
Some companies run internal registries for proprietary servers that access internal systems.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/p/how-do-mcp-registries-work-smithery/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://astgl.com/p/how-do-mcp-registries-work-smithery/comments"><span>Leave a comment</span></a></p><p></p><p>*This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series&#8212;structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.*</p>]]></content:encoded></item><item><title><![CDATA[What's the ROI of Local AI Infrastructure?]]></title><description><![CDATA[The question isn't whether local AI saves money&#8212;it does. Local AI has high upfront cost and near-zero ongoing cost. Cloud AI has zero upfront cost and scales linearly forever.]]></description><link>https://astgl.com/p/whats-the-roi-of-local-ai-infrastructure</link><guid isPermaLink="false">https://astgl.com/p/whats-the-roi-of-local-ai-infrastructure</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 03:29:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZKWP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The question isn't whether local AI saves money&#8212;it does. 
The question is how fast and how much, based on your specific usage pattern.</p><p>Here's the real math, with actual hardware costs, cloud API pricing, and the breakeven points where local infrastructure pays for itself.</p><h2>The Short Answer</h2><p>Local AI has high upfront cost and near-zero ongoing cost. Cloud AI has zero upfront cost and scales linearly forever. The crossover point depends on your usage volume.</p><p>| | Local AI | Cloud AI |</p><p>|--|----------|----------|</p><p>| <strong>Upfront cost</strong> | $600-8,000 (hardware) | $0 |</p><p>| <strong>Monthly cost</strong> | $5-15 (electricity) | $50-5,000+ (API fees) |</p><p>| <strong>Per-call cost</strong> | $0 | $0.001-0.10 per call |</p><p>| <strong>Scales with usage</strong> | No&#8212;flat cost | Yes&#8212;more usage = more cost |</p><p>| <strong>Quality ceiling</strong> | Very good (not frontier) | Frontier models available |</p><p>| <strong>Privacy</strong> | Complete&#8212;data stays local | Data sent to provider |</p><p><strong>Rule of thumb:</strong> If you'd spend more than $100/month on API calls, local AI probably pays for itself within a year.</p><h2>The Cloud Cost Reality</h2><p>Cloud AI pricing is per-token. Here's what real usage patterns cost:</p><h3>Typical Monthly Cloud Costs</h3><p>| Usage Pattern | Calls/Day | Model | Monthly Cost |</p><p>|--------------|-----------|-------|-------------|</p><p>| <strong>Casual user</strong> | 10-20 | Claude Sonnet | $10-30 |</p><p>| <strong>Power user</strong> | 50-100 | Claude Sonnet | $50-200 |</p><p>| <strong>Developer with AI tools</strong> | 200-500 | Mixed models | $200-800 |</p><p>| <strong>Automated workflows</strong> | 500-1,000 | Claude Haiku + Sonnet | $500-2,000 |</p><p>| <strong>Full automation pipeline</strong> | 2,000-5,000 | Mixed models | $2,000-8,000 |</p><p>| <strong>Enterprise scale</strong> | 10,000+ | Mixed models | $10,000+ |</p><p>The jump from casual to automated is where costs explode. 
A morning briefing that runs daily, a content pipeline that generates articles, and notification routing that processes hundreds of messages&#8212;these add up fast.</p><h3>The Subscription Alternative</h3><p>Claude Pro ($20/month) and Claude Max ($100-200/month) offer high-volume access at flat rates. These are excellent value for interactive use. But they have rate limits that don't work well for automated pipelines running 24/7.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://astgl.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">As The Geek Learns is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>The Local Cost Reality</h2><h3>Hardware Options</h3><p>| Device | RAM | Price | Best For |</p><p>|--------|-----|-------|----------|</p><p>| <strong>Mac Mini M4</strong> | 32 GB | ~$800 | Entry-level: runs 7-12B models comfortably |</p><p>| <strong>Mac Mini M4 Pro</strong> | 48 GB | ~$1,400 | Mid-range: runs 26B models, 2-3 concurrent |</p><p>| <strong>Mac Studio M3 Max</strong> | 96 GB | ~$3,000 | Serious: runs 70B models, full automation |</p><p>| <strong>Mac Studio M3 Ultra</strong> | 192 GB | ~$5,000 | Professional: multiple large models simultaneously |</p><p>| <strong>Mac Studio M3 Ultra</strong> | 512 GB | ~$8,000 | Maximum: every model, every size, all at once |</p><p>| <strong>Linux + RTX 4090</strong> | 24 GB VRAM | ~$2,500 
| Fast inference, limited to one large model |</p><p>| <strong>Linux + 2x RTX 4090</strong> | 48 GB VRAM | ~$4,500 | High throughput, parallel inference |</p><p><strong>Apple Silicon advantage:</strong> Unified memory means the GPU can access all system RAM. A 192 GB Mac Studio can run models that would require multiple $2,000 GPUs on Linux.</p><h3>Ongoing Costs</h3><p>| Cost | Monthly | Annual |</p><p>|------|---------|--------|</p><p>| <strong>Electricity</strong> (always-on Mac Studio) | $10-15 | $120-180 |</p><p>| <strong>Internet</strong> (already have it) | $0 incremental | $0 |</p><p>| <strong>Software</strong> (Ollama, open-source models) | $0 | $0 |</p><p>| <strong>Maintenance time</strong> (~2 hours/month) | Time cost | Time cost |</p><p>| <strong>Total cash cost</strong> | <strong>$10-15</strong> | <strong>$120-180</strong> |</p><h2>Breakeven Analysis</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZKWP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZKWP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZKWP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZKWP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ZKWP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZKWP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZKWP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZKWP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZKWP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ZKWP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb842fc-cddf-4f54-888d-f94ff055070b_1172x1548.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Scenario 1: Light Automation</h3><p><strong>Setup:</strong> Mac Mini 32 GB ($800) running morning briefings and email triage.</p><p><strong>Cloud alternative:</strong> ~$150/month in API calls (500 calls/day, mixed models).</p><p><strong>Breakeven:</strong> $800 &#247; $150/month = <strong>5.3 months</strong></p><p><strong>Year 1 savings:</strong> ($150 
&#215; 12) - $800 - $150 electricity = <strong>$850</strong></p><h3>Scenario 2: Content Creator</h3><p><strong>Setup:</strong> Mac Mini 48 GB ($1,400) running content pipeline, research, and repurposing.</p><p><strong>Cloud alternative:</strong> ~$400/month in API calls (content generation is token-heavy).</p><p><strong>Breakeven:</strong> $1,400 &#247; $400/month = <strong>3.5 months</strong></p><p><strong>Year 1 savings:</strong> ($400 &#215; 12) - $1,400 - $150 electricity = <strong>$3,250</strong></p><h3>Scenario 3: Full Automation</h3><p><strong>Setup:</strong> Mac Studio 192 GB ($5,000) running 26 daily tasks, content pipeline, multi-agent council.</p><p><strong>Cloud alternative:</strong> ~$2,000/month in API calls (thousands of daily calls across multiple agents).</p><p><strong>Breakeven:</strong> $5,000 &#247; $2,000/month = <strong>2.5 months</strong></p><p><strong>Year 1 savings:</strong> ($2,000 &#215; 12) - $5,000 - $180 electricity = <strong>$18,820</strong></p><h3>The Pattern</h3><p>The more you automate, the faster local infrastructure pays for itself. In the scenarios above, light automation breaks even in about five months and full automation in under three.</p><h2>Beyond Dollar Savings: Hidden ROI</h2><p>The financial math is compelling, but the less obvious benefits matter too.</p><h3>Privacy ROI</h3><p>With local AI, sensitive business data never leaves your machine. No data processing agreements. No compliance concerns about which country your data is processed in. No risk of your data leaking into a provider's training set.</p><p><strong>For regulated industries</strong> (healthcare, legal, finance), this alone can justify the hardware cost: the alternative is expensive enterprise AI contracts with compliance guarantees.</p><h3>Availability ROI</h3><p>Cloud APIs have outages. Rate limits. Capacity constraints during peak hours. Your automated pipeline at 6 AM shouldn't depend on whether a cloud provider's servers are congested.</p><p>Local AI is available whenever your computer is on. No rate limits. 
No outages (except your own hardware). No "please try again later."</p><h3>Latency ROI</h3><p>Local inference is fast&#8212;especially on Apple Silicon. A Gemma 4 26B running locally generates tokens faster than most cloud APIs deliver them, because there's no network round trip.</p><p>For interactive use, this means snappier responses. For automation, this means faster pipeline throughput.</p><h3>Experimentation ROI</h3><p>When every API call costs money, you hesitate to experiment. With local models, experimentation is free. Try 50 different prompt variations. Run A/B tests on voice profiles. Process your entire email archive to build training data. The marginal cost is zero.</p><p>This freedom to experiment accelerates learning and leads to better automation designs.</p><h2>How I Actually Do This</h2><p>I run a Mac Studio M3 Ultra with 256 GB unified memory. Here's the real financial picture:</p><h3>My Costs</h3><p>| Item | Cost |</p><p>|------|------|</p><p>| Mac Studio M3 Ultra 256 GB | $7,000 (one-time) |</p><p>| Electricity (~120W average, 24/7) | ~$12/month |</p><p>| Cloud Claude (10% of tasks) | ~$20/month (Pro subscription) |</p><p>| <strong>Total monthly ongoing</strong> | <strong>~$32/month</strong> |</p><h3>What I'd Pay With Cloud APIs</h3><p>| Workload | Estimated Monthly Cloud Cost |</p><p>|----------|----------------------------|</p><p>| 26 scheduled agent tasks | $800-1,200 |</p><p>| Content pipeline (ACA Council) | $400-600 |</p><p>| Ad-hoc development assistance | $200-400 |</p><p>| Research and analysis | $100-200 |</p><p>| <strong>Total estimated</strong> | <strong>$1,500-2,400/month</strong> |</p><h3>My Breakeven</h3><p>$7,000 &#247; $1,500/month = <strong>4.7 months</strong></p><p>I passed breakeven months ago. Every month now is pure savings.</p><h3>The Honest Caveats</h3><p>1. 
<strong>Cloud Claude is still better for some tasks.</strong> Complex architectural decisions, nuanced code review, novel problem-solving&#8212;I still reach for cloud Claude. Local models handle 90% of the volume but not 100% of the difficulty.</p><p>2. <strong>Setup time is real.</strong> I spent about 40 hours over several weeks getting the full automation stack running. That's an investment of time that wouldn't exist with cloud APIs.</p><p>3. <strong>Hardware depreciates.</strong> In 3-4 years, I'll want newer hardware. The Mac Studio will still work, but newer models will be faster and more capable. Budget for replacement cycles.</p><p>4. <strong>Not everyone needs this.</strong> If you make 20 AI calls a day interactively, a Claude Pro subscription ($20/month) is the right answer. Local infrastructure makes sense when you're automating at volume.</p><h2>Decision Framework</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CEXT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CEXT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CEXT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!CEXT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CEXT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CEXT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CEXT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CEXT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!CEXT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CEXT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7d99dc3-355c-41aa-9e62-a2c7390b32e7_2368x1138.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Choose Cloud When:</strong></h4><p><br>- You're just starting with AI (explore before investing)<br>- Your usage is 
primarily interactive (chatting, not automating)<br>- You need frontier model quality for every task<br>- You want zero hardware management<br>- Monthly API costs stay under $100</p><p></p><h4><strong>Choose Local When:</strong></h4><p><br>- You're automating workflows that run daily/hourly<br>- Privacy is a requirement (regulated industry, sensitive data)<br>- You'd spend $200+/month on cloud APIs<br>- You want unlimited experimentation without cost anxiety<br>- You're running multiple concurrent AI tasks</p><p></p><h4><strong>Choose Hybrid (Best for Most):</strong></h4><p><br>- Local models for volume tasks (triage, automation, content generation)<br>- Cloud models for high-value tasks (complex reasoning, frontier quality)<br>- Result: 90% of compute is free, 10% is cloud-quality</p><h2>Frequently Asked Questions</h2><h3>Can I start with a cheaper machine and upgrade later?</h3><p>Absolutely. A Mac Mini with 32 GB ($800) runs a solid automation stack. If you outgrow it, sell it (Macs hold resale value well) and upgrade. 
You don't need to start with the most expensive option.</p><h3>What about Linux with NVIDIA GPUs?</h3><p>Competitive for raw inference speed&#8212;an RTX 4090 (24 GB VRAM) is fast. But limited VRAM means you can only run one large model at a time. For multi-model architectures (triage + workhorse + specialist), Apple Silicon's unified memory is more flexible. Linux rigs are better for single-model, high-throughput workloads.</p><h3>Does model quality improve fast enough to justify local hardware?</h3><p>Yes. Open-source models improve dramatically every 6-12 months. A 26B model today outperforms a 70B model from two years ago. Your hardware runs better models over time without any additional cost&#8212;just download the new model.</p><h3>What if I already have a powerful gaming PC?</h3><p>If it has an NVIDIA GPU with 12+ GB VRAM, you can run local AI today at zero additional cost. Install Ollama, pull a model, and start experimenting. This is the cheapest possible entry point.</p><h3>Is the electricity cost significant?</h3><p>No. A Mac Studio draws about 40-120W depending on load. At US average electricity rates (~$0.15/kWh), that's $4-13/month running 24/7. An RTX 4090 draws more (300-450W under load) but idles much lower. 
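</p><p>The arithmetic behind that range is easy to check. Here is a minimal Python sketch, assuming a 30-day month of continuous uptime and the ~$0.15/kWh rate quoted above; the function name is illustrative and not part of any tool:</p><pre><code>HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, machine never sleeps


def monthly_electricity_cost(watts, rate_per_kwh=0.15):
    """Return the dollar cost of running a device 24/7 for one month."""
    kwh = watts / 1000 * HOURS_PER_MONTH  # convert watts to kWh per month
    return kwh * rate_per_kwh


# The two ends of the Mac Studio power range mentioned in this answer.
for label, watts in [("light load", 40), ("heavy load", 120)]:
    cost = monthly_electricity_cost(watts)
    print(f"{label}: {watts} W costs about ${cost:.2f}/month")
</code></pre><p>At 40 W and 120 W this works out to roughly $4 and $13 a month, matching the range above. Swap in your own wattage and utility rate to size it for your setup.</p><p>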
Electricity is a rounding error compared to API costs.</p><p><em>This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series: structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.</em></p>]]></content:encoded></item><item><title><![CDATA[How Do I Build an AI Pipeline for Content Creation?]]></title><description><![CDATA[The Short Answer
An AI content pipeline takes raw ideas and produces finished, distributed content through automated stages.]]></description><link>https://astgl.com/p/how-do-i-build-an-ai-pipeline-for</link><guid isPermaLink="false">https://astgl.com/p/how-do-i-build-an-ai-pipeline-for</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 03:11:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Pa23!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Writing one blog post is straightforward. Turning that post into a newsletter, social threads, SEO metadata, and platform-specific variations&#8212;that's where the hours disappear.</p><p>A content pipeline automates the repetitive parts so you focus on ideas, not formatting. Here's how to build one, from simple to fully autonomous.</p><h2>The Short Answer</h2><p>An AI content pipeline takes raw ideas and produces finished, distributed content through automated stages. The simplest pipeline has 3 stages. 
A production pipeline has 7 or more.</p><p>| Pipeline Level | Stages | Human Involvement | Output |</p><p>|---------------|--------|-------------------|--------|</p><p>| <strong>Basic</strong> | Write &#8594; Format &#8594; Publish | High&#8212;you drive every step | 1 article |</p><p>| <strong>Intermediate</strong> | Research &#8594; Write &#8594; Repurpose &#8594; Publish | Medium&#8212;review at checkpoints | 1 article + 10 derivatives |</p><p>| <strong>Advanced</strong> | Discover &#8594; Research &#8594; Write &#8594; Edit &#8594; Fact-check &#8594; Repurpose &#8594; Publish | Low&#8212;review final output | 1 article + 25+ derivatives |</p><h2>Pipeline Architecture: The 7 Stages</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pa23!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pa23!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Pa23!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Pa23!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Pa23!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pa23!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pa23!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Pa23!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Pa23!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Pa23!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b8721c-d559-4523-b74e-366f9594e32a_2368x164.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Every content pipeline, regardless of complexity, follows the same logical stages. You can automate as many or as few as you want.</p><h3>Stage 1: Discovery</h3><p><strong>What:</strong> Find topics worth writing about.</p><p><strong>How:</strong> AI monitors news feeds, social media trends, competitor content, and search queries in your niche. 
It surfaces topics with high interest and low competition.</p><p><strong>Manual version:</strong> You browse industry sites and note ideas.</p><p><strong>Automated version:</strong> A scheduled agent searches 10+ sources daily and delivers a ranked topic list every morning.</p><h3>Stage 2: Research</h3><p><strong>What:</strong> Gather information, data, and sources for the chosen topic.</p><p><strong>How:</strong> AI searches the web, reads relevant articles, extracts key facts, and compiles a structured research brief.</p><p><strong>Output:</strong> A 1-2 page brief with key points, statistics, source URLs, and suggested angles.</p><h3>Stage 3: Drafting</h3><p><strong>What:</strong> Write the first draft.</p><p><strong>How:</strong> AI takes the research brief, applies your voice profile and article template, and generates a full draft.</p><p><strong>Critical input:</strong> A voice profile&#8212;a document that describes your writing style, preferred vocabulary, sentence patterns, and tone. Without this, AI output sounds generic. With it, the draft sounds like you.</p><h3>Stage 4: Editing</h3><p><strong>What:</strong> Improve the draft's quality, flow, and accuracy.</p><p><strong>How:</strong> AI reviews for clarity, removes filler, tightens sentences, checks structure against the template, and verifies the voice matches your profile.</p><p>This is best done as a separate pass with a different prompt than the drafting stage. A fresh "editor" perspective catches issues the "writer" misses.</p><h3>Stage 5: Fact-Checking</h3><p><strong>What:</strong> Verify claims, statistics, and technical accuracy.</p><p><strong>How:</strong> AI identifies every factual claim in the article, searches for supporting sources, flags anything unverified, and provides confidence ratings.</p><p><strong>Two-phase approach:</strong></p><p>1. <strong>Extraction:</strong> Pull every claim from the article</p><p>2. 
<strong>Verification:</strong> Check each claim against web sources and known data</p><p>This step catches hallucinations before they reach your audience.</p><h3>Stage 6: Repurposing</h3><p><strong>What:</strong> Transform the article into platform-specific content pieces.</p><p><strong>How:</strong> AI reads the finished article and generates variations for every distribution channel.</p><p>| Derivative | Platform | Format |</p><p>|-----------|----------|--------|</p><p>| 5 social posts | LinkedIn, X, Facebook, Instagram, Threads | Platform-native length and style |</p><p>| 5 short-form notes | Substack Notes | Teaser + link |</p><p>| 1 newsletter intro | Email | Hook paragraph + article link |</p><p>| 1 SEO document | Search engines | Meta description, keywords, schema markup |</p><p>| 1 voiceover script | Podcast / video | Conversational spoken version |</p><p>| 5 graphic suggestions | Design tools | Visual concepts with text overlays |</p><p>| 3 pull quotes | Social graphics | Shareable quote cards |</p><p><strong>One article &#8594; 25+ content pieces.</strong> The marginal cost of each derivative is near zero.</p><h3>Stage 7: Publishing</h3><p><strong>What:</strong> Distribute content to all channels.</p><p><strong>How:</strong> Automated posting through APIs, MCP servers, or scheduling tools.</p><p><strong>Current best practice:</strong> Publish to your own site first (canonical URL), then syndicate to Substack, social platforms, and newsletters. This is POSSE: Publish (on your) Own Site, Syndicate Elsewhere.</p><h2>Building Your First Pipeline</h2><p>Start simple. You can always add stages.</p><h3>Level 1: Manual Pipeline (30 Minutes)</h3><p><strong>Tools needed:</strong> Claude with filesystem MCP server.</p><p>1. Write an article (or paste an existing one)</p><p>2. Ask Claude: "Read this article and generate 5 LinkedIn posts, 5 tweets, a newsletter intro, and SEO metadata."</p><p>3. Review and publish manually</p><p><strong>Time investment:</strong> 30 minutes per article cycle.</p><p><strong>Output:</strong> 1 article + ~15 derivatives.</p><h3>Level 2: Template Pipeline (15 Minutes)</h3><p><strong>Tools needed:</strong> Claude Code with custom commands.</p><p>Create a slash command (<code>.claude/commands/repurpose.md</code>) that contains your repurposing prompt with voice profile, platform specs, and output format. Then:</p><pre><code>/repurpose path/to/article.md</code></pre><p>One command generates all derivatives. You review and publish.</p><p><strong>Time investment:</strong> 15 minutes per article cycle.</p><p><strong>Output:</strong> 1 article + 25+ derivatives.</p><h3>Level 3: Automated Pipeline (5 Minutes of Review)</h3><p><strong>Tools needed:</strong> Local AI (Ollama), scheduling (OpenClaw/cron/n8n), MCP servers.</p><p>The pipeline runs on schedule:</p><p>1. Discovery agent finds a topic (or you assign one)</p><p>2. Research agent compiles a brief</p><p>3. Draft agent writes the article</p><p>4. Edit agent polishes it</p><p>5. Fact-check agent verifies claims</p><p>6. Repurpose agent generates derivatives</p><p>7. 
Draft appears in your review queue</p><p><strong>Your only job:</strong> Read the final output, approve or request changes, hit publish.</p><p><strong>Time investment:</strong> 5 minutes per article cycle (review only).</p><p><strong>Output:</strong> 1 article + 25+ derivatives, fully automated.</p><h2>Voice Profiles: The Secret to Human-Sounding AI Content</h2><p>The #1 differentiator between generic AI content and content that sounds like you is the voice profile.</p><h3>What a Voice Profile Contains</h3><pre><code>## Writing Voice: [Your Name]

### Tone
- Conversational but authoritative
- Direct &#8212; lead with the answer, not the preamble
- Practical over theoretical

### Sentence Structure
- Short sentences (10-15 words average)
- Active voice (95%+)
- Start paragraphs with statements, not questions

### Vocabulary
- Plain language &#8212; "use" not "utilize", "help" not "facilitate"
- Technical terms only when the audience expects them
- No corporate jargon: avoid "leverage", "synergy", "paradigm"

### Patterns to Follow
- Open with a hook that acknowledges the reader's problem
- Use tables for comparisons
- Include "How I Actually Do This" sections with real examples
- End FAQ sections with practical answers, not theory

### Patterns to Avoid
- Never use "In today's fast-paced world..."
- Never start with a definition from Wikipedia
- No filler paragraphs that don't add information
- Don't hedge excessively &#8212; state positions clearly</code></pre><h3>How to Build Your Voice Profile</h3><p>1. <strong>Gather 5-10 pieces of your best writing</strong>&#8212;articles, emails, presentations</p><p>2. <strong>Ask AI to analyze your voice:</strong> "Read these writing samples and describe my writing style&#8212;tone, sentence structure, vocabulary choices, recurring patterns."</p><p>3. <strong>Edit the analysis</strong>&#8212;AI will capture 80%, you refine the rest</p><p>4. <strong>Include it in every content prompt</strong>&#8212;either inline or as a system prompt</p><h2>How I Actually Do This</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CziU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CziU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CziU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CziU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CziU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!CziU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CziU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CziU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CziU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CziU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F519d6c25-aecf-4215-9aab-11b29aa0fedb_2130x1088.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I run a multi-agent content pipeline called the ACA Council. 
Five specialized agents handle different stages:</p><h3>The ACA Council</h3><p>| Agent | Role | What It Does |</p><p>|-------|------|-------------|</p><p>| <strong>SCOUT</strong> | Discovery &amp; Research | Monitors 15+ sources, surfaces trending topics, compiles research briefs |</p><p>| <strong>FORGE</strong> | Outlining &amp; Structure | Takes research briefs and generates structured article outlines |</p><p>| <strong>QUILL</strong> | Drafting | Writes full articles following voice profile and article templates |</p><p>| <strong>LEDGER</strong> | Fact-Checking &amp; Validation | Two-phase verification of every claim, flags uncertainties |</p><p>| <strong>MAVEN</strong> | SEO &amp; Distribution | Generates all derivative content, optimizes for search and social |</p><h3>The Daily Schedule</h3><p>The Council meets twice daily:</p><ul><li><p><strong>Morning session (7 AM):</strong> SCOUT presents research, team prioritizes topics</p></li><li><p><strong>Evening session (8 PM):</strong> Review completed articles, queue for publishing</p></li></ul><h3>The 7-Step Build Pipeline</h3><p>For each article:</p><p>1. SCOUT delivers a research brief</p><p>2. FORGE generates the outline</p><p>3. QUILL writes the draft using voice profile</p><p>4. LEDGER fact-checks (two-phase: extract claims &#8594; verify sources)</p><p>5. Human review checkpoint (me&#8212;5 minutes)</p><p>6. MAVEN generates 25+ derivatives</p><p>7. 
Auto-publish to site, sync to Substack, queue social posts</p><h3>Results</h3><ul><li><p><strong>Output:</strong> 3-5 articles per week with full distribution packages</p></li><li><p><strong>My time per article:</strong> ~10 minutes (topic approval + final review)</p></li><li><p><strong>AI time per article:</strong> ~20 minutes of model compute</p></li><li><p><strong>Cloud API cost:</strong> $0/month&#8212;everything runs on local Ollama models</p></li><li><p><strong>Quality:</strong> Consistent voice, verified facts, platform-optimized distribution</p></li></ul><h2>Common Pipeline Mistakes</h2><p>| Mistake | Why It Fails | Fix |</p><p>|---------|-------------|-----|</p><p>| No voice profile | Content sounds generic and robotic | Build a voice profile from your existing writing |</p><p>| Skipping fact-check | AI hallucinates statistics and quotes | Always run a verification pass before publishing |</p><p>| One-shot generation | Single prompt produces mediocre results | Split into stages&#8212;research, draft, edit are separate steps |</p><p>| No human review | Errors slip through, voice drifts | Always scan final output before publishing |</p><p>| Over-automating too early | Complex pipeline before simple one is proven | Start manual, automate one stage at a time |</p>
<h2>Frequently Asked Questions</h2><h3>How long does it take to build a content pipeline?</h3><p>A basic manual pipeline (Level 1) takes 30 minutes to set up. A template pipeline (Level 2) takes 1-2 hours. A fully automated pipeline (Level 3) takes 1-2 weeks of iterating on prompts, voice profiles, and scheduling. Start at Level 1 and upgrade when you feel the friction.</p><h3>Will Google penalize AI-generated content?</h3><p>Google penalizes low-quality content, regardless of how it's created. AI content that's well-researched, accurate, and genuinely useful ranks well. The key factors: original insights (your "How I Actually Do This" sections), verified facts, and genuine expertise. A pipeline that produces generic AI slop will be penalized. One that produces expert-informed, fact-checked content won't.</p><h3>Can I use this for client work?</h3><p>Yes, with transparency. Many agencies use AI pipelines to increase throughput. The ethical approach: AI handles research, drafting, and repurposing; humans provide expertise, review, and final approval. Disclose AI assistance if your clients expect it.</p><h3>How do I handle topics the AI gets wrong?</h3><p>This is why the fact-checking stage exists. For technical topics, include authoritative sources in the research brief so the AI has correct information to work from. For evolving topics, always include a web search step to get current data. 
And always review&#8212;the pipeline produces drafts, not published pieces.</p><h3>What's the best AI model for content pipelines?</h3><p>For drafting: Gemma 4 26B or 31B (best voice quality at the local tier). For fact-checking: a model with web search access. For repurposing: Gemma 4 26B (handles format transformation well). For the full pipeline on a budget: a single Gemma 4 26B handles all stages adequately.</p><p><em>This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series&#8212;structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.</em></p>]]></content:encoded></item><item><title><![CDATA[Can Small Businesses Benefit from MCP Servers?]]></title><description><![CDATA[Small businesses run on limited time and limited people. 
MCP servers change that math. They let AI handle the repetitive work using the tools you already have.]]></description><link>https://astgl.com/p/can-small-businesses-benefit-from</link><guid isPermaLink="false">https://astgl.com/p/can-small-businesses-benefit-from</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 03:02:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!52Vx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Small businesses run on limited time and limited people. Every hour spent on repetitive tasks&#8212;reformatting content, sorting email, prepping for meetings&#8212;is an hour not spent growing the business.</p><p>MCP servers change that math. They let AI handle the repetitive work using the tools you already have. No enterprise budget. No IT department. No coding.</p><h2>The Short Answer</h2><p>Yes. MCP servers give small businesses access to the same AI automation capabilities that enterprises are spending millions on&#8212;at a fraction of the cost. 
The ROI is immediate and measurable.</p><p>| Business Size | Without MCP | With MCP |</p><p>|--------------|-------------|----------|</p><p>| <strong>Solo operator</strong> | You do everything manually | AI handles content, email, research while you focus on clients |</p><p>| <strong>Small team (2-10)</strong> | Everyone wears multiple hats | AI automates shared workflows &#8212; briefings, reports, scheduling |</p><p>| <strong>Growing business (10-50)</strong> | Processes are inconsistent, tribal knowledge | AI standardizes workflows and makes knowledge accessible |</p><h2>The Five Highest-ROI Automations for Small Business</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!52Vx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!52Vx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 424w, https://substackcdn.com/image/fetch/$s_!52Vx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 848w, https://substackcdn.com/image/fetch/$s_!52Vx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!52Vx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!52Vx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!52Vx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 424w, https://substackcdn.com/image/fetch/$s_!52Vx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 848w, https://substackcdn.com/image/fetch/$s_!52Vx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!52Vx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19df8af4-f16d-4a2a-8e94-c23eb8791204_2368x372.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>These five automations deliver the most time savings for the least setup effort. Start with one, prove the value, then add more.</p><h3>1. Content Repurposing (Save 2-4 Hours Per Article)</h3><p>You write one blog post. 
MCP-connected AI turns it into:</p><p>| Output | Platform | Time to Create Manually |</p><p>|--------|----------|------------------------|</p><p>| 5 social media posts | LinkedIn, X, Facebook, Instagram, Threads | 45 min |</p><p>| 1 newsletter intro | Email / Substack | 20 min |</p><p>| 1 SEO summary | Google / site meta | 15 min |</p><p>| 3 pull quotes | Social graphics | 15 min |</p><p>| 1 thread/carousel | X or LinkedIn | 30 min |</p><p>| 5 Substack Notes | Substack | 25 min |</p><p>| 1 voiceover script | Podcast / video | 20 min |</p><p>| 5 alt-angle headlines | A/B testing | 15 min |</p><p><strong>Total manual time: ~3 hours. With AI: ~15 minutes</strong> (review and approve drafts).</p><p>The AI reads your original article through the filesystem MCP server, generates all the variations, and saves them as files. You review, tweak if needed, and publish.</p><p><strong>One article &#8594; 25+ content pieces.</strong> For a small business publishing weekly, that's 12+ hours saved per month.</p><h3>2. Morning Briefing (Save 30 Minutes Per Day)</h3><p>Instead of checking email, calendar, tasks, and news manually each morning:</p><p><strong>Setup:</strong> Connect Calendar + Email + Task Manager + Web Search MCP servers.</p><p><strong>What AI delivers at 6:30 AM:</strong></p><ul><li><p>Today's meetings with prep notes</p></li><li><p>Important emails that need attention (prioritized)</p></li><li><p>Overdue tasks</p></li><li><p>Relevant industry news</p></li></ul><p><strong>Monthly time saved:</strong> ~10 hours</p><p>This one automation pays for a Claude Pro subscription ($20/month) in the first week.</p><h3>3. 
Email Triage and Draft Responses (Save 30-60 Minutes Per Day)</h3><p><strong>Setup:</strong> Connect Gmail MCP server.</p><p><strong>How it works:</strong></p><ul><li><p>AI reads incoming emails</p></li><li><p>Classifies by priority (urgent, important, routine, spam)</p></li><li><p>Drafts responses for routine emails</p></li><li><p>Flags emails that need your personal attention</p></li></ul><p>You review the drafts, hit send on the good ones, and personally handle only the ones that matter. Instead of reading 50 emails, you review 10 drafts and write 5 personal responses.</p><p><strong>Monthly time saved:</strong> 10-20 hours</p><h3>4. Meeting Prep and Follow-Up (Save 20-30 Minutes Per Meeting)</h3><p><strong>Setup:</strong> Connect Calendar + Email + CRM MCP servers.</p><p><strong>Before the meeting, AI generates:</strong></p><ul><li><p>Attendee background (from CRM and recent emails)</p></li><li><p>Last interaction summary</p></li><li><p>Suggested talking points</p></li><li><p>Relevant documents or proposals</p></li></ul><p><strong>After the meeting, AI generates:</strong></p><ul><li><p>Action item list</p></li><li><p>Follow-up email drafts</p></li><li><p>Updated CRM notes</p></li><li><p>Calendar entries for follow-ups</p></li></ul><p>For a business with 5 meetings per week, that's 8-12 hours saved per month.</p><h3>5. Competitive Research (Save 1-2 Hours Per Week)</h3><p><strong>Setup:</strong> Connect Web Search MCP server.</p><p><strong>Weekly automated research:</strong></p><ul><li><p>Competitor website changes</p></li><li><p>New product announcements in your space</p></li><li><p>Industry news and trends</p></li><li><p>Social media mentions of competitors</p></li></ul><p>AI summarizes everything into a structured brief. 
You read a 2-page summary instead of spending hours browsing websites and social feeds.</p><p><strong>Monthly time saved:</strong> 4-8 hours</p><h2>The ROI Math</h2><p>Let's make this concrete. 
Assume a small business owner's time is worth $75/hour (conservatively).</p><p>| Automation | Monthly Hours Saved | Monthly Value |</p><p>|-----------|-------------------|---------------|</p><p>| Content repurposing | 12 hours | $900 |</p><p>| Morning briefing | 10 hours | $750 |</p><p>| Email triage | 15 hours | $1,125 |</p><p>| Meeting prep/follow-up | 10 hours | $750 |</p><p>| Competitive research | 6 hours | $450 |</p><p>| <strong>Total</strong> | <strong>53 hours</strong> | <strong>$3,975</strong> |</p><p><strong>Monthly cost:</strong></p><ul><li><p>Claude Pro: $20/month</p></li><li><p>Or local models (Ollama on a Mac Mini): ~$600 one-time, $0/month ongoing</p></li></ul><p>Even with just one or two of these automations, the ROI is overwhelming.</p><h2>Getting Started: The 30-Minute Setup</h2><p>Here's the fastest path from zero to your first useful automation.</p><h3>Minute 0-5: Install Claude Desktop</h3><p>Download from <a href="https://claude.ai">claude.ai</a>, install, and log in. A Claude Pro subscription ($20/month) includes MCP server support.</p><h3>Minute 5-10: Install Node.js</h3><p>This is the one-time prerequisite. Download from <a href="https://nodejs.org">nodejs.org</a> (LTS version), install, done.</p><h3>Minute 10-20: Add Your First MCP Servers</h3><p>Open Claude Desktop settings and add two servers:</p><p><strong>Filesystem</strong> (access your business documents):</p><pre><code>{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem",
               "/path/to/your/business/documents"]
    }
  }
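}</code></pre><p><strong>Web Search</strong> (research capabilities):</p><pre><code>{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem",
               "/path/to/your/business/documents"]
    },
    "web-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],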
      "env": {
        "BRAVE_API_KEY": "your-free-api-key"
      }
    }
  }
}</code></pre><h3>Minute 20-30: Run Your First Automation</h3><p>Restart Claude Desktop. Then try:</p><blockquote><p>"Read my latest blog post from Documents/blog/ and create 5 LinkedIn posts, 3 tweet-length posts, and a newsletter introduction from it."</p></blockquote><p>Claude reads the file through the filesystem server, generates all the content variations, and presents them. Copy, review, publish.</p><p>You just saved 2 hours of work in 10 minutes.</p><h2>Scaling Up: Month-by-Month Plan</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nH7i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nH7i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nH7i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nH7i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nH7i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nH7i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nH7i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nH7i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nH7i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nH7i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4f337-74ab-4f19-adc1-162f22642b87_2072x700.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Month 1: Foundation</h3><ul><li><p>Set up filesystem + web search MCP servers</p></li><li><p>Use Claude for content repurposing (manual&#8212;you ask each time)</p></li><li><p>Track time saved vs. 
time spent</p></li></ul><h3>Month 2: Communication</h3><ul><li><p>Add Gmail MCP server</p></li><li><p>Start using email triage and draft responses</p></li><li><p>Add Google Calendar server for meeting prep</p></li></ul><h3>Month 3: Automation</h3><ul><li><p>Set up scheduled tasks (morning briefing via OpenClaw or n8n)</p></li><li><p>Create templates for recurring workflows</p></li><li><p>Add CRM or project management MCP server</p></li></ul><h3>Month 4+: Optimization</h3><ul><li><p>Refine prompts based on output quality</p></li><li><p>Add specialized servers for your industry</p></li><li><p>Consider local models (Ollama) to eliminate API costs</p></li></ul><h2>How I Actually Do This</h2><p>I use MCP servers to run a content-first small business. Here's the actual pipeline:</p><h3>Content Pipeline (1 Article &#8594; 25+ Pieces)</h3><p>My ACA Council&#8212;five specialized AI agents&#8212;handles the full content lifecycle:</p><p>1. <strong>SCOUT</strong> finds trending topics and research material</p><p>2. <strong>FORGE</strong> generates structured outlines</p><p>3. <strong>QUILL</strong> writes the draft</p><p>4. <strong>LEDGER</strong> fact-checks and validates claims</p><p>5. <strong>MAVEN</strong> optimizes for SEO and generates distribution pieces</p><p>One article generates: the full blog post, SEO metadata, social posts for 5 platforms, a newsletter version, Substack Notes, a voiceover script, and graphic suggestions. All automated.</p><h3>Morning Briefing (30 Minutes Saved Daily)</h3><p>OpenClaw runs a morning briefing at 6:30 AM using local models:</p><ul><li><p>Calendar summary (Google Calendar MCP)</p></li><li><p>Priority email scan (Gmail MCP)</p></li><li><p>Overnight log review (filesystem MCP)</p></li><li><p>Industry news (web search MCP)</p></li></ul><p>Delivered to Discord before I finish my coffee. 
I read a 2-page summary instead of spending 30 minutes checking 4 different apps.</p><h3>The Business Case</h3><p>My entire automation stack runs on local models&#8212;<strong>$0/month in API costs</strong>. The Mac Studio hardware was a significant upfront investment, but you don't need that to start. A Mac Mini with 32 GB ($800) runs a morning briefing and content repurposing pipeline comfortably. That investment pays for itself in the first month of time savings.</p><h2>Common Concerns</h2><h3>"What if the AI produces bad content?"</h3><p>It will, sometimes. That's why every automation includes a review step. AI generates drafts; you approve or edit. The time savings come from not starting from a blank page&#8212;editing a draft is usually faster than writing from scratch.</p><h3>"I'm worried about data privacy."</h3><p>MCP servers run locally on your computer. Your business data doesn't leave your machine unless you explicitly connect to cloud services. For maximum privacy, use local models through Ollama&#8212;then your prompts and files are never sent to a cloud AI provider.</p><h3>"My business is too niche for AI."</h3><p>AI doesn't need to understand your niche deeply to save you time. Email triage, meeting prep, and content formatting are universal tasks. The AI handles the structure; you provide the domain expertise. That combination is powerful regardless of industry.</p><h3>"I tried ChatGPT and it wasn't that helpful."</h3><p>ChatGPT without MCP servers is limited to what you paste into the chat. With MCP servers, AI accesses your actual data&#8212;files, emails, calendars, web. The difference is dramatic. 
It's the difference between describing your schedule to someone and handing them your calendar.</p><h2>Frequently Asked Questions</h2><h3>Which industries benefit most from MCP automation?</h3><p>Any industry with significant knowledge work: professional services (law, accounting, consulting), real estate, marketing agencies, e-commerce, healthcare administration, education. The common thread is repetitive text-based tasks&#8212;email, documents, research, content.</p><h3>Can I use MCP servers if I'm not tech-savvy?</h3><p>Yes. The initial setup requires following step-by-step instructions (similar to installing any software). Once configured, you interact with AI using plain language. "Summarize my emails" is the interface&#8212;not code.</p><h3>How do MCP servers compare to Zapier or Make?</h3><p>Zapier and Make connect apps with if-then rules. MCP servers connect apps to AI with understanding. A Zapier automation moves data between apps in a fixed pattern. An MCP-connected AI reads, interprets, decides, and acts. 
They complement each other&#8212;use Zapier for simple triggers and MCP for intelligent processing.</p><h3>What's the minimum investment to get started?</h3><ul><li><p>Claude Pro subscription: $20/month</p></li><li><p>Hardware: Your existing computer</p></li><li><p>Time: 30 minutes for initial setup</p></li><li><p>Total first-month cost: $20</p></li></ul><p>Compare that to a virtual assistant ($500-2000/month) or a part-time employee ($1500+/month) for the same tasks.</p><h3>Can my team share MCP server setups?</h3><p>Yes. Configuration files can be shared and standardized across a team. Everyone gets the same MCP servers connected the same way, ensuring consistent AI capabilities across the business.</p><p>*This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series&#8212;structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.*</p>]]></content:encoded></item><item><title><![CDATA[Can I Use MCP Servers Without Being a Developer?]]></title><description><![CDATA[The Short Answer

Yes, you can use MCP servers without being a developer.]]></description><link>https://astgl.com/p/can-i-use-mcp-servers-without-being</link><guid isPermaLink="false">https://astgl.com/p/can-i-use-mcp-servers-without-being</guid><dc:creator><![CDATA[James Cruce]]></dc:creator><pubDate>Mon, 13 Apr 2026 02:51:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bTZ3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You don't need to write code to use MCP servers. You don't need to understand APIs, protocols, or SDKs. If you can edit a text file and follow instructions, you can connect AI to your real tools in minutes.</p><p>Here's exactly how&#8212;three methods, zero coding required.</p><h2>The Short Answer</h2><p>Yes, you can use MCP servers without being a developer. The setup involves editing a configuration file or running a single command. No programming skills needed.</p><p>| Method | Difficulty | Time to Set Up | Best For |</p><p>|--------|-----------|----------------|----------|</p><p>| <strong>Claude Desktop</strong> | Copy-paste a config block | 5-10 minutes | Non-technical users who want a GUI |</p><p>| <strong>Claude Code CLI</strong> | Run one command per server | 2-3 minutes | Anyone comfortable with a terminal |</p><p>| <strong>VS Code</strong> | GUI settings panel | 5-10 minutes | People already using VS Code |</p><h2>Prerequisites (One-Time Setup)</h2><p>Before installing MCP servers, you need two things on your computer. If you already have them, skip ahead.</p><h3>Node.js</h3><p>Most MCP servers are built with Node.js. Install it once and forget about it.</p><p><strong>Mac:</strong></p><pre><code># Open Terminal and run:
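# Optional check, assuming Homebrew is how you install packages on this Mac:
# prints the path to brew if it is installed, otherwise a reminder.
command -v brew || echo "Homebrew not found; install it from https://brew.sh first"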
brew install node</code></pre><p><strong>Windows:</strong></p><p>Download from <a href="https://nodejs.org">nodejs.org</a> and run the installer. Choose the LTS version.</p><p><strong>Verify it's installed:</strong></p><pre><code>node --version
# Should show something like: v22.x.x</code></pre><h3>Python (Optional)</h3><p>Some MCP servers use Python instead of Node.js. Install it if you encounter a Python-based server.</p><p><strong>Mac:</strong></p><pre><code>brew install python</code></pre><p><strong>Windows:</strong></p><p>Download from <a href="https://python.org">python.org</a>. Check "Add to PATH" during installation.</p><p>That's it for prerequisites. You won't need to write any JavaScript or Python&#8212;these are just runtimes that MCP servers need to execute.</p><h2>Method 1: Claude Desktop (GUI)</h2><p>Claude Desktop is the easiest starting point. You edit one configuration file, restart Claude, and your MCP servers appear as tools.</p><h3>Step 1: Find the Config File</h3><p>Open Claude Desktop, then:</p><ul><li><p><strong>Mac:</strong> Claude menu &#8594; Settings &#8594; Developer &#8594; Edit Config</p></li><li><p><strong>Windows:</strong> File &#8594; Settings &#8594; Developer &#8594; Edit Config</p></li></ul><p>This opens `claude_desktop_config.json` in your text editor.</p><h3>Step 2: Add an MCP Server</h3><p>The config file has a `mcpServers` section. Each server gets a block. 
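</p><p>If the file opens empty, a minimal starting point is an object with an empty `mcpServers` section. This is only a sketch, and it assumes you have no other settings in the file to preserve:</p><pre><code>{
  "mcpServers": {}
}</code></pre><p>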
Here's an example adding a web search server:</p><pre><code>{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-server-web-search"],
      "env": {
        "BRAVE_API_KEY": "your-api-key-here"
      }
    }
  }
}</code></pre><p><strong>What each part means:</strong></p><ul><li><p>`"web-search"` &#8212; A name you choose (anything you want)</p></li><li><p>`"command"` &#8212; How to run the server (`npx` for Node.js servers)</p></li><li><p>`"args"` &#8212; The server package name</p></li><li><p>`"env"` &#8212; API keys or settings the server needs</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bTZ3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bTZ3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bTZ3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bTZ3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bTZ3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bTZ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bTZ3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bTZ3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bTZ3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bTZ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02012229-0076-4759-a5a1-b8bb9462c904_584x1880.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Step 3: Add Multiple Servers</h3><p>Stack them in the same config file:</p><pre><code>{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-server-web-search"],
      "env": {
        "BRAVE_API_KEY": "your-key"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/Documents"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token"
      }
    }
  }
}</code></pre><h3>Step 4: Restart Claude Desktop</h3><p>Close and reopen Claude Desktop. You should see a hammer icon showing your connected tools. Click it to see which tools each server provides.</p><h3>Step 5: Use Them</h3><p>Just talk to Claude normally:</p><ul><li><p>"Search the web for MCP server tutorials"</p></li><li><p>"List the files in my Documents folder"</p></li><li><p>"Show me my recent GitHub pull requests"</p></li></ul><p>Claude automatically decides which MCP server tools to use based on your request.</p><h2>Method 2: Claude Code CLI (Fastest)</h2><p>If you use Claude Code (the terminal version), adding MCP servers is a single command.</p><h3>Add a Server</h3><pre><code>claude mcp add web-search -- npx -y @anthropic/mcp-server-web-search</code></pre><p>That's it. One command. The server is immediately available in your next Claude Code session.</p><h3>Add With Environment Variables</h3><pre><code>claude mcp add github --env GITHUB_PERSONAL_ACCESS_TOKEN=your-token \
  -- npx -y @modelcontextprotocol/server-github</code></pre><p>Flags such as `--env` go before the `--` separator. Everything after `--` is treated as the server's own command line.</p><h3>List Connected Servers</h3><pre><code>claude mcp list</code></pre><h3>Remove a Server</h3><pre><code>claude mcp remove web-search</code></pre><h3>Scoping</h3><p>Claude Code lets you scope servers to specific projects:</p><pre><code># Available in all your projects (user scope)
claude mcp add --scope user web-search -- npx -y @anthropic/mcp-server-web-search

# Available only in the current project
claude mcp add --scope project web-search -- npx -y @anthropic/mcp-server-web-search</code></pre><h2>Method 3: VS Code (GUI)</h2><p>If you use VS Code with the Claude extension, MCP servers can be configured through the settings GUI.</p><h3>Step 1: Open Settings</h3><p>`Cmd+,` (Mac) or `Ctrl+,` (Windows) &#8594; Search for "MCP"</p><h3>Step 2: Add Server Configuration</h3><p>VS Code's settings UI lets you add MCP server entries with fields for command, arguments, and environment variables. Same information as the JSON config, just in a form.</p><h3>Step 3: Reload Window</h3><p>`Cmd+Shift+P` &#8594; "Reload Window" to activate the new servers.</p><h2>Finding MCP Servers to Install</h2><h3>Public Registries</h3><p>| Registry | URL | Notes |</p><p>|----------|-----|-------|</p><p>| <strong>Smithery</strong> | smithery.ai | Largest registry, reviews, install commands |</p><p>| <strong>mcpt</strong> | mcpt.ai | Curated, quality-focused |</p><p>| <strong>OpenTools</strong> | opentools.ai | Growing collection, search by category |</p><p>| <strong>npm</strong> | npmjs.com (search "mcp-server") | Raw package listings |</p><h3>How to Browse</h3><p>1. Go to a registry (start with Smithery)</p><p>2. Browse categories: Productivity, Development, Data, Communication</p><p>3. Click a server to see: description, required config, install command</p><p>4. 
Copy the install command into your config file or CLI</p><h3>Popular Servers for Non-Developers</h3><p>| Server | What It Does | Config Complexity |</p><p>|--------|-------------|-------------------|</p><p>| <strong>Filesystem</strong> | Read/write files on your computer | Simple&#8212;just specify allowed directories |</p><p>| <strong>Web Search</strong> | Search the internet | Needs one API key (Brave) |</p><p>| <strong>Gmail</strong> | Read and draft emails | OAuth setup (guided) |</p><p>| <strong>Google Calendar</strong> | Check and create events | OAuth setup (guided) |</p><p>| <strong>Slack</strong> | Read and send messages | Bot token from Slack |</p><p>| <strong>Notion</strong> | Read and edit Notion pages | API key from Notion |</p><p>| <strong>GitHub</strong> | Manage repos, PRs, issues | Personal access token |</p><h2>Real Setup Walkthrough: 3 Servers in 15 Minutes</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gP8a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gP8a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gP8a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gP8a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 
1272w, https://substackcdn.com/image/fetch/$s_!gP8a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gP8a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gP8a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gP8a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gP8a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!gP8a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5f2fb2f-e1a6-4595-8970-c1b155a67630_1668x556.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let's set up three useful MCP servers from scratch using Claude Code.</p><h3>Server 1: Filesystem (2 minutes)</h3><pre><code>claude mcp add filesystem -- npx -y @modelcontextprotocol/server-filesystem ~/Documents ~/Desktop</code></pre><p>Now Claude can read and write files in your Documents and Desktop folders. 
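</p><p>If you use Claude Desktop instead, the same server can be sketched as a config entry. The `/Users/you/...` paths are placeholders in the style of the earlier config example, so substitute your own folders:</p><pre><code>{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/Documents", "/Users/you/Desktop"]
    }
  }
}</code></pre><p>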
Try:</p><ul><li><p>"List the files on my Desktop"</p></li><li><p>"Read the contents of Documents/notes.txt"</p></li><li><p>"Create a file called todo.txt on my Desktop with today's tasks"</p></li></ul><h3>Server 2: Web Search (3 minutes)</h3><p>1. Get a free API key from <a href="https://brave.com/search/api/">brave.com/search/api</a></p><p>2. Run:</p><pre><code>claude mcp add web-search --env BRAVE_API_KEY=your-key-here \
  -- npx -y @anthropic/mcp-server-web-search</code></pre><p>Now Claude can search the internet. Try:</p><ul><li><p>"Search for the latest news about MCP servers"</p></li><li><p>"Find tutorials on Ollama setup"</p></li></ul><h3>Server 3: GitHub (5 minutes)</h3><p>1. Go to GitHub &#8594; Settings &#8594; Developer settings &#8594; Personal access tokens &#8594; Generate new token</p><p>2. Select scopes: `repo`, `read:org`</p><p>3. Run:</p><pre><code>claude mcp add github --env GITHUB_PERSONAL_ACCESS_TOKEN=your-token \
  -- npx -y @modelcontextprotocol/server-github</code></pre><p>Now Claude can interact with your GitHub repos. Try:</p><ul><li><p>"Show my open pull requests"</p></li><li><p>"List issues in my project repo"</p></li></ul><h2>How I Actually Do This</h2><p>I run about 15 MCP servers connected to Claude Code for daily work. Here's my philosophy:</p><h3>The Config File Approach</h3><p>For Claude Desktop, I keep a curated config file that I've refined over months. New servers get tested individually before joining the main config&#8212;one bad server config can prevent Claude Desktop from loading any of them.</p><h3>The CLI Approach</h3><p>For Claude Code, I use `claude mcp add` for project-specific servers and user scope for universal ones:</p><pre><code># User scope: available in every project
claude mcp add --scope user web-search -- npx -y @anthropic/mcp-server-web-search
claude mcp add --scope user filesystem -- npx -y @modelcontextprotocol/server-filesystem ~
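# Optional check: list the registered servers to confirm both were added
claude mcp list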

# Project-specific &#8212; only in this repo
claude mcp add github -- npx -y @modelcontextprotocol/server-github</code></pre><h3>What I've Learned</h3><p>1. <strong>Start with 2-3 servers.</strong> File system + web search covers most needs. Add more only when you feel the gap.</p><p>2. <strong>Test one at a time.</strong> If you add 5 servers at once and something breaks, you won't know which one caused it. Add one, verify it works, then add the next.</p><p>3. <strong>Keep API keys in environment variables.</strong> Never paste keys directly in config files that might be synced or shared. Use `.env` files or your system's keychain.</p><p>4. <strong>Read the server docs.</strong> Each server has specific capabilities and limitations. A filesystem server configured with `~/Documents` can only access Documents; it can't read your whole drive. That's a feature, not a bug.</p><p>5. <strong>The MCP ecosystem is growing fast.</strong> When I started, there were maybe 100 servers available. Now there are thousands. Check registries periodically&#8212;there might be a server for that tool you've been wishing Claude could access.</p><h2>Troubleshooting</h2><p>| Problem | Likely Cause | Fix |</p><p>|---------|-------------|-----|</p><p>| Server doesn't appear in Claude | Config syntax error | Validate your JSON at jsonlint.com |</p><p>| "Command not found" error | Node.js not installed | Install Node.js (see Prerequisites) |</p><p>| Server connects but tools don't work | Missing API key or wrong permissions | Check the server's documentation for required env variables |</p><p>| All servers stopped working | One server config is broken | Remove servers one at a time to find the broken one |</p><p>| Slow responses | Too many servers loaded | Remove servers you don't use regularly |</p><h2>Frequently Asked Questions</h2><h3>Is it safe to give MCP servers access to my files?</h3><p>A well-behaved MCP server only accesses what you explicitly allow. A filesystem server configured with `~/Documents` will not read your email, browser history, or other folders. Always scope access to the minimum needed directories. Keep in mind that a server is still a program running under your user account, so install only servers from sources you trust.</p><h3>Do MCP servers send my data to the cloud?</h3><p>Not by default. MCP servers run locally on your machine. Data only leaves your computer if the server explicitly calls an external API (like web search or Gmail). Local-only servers like filesystem never send data anywhere.</p><h3>Can I use MCP servers on my phone?</h3><p>Not directly&#8212;MCP servers currently run on desktop/laptop computers. However, if you set up servers on a home computer, you can access them remotely through tools like Claude Code over SSH or a web-based interface like Open WebUI.</p><h3>What happens if an MCP server crashes?</h3><p>Claude continues working&#8212;it just loses access to that server's tools. Other servers remain connected. 
Restart the crashed server by restarting Claude Desktop or by starting a new Claude Code session.</p><h3>Do I need to update MCP servers?</h3><p>Occasionally. Servers launched via `npx` without a pinned version generally pick up the latest published release, although npx may reuse a previously cached copy. Servers installed globally with `npm install -g` need manual updates with `npm update -g`. Check for updates monthly.</p><p>*This is part of the <strong><a href="https://astgl.ai/answers/">ASTGL Definitive Answers</a></strong> series&#8212;structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.*</p>]]></content:encoded></item></channel></rss>