<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Digital Grove]]></title><description><![CDATA[Towards better computers.]]></description><link>https://www.dgtlgrove.com</link><image><url>https://substackcdn.com/image/fetch/$s_!9pMj!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5351bbc-244c-48a9-ab30-c510670e5788_256x256.png</url><title>Digital Grove</title><link>https://www.dgtlgrove.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 18 May 2026 16:41:25 GMT</lastBuildDate><atom:link href="https://www.dgtlgrove.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ryan Fleury]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[ryanfleury@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[ryanfleury@substack.com]]></itunes:email><itunes:name><![CDATA[Ryan Fleury]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ryan Fleury]]></itunes:author><googleplay:owner><![CDATA[ryanfleury@substack.com]]></googleplay:owner><googleplay:email><![CDATA[ryanfleury@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ryan Fleury]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Asynchronously Filling & Evicting Caches]]></title><description><![CDATA[An explanation from a stream on how caches in the RAD Debugger codebase are asynchronously filled and evicted.]]></description><link>https://www.dgtlgrove.com/p/asynchronously-filling-and-evicting</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/asynchronously-filling-and-evicting</guid><dc:creator><![CDATA[Ryan 
Fleury]]></dc:creator><pubDate>Thu, 30 Apr 2026 01:40:40 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/195942499/cf42c1b8eb611a58eae1aa314ea353c7.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>I was recently asked <a href="https://twitch.tv/ryanfleury">on stream</a> about base layer concepts in the <a href="https://github.com/EpicGamesExt/raddebugger">RAD Debugger codebase</a> which has been a useful building block for the many asynchronously-managed caches that the debugger needs for the purposes of evaluation and larger data visualization, where parsing or other preparation work for some data is long-running and unpredictable in the time i&#8230;</p>
      <p>
          <a href="https://www.dgtlgrove.com/p/asynchronously-filling-and-evicting">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Demystifying Debuggers, Part 5: Instruction-Level Stepping & Breakpoints]]></title><description><![CDATA[Unpacking how kernel and CPU debugger mechanisms can be used to implement instruction-level stepping and breakpoints.]]></description><link>https://www.dgtlgrove.com/p/demystifying-debuggers-part-5-instruction</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/demystifying-debuggers-part-5-instruction</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Sat, 21 Feb 2026 15:47:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6da72ee0-6b68-4262-9e8c-219caece330e_4096x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.dgtlgrove.com/i/188502778/demystifying-debuggers-series">Part 5 in a series.</a></em></p><p>Now that we&#8217;ve seen how a debugger modifies a debuggee (using the <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-3-kernel">kernel</a> and <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-4-cpu">CPU</a> features), let&#8217;s put these pieces together and see how we might implement both <em>breakpoints</em> and <em>instruction-level stepping</em> in a basic debugger.</p><div><hr></div><h3>The Control Loop</h3><p>Recall in <a href="https://www.dgtlgrove.com/i/153647066/a-simple-debugger-event-loop">Part 3</a> that we wrote a simple Windows &#8220;debugger&#8221;, which launched and attached to a process, and logged the debug events that the debugger received.</p><p>We launched and attached to the process through <code>CreateProcessA</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;7b5dc264-78dc-45ba-9981-fc4b380a1cbc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">  // launch process, attach
  char *cmd_line = arguments[1];
  STARTUPINFOA startup_info = {sizeof(startup_info)};
  PROCESS_INFORMATION process_info = {0};
  CreateProcessA(0, cmd_line, 0, 0, 0, DEBUG_PROCESS, 0, 0, &amp;startup_info, 
                 &amp;process_info);</code></pre></div><p>We gathered debug events using <code>WaitForDebugEvent</code> and <code>ContinueDebugEvent</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;a927e4c0-57fd-4fe0-bf49-b3d741c9f90f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// gather events from debuggee
for(DEBUG_EVENT evt = {0};
    WaitForDebugEvent(&amp;evt, INFINITE);
    ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
{
  // use `evt`
  if(evt.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
  {
    break;
  }
}</code></pre></div><p>This &#8220;debugger&#8221; does nothing but follow along with the debuggee as it executes. When it receives an event, it immediately resumes the debuggee without modifying it. But an actually usable debugger, of course, lets the user choose <em>when</em> the debuggee resumes, and what modifications to make before it does, through a user interface.</p><p>For the purposes of this post, those debuggee modifications will include <em>instruction-level stepping</em> and <em>instruction-level breakpoints</em>.</p><p>To support this, let&#8217;s enclose this <em>event gathering loop</em> in a <em>control loop</em>, which will receive commands to place breakpoints, step threads, <em>or</em> resume the debuggee.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;add94e79-3a3f-4553-8b95-8b9d550c323a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">for(B32 is_running = 1; is_running;)
{
  // read commands from user
  for(B32 need_commands = 1; need_commands;)
  {
    CommandKind kind = ...; // unpack the kind of command somehow
    switch(kind)
    {
      default:                        {...}break;
      case CommandKind_Resume:        {...}break;
      case CommandKind_Quit:          {...}break;
      case CommandKind_InstStep:      {...}break;
      case CommandKind_SetBreakpoint: {...}break;
    }
  }

  // gather events
  for(DEBUG_EVENT evt = {0};
      WaitForDebugEvent(&amp;evt, INFINITE);
      ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
  {
    // ...
  }
}</code></pre></div><p>First, for the basic commands, our work is simple.</p><p>For <code>CommandKind_Resume</code>, we simply set <code>need_commands</code> to <code>0</code>. Execution flows to the event loop, which resumes the debuggee until an event occurs that the debugger needs to know about.</p><p>For <code>CommandKind_Quit</code>, we simply set <code>need_commands</code> to <code>0</code>, then <code>is_running</code> to <code>0</code>, and then we kill the debuggee&#8212;for example, on Windows, using the <code>TerminateProcess</code> API:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;20e2088c-919b-4cae-9721-143dff17b72f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">TerminateProcess(process_info.hProcess, 0); // handle filled in by CreateProcessA</code></pre></div><p>We also need to adjust our <em>debug event loop</em> to stop under certain conditions, returning control to the command loop. This is a debugger design decision, although many choices are obvious&#8212;for example, should the debuggee pause if the debugger is notified that a debuggee thread is created? Probably not. If the debuggee completes an instruction-level step, or hits a trap instruction? Then yes. If the debuggee encounters an exception? Then also yes.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;8b4923c2-23c7-4680-a56e-94c7ae0857c9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// gather events
for(DEBUG_EVENT evt = {0};
    WaitForDebugEvent(&amp;evt, INFINITE);
    ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
{
  B32 should_take_more_commands = 0;
  // ...
  // thread creation? should_take_more_commands -&gt; 0
  // hit exception? should_take_more_commands -&gt; 1
  // ...
  if(should_take_more_commands)
  {
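    // `break` skips ContinueDebugEvent in the loop header, leaving the
    // debuggee suspended on this event while the user issues commands
    break;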
  }
}</code></pre></div><p>Now, let&#8217;s consider how we might implement the instruction-level stepping and breakpoint commands.</p>
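<p>One subtlety in the loop above: because <code>break</code> skips the <code>for</code> loop&#8217;s increment clause, the event that stopped the debuggee is never passed to <code>ContinueDebugEvent</code>. That keeps the debuggee frozen while commands are read, but it also means the next pass through the event loop should continue that pending event before waiting for a new one. Below is a minimal, portable sketch of that bookkeeping and of one possible stop policy. It is not the author&#8217;s implementation: the <code>EventKind</code> enum, the <code>FakeDebuggee</code> type with its scripted event list, and the <code>fake_wait_for_event</code>/<code>fake_continue_event</code> functions are hypothetical stand-ins for <code>WaitForDebugEvent</code> and <code>ContinueDebugEvent</code>, so the logic compiles and runs without Windows.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for debug event codes. A real Windows debugger
   would switch on evt.dwDebugEventCode and, for EXCEPTION_DEBUG_EVENT, on
   evt.u.Exception.ExceptionRecord.ExceptionCode. */
typedef enum
{
  EventKind_CreateThread,
  EventKind_LoadModule,
  EventKind_Breakpoint,  /* hit a trap instruction */
  EventKind_SingleStep,  /* completed an instruction-level step */
  EventKind_Exception,   /* any other exception */
  EventKind_ExitProcess,
} EventKind;

/* One possible stop policy: which events return control to the user. */
static bool
event_stops_debuggee(EventKind kind)
{
  switch(kind)
  {
    case EventKind_Breakpoint:
    case EventKind_SingleStep:
    case EventKind_Exception:
    case EventKind_ExitProcess:
      return true;
    default:
      return false;
  }
}

/* Hypothetical scripted debuggee that reports a fixed list of events. */
typedef struct
{
  const EventKind *events;
  int event_count;
  int next_event;
  bool has_pending_event;  /* reported but not yet continued */
  int continue_count;      /* number of simulated ContinueDebugEvent calls */
} FakeDebuggee;

/* Stand-in for WaitForDebugEvent. */
static EventKind
fake_wait_for_event(FakeDebuggee *d)
{
  assert(!d->has_pending_event);  /* waiting with an event still pending would stall */
  assert(d->next_event < d->event_count);
  d->has_pending_event = true;
  return d->events[d->next_event++];
}

/* Stand-in for ContinueDebugEvent. */
static void
fake_continue_event(FakeDebuggee *d)
{
  assert(d->has_pending_event);
  d->has_pending_event = false;
  d->continue_count += 1;
}

/* One pass of the event-gathering loop; returns the event that stopped it,
   which is deliberately left pending so the debuggee stays frozen. */
static EventKind
gather_events(FakeDebuggee *d)
{
  if(d->has_pending_event)
  {
    fake_continue_event(d);  /* resume past the previous stopping event */
  }
  for(;;)
  {
    EventKind kind = fake_wait_for_event(d);
    if(event_stops_debuggee(kind))
    {
      return kind;
    }
    fake_continue_event(d);
  }
}

/* Control loop that always chooses "resume"; records each stop. */
static int
run_until_exit(FakeDebuggee *d, EventKind *stops, int max_stops)
{
  int stop_count = 0;
  for(bool is_running = true; is_running && stop_count < max_stops;)
  {
    /* a real debugger would read and apply user commands here */
    EventKind stop = gather_events(d);
    stops[stop_count++] = stop;
    is_running = (stop != EventKind_ExitProcess);
  }
  return stop_count;
}
```

<p>For example, given a script of thread creation, module load, breakpoint, thread creation, single step, and process exit, <code>run_until_exit</code> stops three times (breakpoint, single step, exit) and silently continues the other three events. Which events count as stops is the design decision described above; this sketch encodes one plausible choice.</p>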
      <p>
          <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-5-instruction">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Demystifying Debuggers, Part 4: CPU Features & Debuggers]]></title><description><![CDATA[On CPU features that debuggers can use, like interruption instructions, debug registers, single-stepping mode, and more.]]></description><link>https://www.dgtlgrove.com/p/demystifying-debuggers-part-4-cpu</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/demystifying-debuggers-part-4-cpu</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Mon, 16 Feb 2026 21:10:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Rx7H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3159ac2-a916-41a4-a4aa-e03d831703b7_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.rfleury.com/i/146446067/demystifying-debuggers-series">Part 4 in a series.</a></em></p><p>In <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-3-kernel">Part 3</a>, I covered how debuggers help analyze programs (&#8220;debuggees&#8221;) both by <em>reading from</em> and <em>writing to</em> them. The kernel exposes features which facilitate this bidirectional flow of information.
A debugger can receive debug events, read memory and registers, <em>and</em> write memory and registers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HRTM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HRTM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 424w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 848w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1272w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HRTM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png" width="1456" height="838"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:838,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HRTM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 424w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 848w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1272w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>But what, exactly, does a debugger <em>write</em> into a debuggee&#8217;s registers or memory to modify it?</p><p>CPUs have a number of mechanisms&#8212;exposed by their instruction set architecture definitions&#8212;that are built for debugging. By modifying registers or memory, a debugger can make use of these mechanisms.</p><p>A debugger can use specific registers to set up the CPU such that it behaves differently when executing debuggee code.
A debugger can also modify instruction memory at runtime, changing which instructions debuggee threads execute at certain points in their codepaths.</p><p>In this post, I&#8217;ll cover the basics of the following CPU mechanisms, and how a debugger can use them:</p><ul><li><p><em><strong>Instruction pointer register</strong></em> &#8212; A register which stores the address of the next instruction the CPU will execute.</p></li><li><p><em><strong>Interruption instructions</strong></em> &#8212; Instructions that can be written into an instruction stream, which cause the CPU core to interrupt execution of that instruction stream.</p></li><li><p><em><strong>Data breakpoint registers</strong></em> &#8212; Registers which are used to implement &#8220;data breakpoints&#8221;, where an exception will occur if some number of bytes at specific addresses are written to, read from, or executed.</p></li><li><p><em><strong>Single-stepping mode</strong></em> &#8212; A mode which causes the CPU core to interrupt immediately after executing a single instruction.</p></li><li><p><em><strong>Return instructions</strong></em> &#8212; Instructions which return from a called procedure, replacing the instruction pointer with a return address stored on the thread&#8217;s call stack.</p></li></ul>
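<p>To make two of these mechanisms concrete, here is a minimal sketch, assuming x86-64, of what a debugger writes to place a software breakpoint and to enable single-stepping. It edits a local byte buffer and a plain flags value rather than a live debuggee; on Windows, the same changes would be applied through APIs such as <code>WriteProcessMemory</code> (followed by <code>FlushInstructionCache</code>) and <code>SetThreadContext</code>. The names <code>SoftBreakpoint</code>, <code>bp_set</code>, <code>bp_clear</code>, <code>ip_after_int3_trap</code>, and <code>flags_with_single_step</code> are hypothetical. The encodings are architectural: <code>int3</code> is the one-byte opcode <code>0xCC</code>, and the trap flag (TF) is bit 8 of <code>RFLAGS</code>.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86/x64 encodings. */
#define X86_INT3_OPCODE ((uint8_t)0xCC)      /* one-byte interruption instruction */
#define X86_FLAGS_TF    ((uint64_t)1 << 8)   /* trap flag: single-stepping mode */

/* Hypothetical record of one software breakpoint. */
typedef struct
{
  uint64_t offset;        /* where the int3 was written */
  uint8_t  original_byte; /* byte it replaced, needed to remove it later */
} SoftBreakpoint;

/* Write int3 over the first byte of an instruction, saving that byte.
   This sketch assumes no breakpoint is already placed at `offset`. */
static SoftBreakpoint
bp_set(uint8_t *code, uint64_t offset)
{
  SoftBreakpoint bp = {offset, code[offset]};
  code[offset] = X86_INT3_OPCODE;
  return bp;
}

/* Restore the original byte, removing the breakpoint. */
static void
bp_clear(uint8_t *code, SoftBreakpoint bp)
{
  assert(code[bp.offset] == X86_INT3_OPCODE);
  code[bp.offset] = bp.original_byte;
}

/* int3 is a trap: the CPU saves an instruction pointer just past the 0xCC
   byte. To re-execute the restored instruction, move it back by one.
   (Whether an OS has already adjusted the value it reports is OS-specific.) */
static uint64_t
ip_after_int3_trap(uint64_t saved_ip)
{
  return saved_ip - 1;
}

/* Set TF so the thread traps after executing its next instruction. */
static uint64_t
flags_with_single_step(uint64_t flags)
{
  return flags | X86_FLAGS_TF;
}
```

<p>Stepping past a breakpoint typically combines the two: restore the original byte, move the instruction pointer back onto it, set TF, resume for one instruction, and then write <code>0xCC</code> back once the single-step trap arrives. That sequence is one common approach, shown here only as an illustration.</p>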
      <p>
          <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-4-cpu">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Magic of the Better Software Conference]]></title><description><![CDATA[Why BSC worked; why other conferences don't.]]></description><link>https://www.dgtlgrove.com/p/the-magic-of-the-better-software</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/the-magic-of-the-better-software</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Thu, 15 Jan 2026 23:49:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K19G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Death of Usenet</h2><p><a href="https://en.wikipedia.org/wiki/Usenet">Usenet</a> is a decentralized message board network created in 1980. Predating the World Wide Web&#8217;s ubiquity, Usenet was inhabited by a small and exclusive population from universities, laboratories, or other prestigious institutions. The general public originally didn&#8217;t know what it was, have the means to access it, or know why it mattered.</p><p>Even in simple text-based communication, humans build culture, norms, shared ideas, and etiquette. These are social technologies, developed to find and apply common purpose, maintain a standard of discussion quality, and share information efficiently. Usenet was no exception.</p><p>Throughout the &#8216;80s and &#8216;90s, at the beginning of each academic year, new university students would have their first opportunity to join Usenet. So, every September, existing Usenet users would notice an influx of &#8220;noobs&#8221;. </p><p>Noobs joined and disrupted culture. Training a noob to be familiar with a board&#8217;s shared ideas, etiquette, norms&#8212;and to be respectful of them&#8212;was not free. But given sufficient time, noobs would either assimilate or leave. 
Given the still-limited access&#8212;and given that it was limited to university students, an already exclusive (at the time) demographic&#8212;it was manageable to maintain the spirit of many boards, despite the yearly September influx.</p><p>This changed in 1993. <a href="http://www.catb.org/jargon/html/S/September-that-never-ended.html">Eternal September</a> began when easy Usenet access was first offered to the general public by Internet Service Providers. Usenet boards&#8212;originally frequented, again, by university students, professors, engineers, scientists&#8212;were flooded by a large and seemingly permanent wave of noobs, sourced from a much larger and less exclusive demographic.</p><p>The masses of noobs were so large, and so culturally far, and in many cases so cognitively far from those who originally frequented boards, that moderation&#8212;and thus the preservation of board culture&#8212;became non-viable. Old-timers became outnumbered by noobs&#8212;their boards were no longer theirs. What was once a decentralized network of communication hubs for small, elite circles became much more representative of modern Internet boards&#8212;what you might see if you have the misfortune of taking a <a href="https://reddit.com">wrong</a> <a href="https://news.ycombinator.com/">turn</a> while surfing the web.</p><p>I don&#8217;t remember Eternal September because I wasn&#8217;t born when it happened, let alone using a computer. But when I first learned about the shift it brought, I was intrigued, as I&#8217;d experienced <a href="https://www.rfleury.com/p/the-marketplace-of-ideals">something strikingly similar</a>.</p><p>Other examples of similar cultural disruption beyond the Internet are not scarce. The <a href="https://en.wikipedia.org/wiki/Game_Developers_Conference">CGDC</a> of the 1990s is not the same as today&#8217;s GDC. The <a href="https://en.wikipedia.org/wiki/Bell_Labs">Bell Labs</a> of the 1950s is not the same as today&#8217;s Nokia Bell Labs. 
The Apple of the 1990s is not the same as today&#8217;s Apple. The Microsoft of the 1990s is not the same as today&#8217;s Microsoft. The Google of the 1990s is not the same as today&#8217;s Google.</p><div><hr></div><h2>The Death of Google</h2><p>In each case, what began as a selective, elite, high-quality, productive, and revolutionary organization ultimately sacrificed health for growth. This is not to dismiss the still-brilliant individuals within any given organization&#8212;I merely mean to describe patterns at the granularity of the organizations themselves.</p><p>A microcosm of the same phenomenon can be seen in the <a href="https://www.wheresyoured.at/the-men-who-killed-google/">conflict</a> between early Google programmer Ben Gomes&#8212;who joined Google in 1999 and was a pivotal contributor to the early success of Google&#8217;s flagship Search product&#8212;and others, notably Prabhakar Raghavan&#8212;the &#8220;noob&#8221; who joined Google in 2012.</p><p>There was a fierce conflict between Gomes and others like Raghavan over the decision to intentionally degrade Search quality in favor of serving more advertisements to users&#8212;or, in other words, what users didn&#8217;t ask for&#8212;to provoke a larger number of <em>queries</em>.</p><p>There is perhaps no clearer example of the rise&#8212;in Gomes&#8212;and fall&#8212;in Raghavan&#8212;of the Google &#8220;Don&#8217;t Be Evil&#8221; motto. Gomes seemed to truly believe in the original spirit of Google, and felt that it was being corrupted by a lust for <em>growth</em>:</p><blockquote><p><em>I've been thinking a bit about what Shashi says and I tend to agree that we are getting too close to the money.</em></p><p>[&#8230;]</p><p><em>I think it is good for us to aspire to query growth and to aspire to more users. But I think we are getting too involved with ads for the good of the product and company.
We need to think of other issues like DuckDuckGo and the privacy challenge and our innovation narrative. We need to retain users for the long run.</em></p><p>[&#8230;]</p><p><em>I am getting concerned that growth is all we are thinking about.</em></p></blockquote><p>Raghavan, on the other hand, was a <em>noob</em>&#8212;far removed from the original mission and spirit of Google, and far too focused on second-order spreadsheet metrics like &#8220;number of queries&#8221;&#8212;too depraved and soulless to understand that such metrics are <em>downstream</em> from spirit.</p><div><hr></div><h2>People Are Not Fungible</h2><p>For any given example, it&#8217;s not difficult to find anecdotes from old-timers who nostalgically recall the old times and lament what they feel was lost. The reason is clear: this is an undesirable phenomenon. More is not always better.</p><p>Individuals within larger and less exclusive populations have less in common with one another; there is less to bind them together. Reasons to form strong social bonds within the community become <em>informal</em>, whereas in a highly selective group, the selection itself makes those reasons <em>formal</em>.</p><p>In the younger versions of these organizations, due to their more exclusive nature, meeting another person is refreshing. Upon meeting, you instantly know many deeply personal things you can bond over&#8212;shared passions, interests, hobbies, goals, philosophies. This makes friendships, partnerships, and information transfer both pleasant and efficient. In the aged versions of these organizations, one&#8217;s null hypothesis about commonalities between any two people changes, such that this no longer holds.</p><p>This phenomenon is driven by an organization&#8217;s age, and it seems to necessarily flow in one direction&#8212;towards a preference for growth, at the cost of exclusivity and cultural preservation.
It&#8217;s unsurprising that companies, specifically, exhibit this effect&#8212;and ultimately regress towards the mean&#8212;because it reduces <em>risk</em> and doesn&#8217;t immediately compromise profit. The longer a company is alive, the more incentive there will be to <em>stop betting</em> on the initial &#8220;magic&#8221; that made the company function to begin with (since it&#8217;ll inevitably die out regardless).</p><p>The more aged an organization, the more generic the bonds, the less exclusive the members, the larger the membership&#8212;and thus, the higher the social cost of change. It&#8217;s not popular to make an inclusive group exclusive. In many cases, it&#8217;s unclear how to do it at all. It destabilizes relationships, causes controversy, and poisons the air with conflict.</p><p>Put simply, in the long term, <em>the quality of an organization can only diminish with time</em>.</p><p>Therefore, the initial composition of an organization&#8212;<em>who is in it</em>&#8212;is critical. Beyond that, the degree to which both exclusive selection and &#8220;noob training&#8221; continue to occur determines the organization&#8217;s lifespan, or at least how long it has until it deteriorates beyond recognition.</p><p>This is all for a simple reason: <em>people are not fungible</em>.</p><div><hr></div><h2>The Better Software Conference</h2><p>The first <a href="https://bettersoftwareconference.com/">Better Software Conference</a> took place in July 2025. I had the privilege of <a href="https://www.rfleury.com/p/cracking-the-code-realtime-debugger">giving a talk</a>.</p><p>I arrived in Stockholm the day before conferencegoers were to meet.
This gave me a chance to briefly explore the beautiful, historic city.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-J9L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-J9L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-J9L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-J9L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-J9L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-J9L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg" width="592" height="444" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:592,&quot;bytes&quot;:432273,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-J9L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-J9L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-J9L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-J9L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f6662-9e36-4844-96c4-fe8c26489cf3_2056x1542.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The next day, conferencegoers met up at the airport. We found ourselves on a <em>four-hour</em> train ride to what seemed like the middle of Swedish nowhere. For many attendees, including myself, this came after a long international flight to Sweden, so one might expect the extra travel to be exhausting.</p><p><em>Physically</em> I was tired, without a doubt. But <em>mentally</em>, <em>nobody</em> seemed to be exhausted.
Although the group was made up almost entirely of strangers, the whole four hours were filled with fun and fascinating conversations&#8212;which didn&#8217;t subside until the conference ended almost a week later.</p><p>We arrived in a charming small town and were led by one of the organizers on a short walk from the train to the <a href="https://twinpeaks.fandom.com/wiki/Great_Northern_Hotel?file=Great_Northern_Hotel_%28hallway%29.jpg">somewhat Twin-Peaks-like</a> hotel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b7PM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b7PM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 424w, https://substackcdn.com/image/fetch/$s_!b7PM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 848w, https://substackcdn.com/image/fetch/$s_!b7PM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!b7PM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!b7PM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png" width="1456" height="983" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:983,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1062711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b7PM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 424w, https://substackcdn.com/image/fetch/$s_!b7PM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 848w, https://substackcdn.com/image/fetch/$s_!b7PM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!b7PM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faacd9602-cf4f-454d-b8ca-f85fb1831c12_1818x1228.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>After we had settled in, the organizers had the hotel generously arrange free food and drink in its conference room, and the conversation from the train resumed for hours.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wayd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wayd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 424w, https://substackcdn.com/image/fetch/$s_!Wayd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 848w, https://substackcdn.com/image/fetch/$s_!Wayd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!Wayd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wayd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png" width="1456" height="983" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:983,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:977967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wayd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 424w, https://substackcdn.com/image/fetch/$s_!Wayd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 848w, https://substackcdn.com/image/fetch/$s_!Wayd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!Wayd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bb89aa2-df67-4e9d-b12b-62fadf2b9a1e_1818x1228.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The organizers had worked with the town&#8217;s local government to arrange the conference, so the talks all took place in the town&#8217;s beautiful theater, a short walk from the hotel. Upon entering the theater on the first morning of talks, attendees were greeted with coffee, snacks, and <a href="https://youtu.be/mp5gksq_OEI">music</a> that lifted the spirits. The effect was an atmosphere that brilliantly supported continued conversation and camaraderie.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DJpw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DJpw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DJpw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DJpw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!DJpw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DJpw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5242706,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DJpw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DJpw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!DJpw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DJpw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb44bed25-0ffa-492a-bff3-0b8cdf9ca05d_5712x4284.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><strong>Charlie Malmqvist, one of the BSC organizers, creator of <a 
href="https://www.youtube.com/watch?v=BOGxdUOQUus">NowGrep</a> </strong></figcaption></figure></div><p>Even before any talks had occurred, the conference had all of the properties I ascribed to those fresh, exciting, and revolutionary circles. <em>Everyone</em> I spoke to was passionate about their craft. <em>Everyone</em> wanted to build something beautiful. <em>Everyone</em> loved what they did&#8212;at least, enough to fly to Sweden from all across the globe, take a four-hour train ride to a rural town, and spend several days surrounded by a bunch of (mostly) strangers.</p><p>But nobody <em>felt</em> like a stranger. The morning after we arrived in the town, there was friendly Brazilian jiu-jitsu sparring, lake swimming, and many-hour-long conversations.</p><p>This was partly because of Internet familiarity, but it was also because of a completely unique null hypothesis about shared passions, ideas, and knowledge. It was safe to assume that a &#8220;stranger&#8221; knew about&#8212;for instance&#8212;the early game engine work of John Carmack, the game technology work at <a href="https://radgametools.com">RAD</a>, Casey Muratori&#8217;s <a href="https://guide.handmadehero.org/">Handmade Hero</a>, or the work and streams of Jonathan Blow. And it was safe to assume that they had a passion for writing high-quality games, engines, tools, or other software largely from scratch, for pushing the boundaries of what is possible in software, and for producing results superior to those elsewhere in the industry.</p><p>And because they had passion, they had spent thousands of hours of their own time practicing and refining their abilities. 
Many attendees were highly successful in shipping their own games or other software&#8212;turn around, and you might just <em>happen</em> to run into the creator of <a href="https://teardowngame.com/">Teardown</a>, or a programmer who works on the Jai compiler, or the creator of <a href="https://filepilot.tech">File Pilot</a>!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K19G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K19G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 424w, https://substackcdn.com/image/fetch/$s_!K19G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 848w, https://substackcdn.com/image/fetch/$s_!K19G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!K19G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K19G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png" width="1456" height="983" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:983,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:871147,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K19G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 424w, https://substackcdn.com/image/fetch/$s_!K19G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 848w, https://substackcdn.com/image/fetch/$s_!K19G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!K19G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42551616-c09d-46cf-82cc-102ae952977b_1818x1228.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">From left to right: <a href="https://x.com/azmreece">Andrew Reece</a>, creator of <a href="https://whitebox.systems/">Whitebox</a>; <a href="https://x.com/LubaRaphael">Raphael Luba</a>, who works on the Jai compiler; <a href="https://x.com/vkrajacic">Vjekoslav Kraja&#269;i&#263;</a>, creator of <a href="https://filepilot.tech">File Pilot</a></figcaption></figure></div><p>This made communication pleasant, valuable for all parties, and highly efficient. The conference was socially <em>energizing</em>, rather than exhausting&#8212;even for a population as naturally introverted as programmers. 
It was a shame to go to bed each night, because it meant cutting exhilarating conversations with great, talented, and personable friends short.</p><div><hr></div><h2>High Signal, Low Noise</h2><p>On top of the excellent atmosphere and practicalities, the talks at Better Software Conference were stellar. The signal-to-noise ratio was unlike that of almost any conference I&#8217;d seen (the only comparable events were Casey&#8217;s HandmadeCon in <a href="https://www.youtube.com/playlist?list=PLEMXAbCVnmY5MtDW5Q0EuWmBW5kEMYhm1">2015</a> and <a href="https://www.youtube.com/playlist?list=PLEMXAbCVnmY6wYncIGmAXVC9lVgugFcr2">2016</a>).</p><p>The conference kicked off with a phenomenal <a href="https://www.computerenhance.com/p/the-big-oops-anatomy-of-a-thirty">talk from Casey himself</a>, breaking down the history of a nearly ubiquitous&#8212;perhaps mistakenly&#8212;paradigm in the software industry. The rest of the day featured insights from Dennis Gustafsson (creator of <a href="https://teardowngame.com/">Teardown</a>) on <a href="https://youtu.be/Kvsvd67XUKw">physics engine parallelization</a>, Bill Hall (<a href="https://www.gingerbill.org/">gingerBill</a>, creator of <a href="https://odin-lang.org/">Odin</a> and programmer at <a href="https://jangafx.com/">JangaFX</a>) on <a href="https://youtu.be/YNtoDGS4uak">learning mathematical tools</a>, and Vjekoslav Kraja&#269;i&#263; on <a href="https://youtu.be/bUOOaXf9qIM">the codebase of File Pilot</a>. And this was <a href="https://bettersoftwareconference.com/">only the first day</a>.</p><p>The selection of speakers and the quality of the talks were no happy accident&#8212;the organizers didn&#8217;t simply <em>get lucky</em>. If your <em>speakers regress to the mean</em>, then your <em>content regresses to the mean</em>.</p><p>What do high-quality speakers want? First, to be surrounded by other high-quality people&#8212;or in other words, high-quality attendees. Second, they want freedom. 
Third, they want a platform. The conference organizers provided all three: first, by carefully inviting attendees; second, by keeping the conference schedule highly flexible, without any talk time limits or required topics; and third, by streaming all talks freely online to the public, with excellent professional audio and video.</p><p>Most importantly, high-quality speakers want an excellent in-person audience, and to be surrounded by others they&#8217;d be interested in speaking with. So if your <em>attendees</em> regress to the mean, then your <em>speakers</em> regress to the mean.</p><div><hr></div><h2>Euphemisms For Power</h2><p>This invite-only structure is an unpopular way to run a conference here at the tail end of an era of &#8220;inclusivity&#8221;. Look no further than&#8212;and I&#8217;m sorry to do this&#8212;many online comments about the conference:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CFaG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CFaG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 424w, https://substackcdn.com/image/fetch/$s_!CFaG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 848w, https://substackcdn.com/image/fetch/$s_!CFaG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CFaG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CFaG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png" width="1456" height="531" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:531,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:161062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CFaG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 424w, https://substackcdn.com/image/fetch/$s_!CFaG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 848w, https://substackcdn.com/image/fetch/$s_!CFaG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 
1272w, https://substackcdn.com/image/fetch/$s_!CFaG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa59eb039-8952-4b4e-8d52-75b6ad94d36e_1513x552.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c3pt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c3pt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 424w, https://substackcdn.com/image/fetch/$s_!c3pt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 848w, https://substackcdn.com/image/fetch/$s_!c3pt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 1272w, https://substackcdn.com/image/fetch/$s_!c3pt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c3pt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png" width="1456" height="326" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:326,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80433,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c3pt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 424w, https://substackcdn.com/image/fetch/$s_!c3pt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 848w, https://substackcdn.com/image/fetch/$s_!c3pt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 1272w, https://substackcdn.com/image/fetch/$s_!c3pt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1687d4c-d7a2-4fd2-988c-3ea232bd9d1c_1500x336.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ojwx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ojwx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 424w, https://substackcdn.com/image/fetch/$s_!ojwx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 848w, https://substackcdn.com/image/fetch/$s_!ojwx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 1272w, https://substackcdn.com/image/fetch/$s_!ojwx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ojwx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png" width="1128" height="242" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e456cb88-8884-4ca1-9929-857825aebda5_1128x242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:242,&quot;width&quot;:1128,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172839878?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ojwx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 424w, https://substackcdn.com/image/fetch/$s_!ojwx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 848w, https://substackcdn.com/image/fetch/$s_!ojwx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 1272w, https://substackcdn.com/image/fetch/$s_!ojwx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe456cb88-8884-4ca1-9929-857825aebda5_1128x242.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This is a small selection, but plenty of similar comments exist. 
Of course, the conference <em>was</em> invite-only, and thus &#8220;exclusive&#8221;, but it wasn&#8217;t selecting for particular demographics; it was selecting for&#8212;and <em>exclusively for&#8212;</em>people who <em>care about better software. </em>It&#8217;s in the title. It&#8217;s why the conference <em>wasn&#8217;t</em>&#8212;despite the characterization&#8212;demographically homogeneous. Unsurprisingly, those more concerned with demographics than <em>the topic of the conference </em>wouldn&#8217;t typically make the cut.</p><p>Such rants about &#8220;sameyness&#8221;, or the conference looking like a &#8220;frat party&#8221;, or &#8220;dudes duding it up dudely&#8221;, are resentful, deranged tantrums over the demographics of a conference&#8212;a computer programming conference attended primarily by Swedes, other Europeans, and Americans&#8212;not matching those they&#8217;d prefer, which suggests a rather nefarious undertone.</p><p>The stories I&#8217;ve told in this post&#8212;which are a small sample of a pattern that has repeated time and time again&#8212;teach an important lesson: obsession with <em>growth </em>will kill the spirit of an organization. <em>Growth</em> can manifest in a number of ways, but most importantly, it can manifest <em>financially</em> and <em>influentially</em>. Increasing growth in either of those two ways requires <em>expanding</em> a particular population&#8212;a userbase, a conference&#8217;s attendees, a follower count&#8212;which necessarily requires appealing to a broader population. 
In other words, the <em>appeal</em> must become <em>less particular&#8212;more</em> <em>generic</em>&#8212;to <em>include a broader population</em>.</p><p>Is this not <em>inclusivity</em>?</p><p>And so it comes full circle: the executive boardroom meeting discussing the maximization of <em>growth</em>, and the Internet &#8220;discussion&#8221; board &#8220;discussing&#8221; the maximization of <em>inclusivity</em>, are, fundamentally, discussing their desire for exactly the same outcome. The characteristic difference is that the boardroom is discussing it from <em>inside</em> an organization, and the Internet board is discussing it from <em>outside</em> an organization&#8212;but <em>both</em> desire, ultimately, <em>growth</em>. The terms they use may differ, yes&#8212;both groups use the appropriate language to maximally signal virtue to those from whom they desire social approval&#8212;but in <em>substance, </em>it is the same.</p><p>Furthermore, to desire financial growth and influential growth is to desire <em>power</em>&#8212;to desire that <em>most of all</em> is to have <em>lust for power</em>.</p><p>But for someone who is primarily concerned with craft, or the creation of beauty, and would rather financial gain and influence come as <em>second-order effects </em>of that, this <em>lust for power</em> is detrimental to their whole purpose. It kills their communities, their organizations, their companies, their teams, and thus detracts from their already-difficult work.</p><p>This is why careful selection of <em>people</em> is critical, and this is why it <em>works</em>. In many cases&#8212;for example, in that of Usenet&#8212;this selection is informal and organic. In other cases, like that of the invite-only Better Software Conference, it was formalized. 
Both work, though formalization can help an organization endure in the form it was intended to take.</p><div><hr></div><p>The reason I chose to write this post is that this conference was one of the best experiences of my life, but it was also a truly <em>unique</em> one. In my experience, the overwhelming majority of companies, communities, conferences, and meetups do not fit this description. Most of them are Usenet&#8212;<em>after</em> Eternal September. I find that highly undesirable; the Better Software Conference provides a blueprint to build companies, communities, conferences, or other organizations which are highly fruitful and desirable.</p><p>The organizers of the Better Software Conference&#8212;beyond doing an excellent job planning practical necessities, like travel, lodging, and food, and a phenomenal job providing high-quality audio &amp; video and an excellent presentation space for speakers&#8212;understood that <em>people are not fungible</em>. They understood that&#8212;to put on a great <em>conference</em>&#8212;you need great <em>people</em>, together, with a platform, space, and ample free time.</p><p>When put like that, it sounds simple. And yet almost nobody does it. After all, that&#8217;s the philosophy of <em>better software</em>: Do the <em>simple thing</em> that <em>solves the problem</em>, that nobody else is thinking about.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. 
Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Desktop Abstraction and OS Design — Video Discussion w/ Sam Smith]]></title><description><![CDATA[A video call I had with Sam Smith&#8212;creator of the Serenum operating system&#8212;about fundamental concepts in operating system design pertaining to user-space applications and the desktop environment.]]></description><link>https://www.dgtlgrove.com/p/desktop-abstraction-and-os-design</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/desktop-abstraction-and-os-design</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Thu, 20 Nov 2025 07:13:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/BVDK3Cr3_IQ" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently had the pleasure of hopping on a call with <a href="http://samhsmith.com/">Sam Smith</a>, creator of the <a href="https://samhsmith.com/serenum/">Serenum operating system</a>, and one of the organizers of the <a href="http://bettersoftwareconference.com/">Better Software Conference</a>. We talked about operating system design&#8212;specifically, the boundary between operating systems and user-space programs. This was spurred on by a <a href="https://x.com/ryanjfleury/status/1957853922191638708">comment</a> I&#8217;d made on the subject, applying some of the lessons I&#8217;ve learned in designing the <a href="https://www.rfleury.com/p/cracking-the-code-realtime-debugger">RAD Debugger&#8217;s visualization engine</a>.</p><p>We decided to record the discussion, since we felt it might be fruitful for others to hear. 
I enjoyed the discussion, and I hope you do too.</p><div id="youtube2-BVDK3Cr3_IQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;BVDK3Cr3_IQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/BVDK3Cr3_IQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Multi-Core By Default]]></title><description><![CDATA[On multi-core programming, not as a special-case technique, but as a new dimension in all code.]]></description><link>https://www.dgtlgrove.com/p/multi-core-by-default</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/multi-core-by-default</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Fri, 10 Oct 2025 01:19:42 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/988a05e3-cb41-42db-94e4-37ec1362ba4f_4096x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qIQF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!qIQF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!qIQF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!qIQF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!qIQF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qIQF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:294267,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dgtlgrove.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qIQF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!qIQF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!qIQF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!qIQF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4a1f85-f958-417d-9911-1768b55abfe2_4096x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Learning to program a single CPU core is difficult. There are an enormous number of techniques to learn, a vast amount of information to absorb, and many hours to spend before one can do it effectively. Learning to program <em>multiple CPU cores</em> to do work in parallel, all while these cores cooperate in accomplishing some overarching task, seemed to me like the anvil that broke the camel&#8217;s back, so to speak. There is already so much to wrangle when doing single-core programming that, for me, it was much more convenient to ignore multi-core programming for a long time.</p><p>But in the modern computer hardware era, there is an elephant in the room. With modern CPU core counts far exceeding 1&#8212;and instead reaching numbers like 8, 16, 32, 64&#8212;programmers leave an <em>enormous</em> amount of performance on the table by ignoring the fundamentally multi-core reality of their machines.</p><p>I&#8217;m not a &#8220;performance programmer&#8221;. <a href="https://www.youtube.com/watch?v=apREl0KmTdQ">Like Casey Muratori</a> (which is partly what made me follow him to begin with), I have always wanted <em>reasonable</em> performance (though this might <em>appear</em> to be &#8220;performance programming&#8221; to a concerning proportion of the software industry), but historically I&#8217;ve worked in domains where I control the data involved, like my own games and engines, where I am either doing the art, design, and levels myself, or heavily involved in the process. 
Thus, I&#8217;ve often been able to use my own <em>programming constraints</em> to inform <em>artistic constraints</em>.</p><p>All of that went out the window over the past few years when, in my <a href="https://github.com/EpicGamesExt/raddebugger">work</a> on debuggers, I&#8217;ve needed to work with data which is not only <em>not under my control</em>, but is almost <em>exactly the opposite of what I&#8217;d want&#8212;</em>it&#8217;s dramatically bigger, unfathomably poorly structured, extraordinarily complicated, not to mention unpredictable and highly variable. This is because, as I&#8217;ve <a href="https://www.rfleury.com/p/demystifying-debuggers-part-1-a-busy">written about</a>, debuggers are at a &#8220;busy intersection&#8221;. They deal with unknowns from the external computing world on almost all fronts. And if one wanted a debugger to be useful for&#8212;for instance&#8212;extraordinarily large codebases that highly successful companies use to ship real things, those unknowns include unfortunate details about those codebases too.</p><p>As such, in my work, making more effective use of the hardware has become far more important to me than it ever was in the past. 
So I was forced to address the &#8220;elephant in the room&#8221; that is CPU core counts, and to actually do multi-core programming.</p><p>I&#8217;ve learned a lot about the multi-core aspect of programming in the past few years, and I&#8217;ve written about lessons I&#8217;ve learned during that time, like those on <a href="https://www.rfleury.com/p/a-taxonomy-of-computation-shapes">basic mental building blocks I used to plan for multithreaded architecture</a>, and <a href="https://www.rfleury.com/p/multi-threading-and-mutation">carefully organizing mutations such that multiple threads require little-to-no synchronization</a>.</p><p>I still find those ideas useful, and my past writing still captures my thoughts on the <em>first principles</em> of multi-core programming. But recently, thanks to some lessons I learned after a few discussions with <a href="https://computerenhance.com">Casey</a>, my abilities in <em>concretely applying</em> those first principles have &#8220;leveled up&#8221;. I&#8217;m writing this post now to capture and share those lessons.</p><div><hr></div><h2>The Parallel <code>for</code> (And Its Flaws)</h2><p>Because every programmer learns single-core programming first, it&#8217;s common&#8212;after one learns multi-core programming techniques&#8212;to apply those techniques conservatively within otherwise single-core code.</p><p>To make this more concrete, consider the following simple example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;61865a0b-4fc9-4ac2-842d-16030e016bc7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 *values = ...;
S64 values_count = ...;
S64 sum = 0;
for(S64 idx = 0; idx &lt; values_count; idx += 1)
{
  sum += values[idx];
}</code></pre></div><p>In this example, we compute a sum of all elements in the <code>values</code> array. Let&#8217;s now consider a few properties of sums:</p><ul><li><p><code>a + b + c + d = (a + b) + (c + d)</code></p></li><li><p><code>a + b + c + d = d + c + b + a</code></p></li><li><p><code>(a + b) + (c + d) = (c + d) + (a + b)</code></p></li></ul><p>Because we can reconsider a sum of elements as a sum of sums of groups of those elements, and because the order in which we sum elements does not impact the final computation, the original code can be rewritten like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;37f925eb-5cba-453f-9208-371c6881f8e7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 *values = ...;
S64 values_count = ...;

S64 sum0 = 0;
for(S64 idx = 0; idx &lt; values_count/4; idx += 1)
{
  sum0 += values[idx];
}

S64 sum1 = 0;
for(S64 idx = values_count/4; idx &lt; (2*values_count)/4; idx += 1)
{
  sum1 += values[idx];
}

S64 sum2 = 0;
for(S64 idx = (2*values_count)/4; idx &lt; (3*values_count)/4; idx += 1)
{
  sum2 += values[idx];
}

S64 sum3 = 0;
for(S64 idx = (3*values_count)/4; idx &lt; (4*values_count)/4 &amp;&amp; idx &lt; values_count; idx += 1)
{
  sum3 += values[idx];
}

S64 sum = sum0 + sum1 + sum2 + sum3;</code></pre></div><p>That obviously doesn&#8217;t win us anything by itself, but it shows that we can obtain the same result by subdividing the computation into several smaller, independent computations.</p><p>Because several independent computations do not require writing to the same memory, they fit nicely with multi-core programming: no core needs to synchronize with any other. This not only greatly simplifies the multi-core programming, but improves its performance&#8212;or, more precisely, it doesn&#8217;t <em>eat away </em>at the natural performance obtained by executing in parallel.</p><p>For cases like this, we can implement what&#8217;s known as a &#8220;<strong>parallel </strong><code>for</code>&#8221;. The idea is that we&#8217;d like to specify our original <code>for</code> loop&#8230;</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;630a4859-53de-4100-bb45-691e84262e02&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">for(S64 idx = 0; idx &lt; values_count; idx += 1)
{
  sum += values[idx];
}</code></pre></div><p>&#8230;but we&#8217;d also like to express that the loop can be subdivided into independent computations (the results of which we can join into a single result later).</p><p>In other words, we begin with normal, single-core code. But, for some computation, we want to &#8220;go wide&#8221;, and compute something in parallel. Then, we want to &#8220;join&#8221; this wide, parallel work, and go back to more single-core code, which can use the results of the work done in parallel.</p><p>This is a widely known and used concept. In many real codebases written in modern programming languages which offer many tools for abstraction building, you&#8217;ll find a number of <a href="https://learn.microsoft.com/en-us/cpp/parallel/concrt/reference/concurrency-namespace-functions?view=msvc-170#parallel_for">impressive gymnastics</a> to succinctly express this.</p><p>One of the reasons I prefer working in a simpler language is that, if what my code ultimately generates to facilitate some abstraction is complicated, having that complexity reflected directly in the source code helps keep me honest about how &#8220;clean&#8221; some construct actually is.</p><p>If, on the other hand, some higher level utility can be provided by a simple and straightforward concrete implementation, that is a sign of a superior design: one that compromises neither on its implementation nor on its higher level utility.</p><p>Many people behave as though this is impossible&#8212;that higher level utility necessarily incurs substantial tradeoffs at the low level, or vice versa, that low level properties like performance necessitate undesirable high level design. This is simply not universally true. 
By hunting for tradeoffs, many programmers train themselves to ignore cases when they can both have, and eat, their cake.</p><p>So, if we consider our options for implementing a &#8220;parallel <code>for</code>&#8221; without a lot of modern language machinery, we might start with something like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;98785086-281e-4030-805e-d8c14dc7bfdf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">typedef struct SumParams SumParams;
struct SumParams
{
  S64 *values;
  S64 count;
  S64 sum;
};

void SumTask(SumParams *p)
{
  for(S64 idx = 0; idx &lt; p-&gt;count; idx += 1)
  {
    p-&gt;sum += p-&gt;values[idx];
  }
}
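
// Hypothetical helper, not part of the original example and not used by
// ComputeSum below: a sketch of one way to compute the half-open range
// [first, opl) of items assigned to core_idx when count items are split
// across core_count cores. The chunk size is rounded up so no trailing
// items are dropped, and both ends are clamped to count so no core reads
// past the end of the array (trailing cores may get an empty range).
// Plain long long is used so the sketch stands on its own; it assumes
// count >= 0 and core_count > 0.
typedef struct CoreRange CoreRange;
struct CoreRange
{
  long long first; // index of the first item for this core
  long long opl;   // one past the last item for this core
};

CoreRange RangeForCore(long long count, long long core_idx, long long core_count)
{
  long long per_core = (count + core_count - 1) / core_count;
  CoreRange r;
  r.first = core_idx*per_core;
  r.opl = r.first + per_core;
  if(r.first > count)
  {
    r.first = count;
  }
  if(r.opl > count)
  {
    r.opl = count;
  }
  return r;
}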

S64 ComputeSum(S64 *values, S64 count)
{
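  // Round up, so no items are dropped when count is not a multiple of NUMBER_OF_CORES.
  S64 count_per_core = (count + NUMBER_OF_CORES - 1) / NUMBER_OF_CORES;
  SumParams params[NUMBER_OF_CORES] = {0};
  Thread threads[NUMBER_OF_CORES] = {0};
  for(S64 core_idx = 0; core_idx &lt; NUMBER_OF_CORES; core_idx += 1)
  {
    params[core_idx].values = values + core_idx*count_per_core;
    params[core_idx].count = count_per_core;
    // Trim the trailing chunk(s) so no core reads past the end of the array.
    S64 overkill = ((core_idx+1)*count_per_core - count);
    if(overkill &gt; 0)
    {
      params[core_idx].count -= overkill;
    }
    if(params[core_idx].count &lt; 0)
    {
      params[core_idx].count = 0;
    }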
    threads[core_idx] = LaunchThread(SumTask, &amp;params[core_idx]);
  }

  S64 sum = 0;
  for(S64 core_idx = 0; core_idx &lt; NUMBER_OF_CORES; core_idx += 1)
  {
    JoinThread(threads[core_idx]);
    sum += params[core_idx].sum;
  }

  return sum;
}</code></pre></div><p>There are a number of unfortunate realities about this mechanism:</p><ol><li><p>In functions like <code>LaunchThread</code> and <code>JoinThread</code>, we interact with the kernel to create and destroy kernel resources (threads) every time we perform a sum.</p></li><li><p>The actual case-specific code we needed (for the sum, in this case), and the number of particular details we had to specify and get right&#8212;like the work subdivision&#8212;have exploded. What used to be a simple <code>for</code> loop has been spread around to different, more intricate parts, all implementing different details of the mechanism we wanted&#8212;the work preparation, the work kickoff, and the joining and combination of work results. All parts must be maintained and changed together, every time we want a parallel <code>for</code>.</p></li><li><p>The solution&#8217;s control flow has been scattered across threads, CPU cores, and time. We can no longer trivially step through the sum in a debugger. If we encounter a bug in some iterations of a parallel <code>for</code>, we need to correlate the <em>launching</em> of that particular work with the work itself. For example, if we stop the program in the debugger and find ourselves within a thread performing some iterations of the parallel <code>for</code>, we have lost context about <em>who</em> launched that work (in single-core code, this information is universally provided by call stacks).</p></li></ol><p>The first problem can be partly addressed with a new layer which our code uses instead of the underlying kernel primitives. In many codebases, this layer is called a &#8220;job system&#8221;, or a &#8220;worker thread pool&#8221;. 
In those cases, the program prepares a set of threads <em>once</em>, and these threads simply wait for work, and execute it once they receive it:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;d85ca088-92c6-4c76-afb2-8556104b8a29&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">void JobThread(void *p)
{
  for(;;)
  {
    Job job = GetNextJob(...);
    job.Function(job.params);
  }
}

void SumJob(SumParams *p)
{
  ...
}

S64 ComputeSum(S64 *values, S64 count)
{
  Job jobs[NUMBER_OF_CORES] = {0};
  for(S64 core_idx = 0; core_idx &lt; NUMBER_OF_CORES; core_idx += 1)
  {
    ...
    jobs[core_idx] = LaunchJob(SumJob, &amp;params[core_idx]);
  }

  S64 sum = 0;
  for(S64 core_idx = 0; core_idx &lt; NUMBER_OF_CORES; core_idx += 1)
  {
    WaitForJob(jobs[core_idx]);
    sum += params[core_idx].sum;
  }

  return sum;
}</code></pre></div><p>In this case, there is still some overhead incurred by sending information to, and receiving it from, the job threads, but it is significantly lighter than interacting with the kernel.</p><p>But it hasn&#8217;t improved the higher level code very much at all&#8212;we&#8217;ve simply replaced &#8220;threads&#8221; with &#8220;jobs&#8221;. The latter two problems still hold. We still need to perform an entire dance in order to set up a &#8220;wide loop&#8221;&#8212;a &#8220;parallel <code>for</code>&#8221;, which scatters control flow for a problem across both the source code and coherent contexts (CPU cores, call stacks) at runtime.</p><p>In this concrete case&#8212;computing a sum in parallel&#8212;this is not a huge concern. Will it compute a sum in parallel? Yes. Does it have very few shared data writes? Yes. Can you parallelize all similarly parallelizable problems this way? Yes. But we pay the costs of these problems every time we use this mechanism. If we have to pay that cost <em>very frequently</em> throughout a problem, it can become onerous to write, debug, and maintain all of this machinery.</p><div><hr></div><h2>The Job System (And Its Flaws)</h2><p>One desirable property of the parallel <code>for</code> is that all jobs&#8212;which execute at roughly the same time, across some number of cores&#8212;are identical in their &#8220;shape&#8221;. Each job thread participating in the problem is executing exactly the same <em>code</em>&#8212;we simply parameterize each job slightly differently, to distribute different subproblems to different cores. This makes understanding, predicting, profiling, and debugging such code much simpler.</p><p>Furthermore, within a parallel <code>for</code>, each job&#8217;s lifetime is scoped by the originating single-core code&#8217;s lifetime. Each job begins and ends within some scope&#8212;the scope responsible for launching, then joining, all of the jobs. 
This means no substantial lifetime management complexity occurs&#8212;allocations for a parallel <code>for</code> are as simple as for normal single-core code.</p><p>But in practice, the mechanism often used to implement parallel <code>for</code>s&#8212;the <em>job system</em>&#8212;is rarely <em>only</em> used in this way, which is understandable, given its highly generic structure. For example, it&#8217;s also often used to launch a number of <em>heterogeneous</em> jobs. In these cases, it becomes even more difficult to understand the context of a particular job&#8212;who launched it, and in what context? It also becomes more difficult to comprehensively understand a system&#8212;because there is such a large number of possible configurations of thread states, it can be difficult to ensure a threaded system is robust in all cases.</p><p>These jobs are also often not bounded by their launcher scope&#8212;as such, more engineering must be spent on managing resources, like memory allocations, whose lifetimes are now defined by what happens across multiple threads in multiple contexts.</p><p>And this is, really, the tip of the iceberg. In more sophisticated systems, one might observe that there are <em>dependencies</em> between jobs, and jobs ought to be implicitly launched when their dependency jobs complete, creating an even longer (and more difficult to inspect) chain of context related to some independent through line of work.</p><p>Ultimately, this presents recurring writing, reading, debugging, and maintenance costs that don&#8217;t exist in normal single-core code. 
All of the costs incurred by this job system design&#8212;whether used in a parallel <code>for</code> or otherwise&#8212;are paid <em>any time</em> new parallel work is introduced, or any time parallel work is maintained.</p><p>Now, <em>if</em> we have few parts of our code that <em>can</em> be parallelized in this way, then this is not a significant cost.</p><p>&#8230;But that <em>if</em> is doing a lot of heavy lifting.</p><p>In practice, I&#8217;ve found that an enormous number of systems are riddled with opportunities for parallelization, because of a lack of serial dependence between many of their parts. But, if taking advantage of every instance of serial independence requires significantly more engineering than just accepting single-core performance, then in many cases, programmers will opt for the latter.</p><p>Again&#8212;does this mean that a job system <em>cannot </em>be used to do such parallelization in these systems? No. But, it <em>also</em> means that we pay the costs of using this job system&#8212;the more moving parts; the extra code and concepts to write, read, and debug&#8212;much more frequently, if we&#8217;d like to take advantage of this widespread serial independence, or if we&#8217;d like any algorithm in particular to scale its performance with the number of cores.</p><div><hr></div><h2>Single-Core By Default</h2><p>The critical insight I learned from speaking with <a href="https://computerenhance.com">Casey</a> on this topic was that a significant reason these costs arise is the careful organization a system needs in order to <em>switch</em> from single-core to multi-core code. Mechanisms like job systems and their special-case usage in parallel <code>for</code>s represent, in some sense, the most conservative application of multi-core code. The vast majority of code is written as single-core, and a few carveouts are made when multi-core is critically important. 
In other words, code remains <em>single-core by default</em>, and in a few special cases, effort is spent to briefly hand work off to a multi-core system.</p><p>Because the context of code execution changes across time&#8212;because work is <em>handed off</em> from one system to another&#8212;it necessarily requires more code to set up, and it is more difficult to debug and understand the full context at any point in time.</p><p>But is this the best approach? Perhaps, instead of writing <em>single-core code</em> (which sometimes goes <em>wide</em>) by default, we can write <em>multi-core code</em> (which sometimes goes <em>narrow</em>) by default.</p><p>What does this look like in practice?</p><p>There&#8217;s a good chance that you&#8217;ve already experienced this style in other areas of programming&#8212;notably, in GPU shader programming.</p><p>GPU shaders&#8212;like vertex or pixel shaders, used in a traditional GPU rendering pipeline&#8212;are written such that they are <em>multi-core by default</em>. You author a single function (the entry point of the shader), but this function is executed on <em>many cores, always,</em> implicitly. The language constructs and rules are arranged in such a way that data reads and writes are always scoped by whatever core happens to be executing the code. A single execution of a <em>vertex</em> shader is scoped to a <em>vertex</em>&#8212;a <em>pixel </em>shader to a <em>pixel</em>&#8212;and so on.</p><p>Because the fundamental, underlying architecture is always <em>multi-core by default</em>, and because there is little involvement of each specific shader in how the multi-core parallelism is achieved, GPU programming enjoys enormous performance benefits, and yet, to the shader programmer, it feels as though there are <em>few costs</em> to pay for it. 
So few, in fact, that it feels more like artistic scripting, to the degree that <a href="https://www.shadertoy.com/about">someone</a> could build an entire website&#8212;<a href="https://www.shadertoy.com/">Shadertoy</a>&#8212;around rapid-iteration, high-performance, visual GPU scripting.</p><p>Wait a minute&#8230; &#8220;high performance&#8221;, &#8220;rapid-iteration scripting&#8221;? It seems like many believe that these are mutually exclusive!</p><p>Why does CPU programming feel so different?</p><p>Contrast the GPU programming model with the usual CPU programming model&#8212;you author a single function (the entry point of your program), which is scheduled onto a <em>single core only</em>, normally by a kernel scheduler, using a <em>single</em> thread state. This model is, in contrast, <em>single-core by default</em>.</p><p>Long story short: it doesn&#8217;t have to be!</p><div><hr></div><h2>Multi-Core By Default</h2><p>Let&#8217;s begin by exactly inverting the approach. Instead of having a single thread which kicks off work to many threads, let&#8217;s just have many threads, all running the same code, by default. In a sense, let&#8217;s have <em>just one big parallel </em><code>for</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;f30285f5-6c83-48ea-8e1b-3c762d3c7553&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">void BootstrapEntryPoint(void)
{
  Thread threads[NUMBER_OF_CORES] = {0};
  for(S64 thread_idx = 0; thread_idx &lt; NUMBER_OF_CORES; thread_idx += 1)
  {
    threads[thread_idx] = LaunchThread(EntryPoint, (void *)thread_idx);
  }
  for(S64 thread_idx = 0; thread_idx &lt; NUMBER_OF_CORES; thread_idx += 1)
  {
    JoinThread(threads[thread_idx]);
  }
}

void EntryPoint(void *params)
{
  S64 thread_idx = (S64)params;
  // program's actual work occurs here!
}</code></pre></div><p>To fit into an architecture which assumes a single-threaded entry point, we start with a <code>BootstrapEntryPoint</code>. But the only work this function actually does is launch (and then join) all of the threads executing the <em>actual</em> entry point, <code>EntryPoint</code>.</p><p>Let&#8217;s consider the earlier summation example. First, let&#8217;s just take the original single-threaded code, put it into <code>EntryPoint</code>, and see how we can continue from there.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;5852f2f5-bac7-4875-ae80-43bb6f45c534&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">void EntryPoint(void *params)
{
  S64 thread_idx = (S64)params;

  // we obtain these somehow:
  S64 *values = ...;
  S64 values_count = ...;

  // compute the sum
  S64 sum = 0;
  for(S64 idx = 0; idx &lt; values_count; idx += 1)
  {
    sum += values[idx];
  }
}</code></pre></div><p>What is actually happening? Well, we&#8217;re &#8220;computing the sum across many cores&#8221;. That is&#8230; <em>technically </em>true! Ship it!</p><p>There&#8217;s just one little problem&#8230; This is just as fast as the single-core version, except it also uses enormously more energy, and steals time from other tasks the CPU could be doing, because it is simply duplicating all work on each core.</p><p>But, if we were to measure this, and consider the real costs, and profile the actual code, the profile would look something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FMe9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FMe9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 424w, https://substackcdn.com/image/fetch/$s_!FMe9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 848w, https://substackcdn.com/image/fetch/$s_!FMe9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 1272w, https://substackcdn.com/image/fetch/$s_!FMe9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FMe9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png" width="515" height="356.81626928471246" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:988,&quot;width&quot;:1426,&quot;resizeWidth&quot;:515,&quot;bytes&quot;:620805,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FMe9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 424w, https://substackcdn.com/image/fetch/$s_!FMe9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 848w, https://substackcdn.com/image/fetch/$s_!FMe9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 1272w, https://substackcdn.com/image/fetch/$s_!FMe9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15a04d3-a7c7-4a12-8ba9-862b932dd95d_1426x988.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Duplication itself is not, in principle, a problem, and it is sometimes not to be avoided, because <em>deduplication</em> can sometimes be more expensive than <em>duplication</em>. For instance, communicating the result of a single <code>add</code> instruction across many threads&#8212;to deduplicate the work of that <code>add</code>&#8212;would be vastly more expensive than simply duplicating the <code>add</code> itself. We <em>do</em> want deduplication, but only when necessary, or when it actually helps.</p><p>So, where does it help? 
Unsurprisingly in this case, the dominating cost&#8212;the reason we are using multiple cores <em>at all</em>&#8212;is the sum across all elements in <code>values</code>. We want to distribute different parts of the sum across cores. To start, instead of computing <em>the full sum</em>, we can instead compute a <em>per-thread</em> <em>sum</em>. After each <em>per-thread sum</em> is computed, we can then combine them:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;5eb893c2-f6d0-4bc2-95f6-319987c2fbb9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">void EntryPoint(void *params)
{
  S64 thread_idx = (S64)params;

  // we obtain these somehow:
  S64 *values = ...;
  S64 values_count = ...;

  // decide this thread's subset of the sum
  S64 thread_first_value_idx = ???;
  S64 thread_opl_value_idx = ???; // one past last

  // compute the thread sum
  S64 thread_sum = 0;
  for(S64 idx = thread_first_value_idx;
      idx &lt; thread_opl_value_idx;
      idx += 1)
  {
    thread_sum += values[idx];
  }

  // combine the thread sums
  S64 sum = ???;
}</code></pre></div><p>We have two blanks to fill in:</p><ol><li><p>How do we decide each thread&#8217;s subset of work?</p></li><li><p>How do we combine all thread sums?</p></li></ol><p>Let&#8217;s tackle each.</p><h3>1. Deciding Per-Thread Work</h3><p>Currently, the only input I&#8217;ve provided each thread is its <em>index</em>, which would be in <em>[0, N)</em>, where <em>N</em> is the number of threads. This is stored in the local variable <code>thread_idx</code>, which will have a different value in <em>[0, N)</em> for each thread. This is an easy example, because a good way to distribute the sum work across all threads is to uniformly distribute the number of values to sum amongst the threads. This means we are simply mapping <em>[0, M) </em>to <em>[0, N)</em>, where <em>M</em> is the number of values&#8212;<code>values_count</code>&#8212;and <em>N</em> is the number of threads.</p><p>We can <em>almost</em> compute this as follows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;9b854ab7-a5b7-4e40-9e77-4351f4e3995d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 values_count = ...;
S64 thread_idx = ...;
S64 thread_count = NUMBER_OF_CORES;

S64 values_per_thread = values_count / thread_count;
S64 thread_first_value_idx = values_per_thread * thread_idx;
S64 thread_opl_value_idx = thread_first_value_idx + values_per_thread;</code></pre></div><p>This is almost right, but only almost, because we also need to account for the case where <code>values_count</code> is not evenly divisible by <code>thread_count</code>. Because integer division truncates, <code>values_per_thread</code> is rounded down, so these ranges together will <em>underestimate</em> the total number of values to sum: they miss anywhere from 0 (if it divides cleanly) to <code>thread_count-1</code> values at the end of the array, which is exactly the remainder of the division.</p><p>Thus, the number of values this division underestimates by&#8212;the &#8220;leftovers&#8221;&#8212;can be computed as follows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;4104bc25-f92f-4bec-bf6a-34be81195e54&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 leftover_values_count = values_count % thread_count;</code></pre></div><p>We can then distribute these leftovers amongst the first <code>leftover_values_count</code> threads:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;f74af20c-4824-4359-8e3b-0c3564ac45b2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// compute the values-per-thread, &amp; number of leftovers
S64 values_per_thread = values_count / thread_count;
S64 leftover_values_count = values_count % thread_count;
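// e.g. values_count = 10, thread_count = 4:
//   values_per_thread = 10 / 4 = 2, leftover_values_count = 10 % 4 = 2,
//   so threads 0 and 1 take 3 values each ([0, 3) and [3, 6)), and
//   threads 2 and 3 take 2 values each ([6, 8) and [8, 10))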

// determine if the current thread gets a leftover
// (we distribute them amongst the first threads in the group)
B32 thread_has_leftover = (thread_idx &lt; leftover_values_count);

// decide on how many leftovers have been distributed before this
// thread's range (just the thread index, clamped by the number of
// leftovers)
S64 leftovers_before_this_thread_idx = 0;
if(thread_has_leftover)
{
  leftovers_before_this_thread_idx = thread_idx;
}
else
{
  leftovers_before_this_thread_idx = leftover_values_count;
}

// decide on the [first, opl) range:
// we shift `first` by the number of leftovers we've placed earlier,
// and we shift `opl` by 1 if we have a leftover.
S64 thread_first_value_idx = (values_per_thread * thread_idx +
                              leftovers_before_this_thread_idx);
S64 thread_opl_value_idx = thread_first_value_idx + values_per_thread;
if(thread_has_leftover)
{
  thread_opl_value_idx += 1;
}</code></pre></div><p>Or more succinctly:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;6dbcdf6d-90a9-488a-ae73-1535078ba713&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 values_per_thread = values_count / thread_count;
S64 leftover_values_count = values_count % thread_count;
B32 thread_has_leftover = (thread_idx &lt; leftover_values_count);
S64 leftovers_before_this_thread_idx = (thread_has_leftover
                                        ? thread_idx
                                        : leftover_values_count);
S64 thread_first_value_idx = (values_per_thread * thread_idx +
                              leftovers_before_this_thread_idx);
S64 thread_opl_value_idx = (thread_first_value_idx + values_per_thread + 
!!thread_has_leftover);</code></pre></div><p>Now, using this <code>[first, opl)</code> calculation, we can arrange for each thread to loop only over its associated range, so that no thread duplicates the summation work of any other.</p><h3>2. Combining All Thread Sums</h3><p>Now, how might we combine each thread&#8217;s sum to form the total sum? There are two simple options available: <strong>(a)</strong> we can define a global sum counter to which each thread atomically adds (using atomic add intrinsics) its per-thread sum, or <strong>(b) </strong>we can define global storage which stores <em>all</em> thread sums, and each thread can duplicate the work of computing the total sum.</p><p>For <strong>(a)</strong>, we just need to define <code>sum</code> as <code>static</code>, and atomically add each <code>thread_sum</code> to it:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;70703734-d947-4428-ba4c-8621a6a1fa19&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">static S64 sum = 0;

void EntryPoint(void *params)
{
  // ...
  // compute `thread_sum`
  // ...
  AtomicAddEval64(&amp;sum, thread_sum);
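  // (assumed: AtomicAddEval64 wraps a platform atomic add, e.g.
  // InterlockedAdd64 on Windows, or C11 atomic_fetch_add on an
  // _Atomic S64; a plain `sum += thread_sum` here would be a data race)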
}</code></pre></div><p><em><strong>Note:</strong></em> <em>This has a downside in that only one thread group can be executing this codepath at once. This is sometimes not a practical concern, since if we are going wide at all, we are often using all available cores to do so, and it is likely not beneficial to also have some other thread group executing the same codepath for a different purpose. That said, it&#8217;s now a new hidden restriction of this code, and it can be a critical problem. There are some techniques we can use to solve this problem, which I will cover later&#8212;for now, the important concept is that the data is shared across participating threads.</em></p><p>For <strong>(b)</strong>, we&#8217;d instead have a global table, and duplicate the work of summing across all thread sums. But we can only do that <em>after</em> we know that each thread has completed its summation work&#8212;otherwise we&#8217;d potentially add some other thread&#8217;s sum before it was actually computed!</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;fc28f7a5-4383-4452-aed6-ae937fa02035&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">static S64 thread_sums[NUMBER_OF_CORES] = {0};

void EntryPoint(void *params)
{
  // ...
  // compute `thread_sum`
  // ...
  thread_sums[thread_idx] = thread_sum;

  // ??? need to wait here for all threads to finish!

  S64 sum = 0;
  for(S64 t_idx = 0; t_idx &lt; NUMBER_OF_CORES; t_idx += 1)
  {
    sum += thread_sums[t_idx];
  }
}</code></pre></div><p>That extra waiting requirement might seem like an argument in favor of <strong>(a)</strong>, but we&#8217;d actually need the same mechanism if we did <strong>(a)</strong> once we wanted to actually <em>use</em> the sum&#8212;we&#8217;d need to wait for all threads to reach some point, so that we&#8217;d know that they&#8217;d all atomically updated <code>sum</code>.</p><p>We can use a <em><a href="https://en.wikipedia.org/wiki/Barrier_(computer_science)">barrier</a></em> to do this. In <strong>(a)</strong>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;e8237634-84bb-439f-979a-ff7a17bea285&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">static S64 sum = 0;
static Barrier barrier = {0};
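// (assumed: initialized elsewhere, before the threads launch, with a
// participant count of NUMBER_OF_CORES; a barrier only releases waiting
// threads once that many have arrived)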

void EntryPoint(void *params)
{
  // ...
  // compute `thread_sum`
  // ...
  AtomicAddEval64(&amp;sum, thread_sum);
  BarrierSync(barrier);
  // `sum` is now fully computed!
}</code></pre></div><p>And in <strong>(b)</strong>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;39e4ce6a-6bda-4462-a4fe-02aa84bd8fef&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">static S64 thread_sums[NUMBER_OF_CORES] = {0};
static Barrier barrier = {0};

void EntryPoint(void *params)
{
  // ...
  // compute `thread_sum`
  // ...
  thread_sums[thread_idx] = thread_sum;

  BarrierSync(barrier);

  S64 sum = 0;
  for(S64 t_idx = 0; t_idx &lt; NUMBER_OF_CORES; t_idx += 1)
  {
    sum += thread_sums[t_idx];
  }
  // `sum` is now fully computed!
}</code></pre></div><p>At this point, we have everything we need for both <strong>(a)</strong> and <strong>(b)</strong>. Both are simple, and likely negligibly different. <strong>(a)</strong> requires atomic summation across all the threads, which implies hardware-level synchronization, whereas <strong>(b)</strong> duplicates the sum of all per-thread sums&#8212;these likely subtly differ in their costs, but not by much when compared to the actual <code>values</code> summation.</p><div><hr></div><h2>Going Narrow</h2><p>Now, while I hope this summation example has been a useful introduction, I know it&#8217;s a bit contrived and incomplete. Specifically, it&#8217;s missing two key parts of any program: <em>inputs</em> and <em>outputs</em>. What are we <em>doing with this sum</em>, and how do we use it to produce some form of output? And how do we obtain the inputs, and store them in <code>values</code> and <code>values_count</code>?</p><p>Let&#8217;s slightly extend the summation example with stories for the inputs and outputs. For the inputs, let&#8217;s say that we read <code>values</code> out of a binary file, which just contains the whole array stored exactly as it will be laid out in memory. For the outputs, let&#8217;s say that we just print the sum to <code>stdout</code> with <code>printf</code>.</p><p>Printing out the sum will be the easiest part, so let&#8217;s begin with that.</p><p>In single-core code, after computing the sum, we&#8217;d simply call <code>printf</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;007425bc-db56-41db-a56c-33d0932efbe6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 sum = ...;
// ...
printf("Sum: %lld\n", (long long)sum);</code></pre></div><p>We can start by just doing the same in our &#8220;multi-core by default&#8221; code. What we&#8217;ll find is that our output looks something like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;545716cf-a7e9-44bd-b246-abd3a42ad599&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Sum: 12345678
Sum: 12345678
Sum: 12345678
Sum: 12345678
Sum: 12345678
Sum: 12345678
Sum: 12345678
Sum: 12345678</code></pre></div><p>Obviously, we want our many cores involved in the bulk of the computation, but we only need one thread to do the actual <code>printf</code>. In other words, we need to <em>go narrow</em>. Luckily, going <em>narrow</em> from <em>wide code</em> is much simpler than going <em>wide</em> from <em>narrow code</em>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;9e4a05ef-61e5-4c95-8b5f-09383074ac0c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 sum = ...;
// ...
if(thread_idx == 0)
{
  printf("Sum: %lld\n", (long long)sum);
}</code></pre></div><p>We simply need to mask away the work from all threads except one.</p><p>Now, let&#8217;s consider the <em>input</em> problem. We need to compute <code>values_count</code> based on the size of some input file, allocate storage for <code>values</code>, and then fill <code>values</code> by reading all data from the file.</p><p>Single-threaded code to do that might look something like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;91ffc085-2ed6-476a-a08b-80f65bd4ca3b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">char *input_path = ...;
File file = FileOpen(input_path);
S64 size = SizeFromFile(file);
S64 values_count = (size / sizeof(S64));
S64 *values = (S64 *)Allocate(values_count * sizeof(values[0]));
FileRead(file, 0, values_count * sizeof(values[0]), values);
FileClose(file);</code></pre></div><p>So, naturally, one option is to simply do this <em>narrow</em>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;aa311259-e5c1-4cdf-8cd9-711923705f77&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">if(thread_idx == 0)
{
  char *input_path = ...;
  File file = FileOpen(input_path);
  S64 size = SizeFromFile(file);
  S64 values_count = (size / sizeof(S64));
  S64 *values = (S64 *)Allocate(values_count * sizeof(values[0]));
  FileRead(file, 0, values_count * sizeof(values[0]), values);
  FileClose(file);
}
BarrierSync(barrier); // `values` and `values_count` ready after this point</code></pre></div><p>This will work, but we somehow need to broadcast the computed values of <code>values</code> and <code>values_count</code> across all threads. One easy way to do this is simply to pull them out as <code>static</code>, like we did for shared data earlier:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;6c16c005-2d9a-477c-a714-bd1aa8fc3d4d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">static S64 values_count = 0;
static S64 *values = 0;
if(thread_idx == 0)
{
  char *input_path = ...;
  File file = FileOpen(input_path);
  S64 size = SizeFromFile(file);
  values_count = (size / sizeof(S64));
  values = (S64 *)Allocate(values_count * sizeof(values[0]));
  FileRead(file, 0, values_count * sizeof(values[0]), values);
  FileClose(file);
}
BarrierSync(barrier);</code></pre></div><p>But consider that we <em>might not </em>want to do this <em>completely </em>single-core. It might be more efficient to issue <code>FileRead</code>s from many threads, rather than just one. In practice, this is partly true (although, depending on the full stack&#8212;the kernel, the storage drive hardware, and so on&#8212;it may not be beneficial past some number of threads, or for certain read sizes).</p><p>So let&#8217;s say we&#8217;d like to do the <code>FileRead</code>s wide as well. We still need to allocate <code>values</code> on a single thread, but once that is done, we can distribute the rest of the work trivially:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;e7d7320c-2a37-4a63-99d5-a53de0e2b01b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// we can open the file on all threads (though for some reasons
// we may want to deduplicate this too - for simplicity I am
// keeping it on all threads)
File file = FileOpen(input_path);

// calculate number of values and allocate (only single thread)
static S64 values_count = 0;
static S64 *values = 0;
if(thread_idx == 0)
{
  S64 size = SizeFromFile(file);
  values_count = (size / sizeof(S64));
  values = (S64 *)Allocate(values_count * sizeof(values[0]));
}
BarrierSync(barrier);

// compute thread's range of values (same calculation as before)
S64 thread_first_value_idx = ...;
S64 thread_opl_value_idx = ...;

// do read of this thread's portion
S64 num_values_this_thread = (thread_opl_value_idx - thread_first_value_idx);
FileRead(file,
         thread_first_value_idx*sizeof(values[0]),
         num_values_this_thread*sizeof(values[0]),
         values + thread_first_value_idx);
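
// (no barrier is needed before summing if each thread sums exactly the
// [first, opl) range it just read; a BarrierSync(barrier) would be
// needed here before any thread touches values outside its own range)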

// close file on all threads
FileClose(file);</code></pre></div><p>It&#8217;s <em>much simpler</em> now&#8212;compared to, say, the original parallel <code>for</code> case&#8212;to take another part of the problem like this and distribute it amongst threads as well, because <em>wide</em> is the default shape of the program.</p><p>Instead of spending most of our programming time acting as if we&#8217;re on a single-core machine, we assume our actual circumstances: we have several cores, and <em>sometimes</em> we need to tie it all together with a few serial dependencies.</p><div><hr></div><h2>Non-Uniform Work Distributions</h2><p>Let&#8217;s take a look at our earlier calculations to distribute portions of the <code>values</code> array:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;a54c8f6a-7a8d-4eb8-8332-84e4aa126350&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 values_per_thread = values_count / thread_count;
S64 leftover_values_count = values_count % thread_count;
B32 thread_has_leftover = (thread_idx &lt; leftover_values_count);
S64 leftovers_before_this_thread_idx = (thread_has_leftover
                                        ? thread_idx
                                        : leftover_values_count);
S64 thread_first_value_idx = (values_per_thread * thread_idx +
                              leftovers_before_this_thread_idx);
S64 thread_opl_value_idx = (thread_first_value_idx + values_per_thread + 
                            !!thread_has_leftover);</code></pre></div><p>This was an easy case, because uniformly dividing portions of <code>values</code> produces nearly uniform <em>work</em> across all cores.</p><p>If, in a different scenario, we <em>don&#8217;t</em> produce nearly uniform work across all cores, we have a problem: some cores will finish their work in some section long before others, and they&#8217;ll be stuck at the next barrier synchronization point while the other cores finish. This diminishes the returns we obtain from going wide in the first place.</p><p>Thus, it&#8217;s important to distribute <em>work</em> uniformly whenever possible. The exact strategy for doing so will vary by problem, but I&#8217;ve noticed three common strategies:</p><ol><li><p>Uniformly distributing inputs produces uniformly distributed work (the case with the sum). So, we can decide the work distribution upfront.</p></li><li><p>Each portion of an input requires a variable amount of per-core work. The work is relatively bounded, and there are many portions of input (more than the core count). So, each core can dynamically grab work: cores which finish smaller units first receive more, whereas cores that are stuck on longer work leave more units for the other cores.</p></li><li><p>Each portion of an input requires a variable amount of per-core work, but there are only a few (fewer than the core count) potentially very long sequences of work. So, we can attempt to redesign the algorithm such that its work can be distributed more uniformly.</p></li></ol><p>We&#8217;ve already covered the first strategy with the sum example&#8212;let&#8217;s look at the latter two.</p><h3>Dynamically Assigning Many Variable-Work Tasks</h3><p>Let&#8217;s consider a case where we have many units of work&#8212;&#8220;tasks&#8221;&#8212;and we&#8217;d like to distribute these tasks across cores. 
We may start by distributing the tasks in the same way that we distributed values to sum in the earlier example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;8122a1a7-cf84-4efe-b28e-553f5f2cfcd9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">Task *tasks = ...;
S64 tasks_count = ...;
S64 thread_first_task_idx = ...;
S64 thread_opl_task_idx = ...;
for(S64 task_idx = thread_first_task_idx;
    task_idx &lt; thread_opl_task_idx;
    task_idx += 1)
{
  // do task
}</code></pre></div><p>If each task requires a variable amount of work, then a profile of the program might look something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9BhX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9BhX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 424w, https://substackcdn.com/image/fetch/$s_!9BhX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 848w, https://substackcdn.com/image/fetch/$s_!9BhX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 1272w, https://substackcdn.com/image/fetch/$s_!9BhX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9BhX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png" width="565" height="578.5817307692307" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1491,&quot;width&quot;:1456,&quot;resizeWidth&quot;:565,&quot;bytes&quot;:726260,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9BhX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 424w, https://substackcdn.com/image/fetch/$s_!9BhX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 848w, https://substackcdn.com/image/fetch/$s_!9BhX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 1272w, https://substackcdn.com/image/fetch/$s_!9BhX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b7ba9-03fa-47d0-a520-1052a8739002_1767x1809.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Instead of deciding the task division upfront, we can dynamically assign tasks, such that the threads which are occupied (performing larger tasks) are not assigned more tasks until they&#8217;re done, and threads which complete shorter tasks earlier are quickly assigned more tasks, if available.</p><p>We can do that simply with a shared atomic counter, which each thread increments:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;83cace85-273c-4753-9cdb-32ef8e3294d0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">Task *tasks = ...;
S64 tasks_count = ...;

// set up the counter
static S64 task_take_counter = 0;
task_take_counter = 0;
BarrierSync(barrier);

// loop on all threads - take tasks as long as we can
for(;;)
{
  S64 task_idx = AtomicIncEval64(&amp;task_take_counter) - 1;
  if(task_idx &gt;= tasks_count)
  {
    break;
  }
  // do task
}</code></pre></div><p>This will dynamically distribute tasks across the cores, so that a profile of the program will look more like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J770!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J770!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 424w, https://substackcdn.com/image/fetch/$s_!J770!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 848w, https://substackcdn.com/image/fetch/$s_!J770!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 1272w, https://substackcdn.com/image/fetch/$s_!J770!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J770!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png" width="544" height="590.3296703296703" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1580,&quot;width&quot;:1456,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:679123,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J770!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 424w, https://substackcdn.com/image/fetch/$s_!J770!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 848w, https://substackcdn.com/image/fetch/$s_!J770!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 1272w, https://substackcdn.com/image/fetch/$s_!J770!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F110d5755-36bf-42a6-bd91-bab886abce0b_1554x1686.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Redesigning Algorithms For Uniform Work Distribution</h3><p>Dynamically assigning tasks to cores will help in many cases, but it gets less effective if tasks are highly variable, to the point of sometimes being exceedingly long (e.g. 
many times more expensive than smaller tasks), or if there are fewer tasks than the number of cores.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1raa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1raa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 424w, https://substackcdn.com/image/fetch/$s_!1raa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 848w, https://substackcdn.com/image/fetch/$s_!1raa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 1272w, https://substackcdn.com/image/fetch/$s_!1raa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1raa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png" width="517" height="512.0288461538462" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1442,&quot;width&quot;:1456,&quot;resizeWidth&quot;:517,&quot;bytes&quot;:463349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1raa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 424w, https://substackcdn.com/image/fetch/$s_!1raa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 848w, https://substackcdn.com/image/fetch/$s_!1raa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 1272w, https://substackcdn.com/image/fetch/$s_!1raa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045126c2-acfe-4a60-b75d-fec1171e05f5_1524x1509.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In these cases, it can often be helpful to reconsider the serial dependencies <em>within</em> a single task, or whether the same <em>effect</em> as a highly serially-dependent algorithm can be achieved by an alternative, highly serially-<em>independent</em> algorithm. Can a single task be subdivided further? Can it be performed in a different way? Can serially-dependent work be untangled from heavier work which can be done in a serially-independent way?</p><p>The answers to such questions are highly problem-specific, so it&#8217;s hard to offer substantially more useful advice while staying this generic. 
But to illustrate that it&#8217;s sometimes possible&#8212;even when counterintuitive&#8212;I have an example problem from my recent work, in which finding a more uniform work distribution required switching from a single-threaded <a href="https://en.wikipedia.org/wiki/Comparison_sort">comparison sort</a> to a highly parallelizable <a href="https://en.wikipedia.org/wiki/Radix_sort">radix sort</a>.</p><p>In this problem, I had a small <em>number</em> of arrays that needed to be sorted, but these arrays were potentially very <em>large</em>, thus requiring a fairly expensive sorting pass.</p><p>My first approach was simply to distribute the comparison sort tasks themselves: one core would sort one array, while other cores sorted other arrays. But as I&#8217;ve said, there was a relatively small number of arrays, and the arrays were large, so sorting was fairly expensive&#8212;thus, <em>most</em> cores were doing nothing but waiting for the few cores performing sorts to finish.</p><p>This approach would&#8217;ve worked fine if I had a larger number of smaller tasks. In fact, another part of the same program <em>does </em>distribute single-threaded comparison sort tasks in this way, because in that part of the problem, there <em>are</em> a larger number of smaller tasks.</p><p>In this case, I needed to sort array elements based on 64-bit integer keys. After sorting, the elements needed to be ordered by ascending key value.</p><p>Conveniently, this can be done with a radix sort. 
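</p><p>As a point of reference, a minimal serial least-significant-digit (LSD) radix sort over 64-bit keys can be sketched as follows. It is only an illustration, not the implementation discussed in this post. The name <code>RadixSortU64</code>, the 8-bit digit width (which fixes every sort at exactly 8 passes), and the caller-provided scratch buffer are all assumptions. Each pass makes two <em>O(N)</em> sweeps over the array plus a constant-size prefix sum.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">/* Hypothetical illustration: a minimal serial LSD radix sort of 64-bit
   keys into ascending order. Digits are assumed to be 8 bits wide, so
   every sort is exactly 8 passes, each making two O(N) sweeps. */
typedef unsigned long long U64;
typedef long long S64;

/* keys:    array to sort; holds the sorted result on return
   scratch: caller-provided buffer with room for count keys */
void RadixSortU64(U64 *keys, U64 *scratch, S64 count)
{
  U64 *src = keys;
  U64 *dst = scratch;
  U64 digit_divisor = 1; /* 256^pass: selects byte number pass of a key */
  for(int pass = 0; pass != 8; pass += 1)
  {
    /* sweep 1, O(N): count how many keys have each digit value */
    S64 slots[256] = {0};
    for(S64 i = 0; i != count; i += 1)
    {
      slots[(src[i] / digit_divisor) % 256] += 1;
    }

    /* exclusive prefix sum over 256 digit values: turn each count into
       the first output index for that digit value */
    S64 next = 0;
    for(int d = 0; d != 256; d += 1)
    {
      S64 digit_count = slots[d];
      slots[d] = next;
      next += digit_count;
    }

    /* sweep 2, O(N): stable scatter. Keys with equal digits keep their
       relative order, which is what makes LSD radix sort correct */
    for(S64 i = 0; i != count; i += 1)
    {
      U64 digit = (src[i] / digit_divisor) % 256;
      dst[slots[digit]] = src[i];
      slots[digit] += 1;
    }

    /* the output of this pass is the input of the next */
    U64 *swap = src;
    src = dst;
    dst = swap;
    digit_divisor *= 256;
  }
  /* 8 passes is an even number, so after the final swap the sorted data
     is back in keys */
}</code></pre></div><p>The two <em>O(N)</em> sweeps in each pass are the parts that lend themselves to going wide. In a multi-core version, each thread could count digits over its own range of the array (using the same range calculation as the sum). The per-thread counts could then be combined into per-thread output offsets on one side of a barrier, and each thread could scatter its own range on the other side.</p><p>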
I won&#8217;t cover the full details of the algorithm here (although I briefly covered it during a stream recently, which I recorded and uploaded <a href="https://www.rfleury.com/p/multithreaded-radix-sort-implementation">here</a>), but the important detail is that a radix sort requires a fixed number of <em>O(N)</em> passes over the array, and large portions of the work in each pass can be distributed uniformly across cores (in the same way that we distributed the sum work earlier).</p><p>Now, <em>all </em>cores participate in <em>every </em>larger sorting task, but each performs only a nearly uniform fraction of the work in each sort. This results in a much more uniform work distribution, and thus much less total time spent sorting:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L74N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L74N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 424w, https://substackcdn.com/image/fetch/$s_!L74N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 848w, https://substackcdn.com/image/fetch/$s_!L74N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 1272w, 
https://substackcdn.com/image/fetch/$s_!L74N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L74N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png" width="545" height="581.6826923076923" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:545,&quot;bytes&quot;:407108,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L74N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 424w, https://substackcdn.com/image/fetch/$s_!L74N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 848w, 
https://substackcdn.com/image/fetch/$s_!L74N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 1272w, https://substackcdn.com/image/fetch/$s_!L74N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d792c3-ceed-4aab-b63d-65fc065569f8_1557x1662.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is just one concrete example of a larger pattern I&#8217;ve noticed: In many problems, upon close examination, some serial 
dependencies can either vanish, or they can be untangled from heavier work.</p><p>In some problems, serially-dependent parts of the algorithm can be isolated, such that they prepare data which allows the rest of the algorithm to be done in a serially-independent fashion. Imagine a program which first walks a linked list on a single core, computing a layout in a serially-dependent way. Subsequent work can then operate on the completed layout alone, rather than also having to perform the serially-dependent pointer chasing itself.</p><div><hr></div><h2>Single-Threaded, Just Better</h2><p>Code which is multi-core by default feels like normal single-threaded code, just with a few extra constructs that express the missing information needed to execute on multiple cores. This style has some useful and interesting properties, which make it preferable in many contexts to many of the popular styles of multi-core code found in the wild.</p><h3>Single-Core as a Parameterization</h3><p>One interesting implication of code written in this way&#8212;to be multi-core by default&#8212;is that it offers a strict <em>superset</em> of the functionality of code written to be single-core, because &#8220;multi-core&#8221; in this case includes &#8220;single-core&#8221; as one possible case. We can run the <em>same code</em> on only a single core, simply by executing our entry point on a single thread, parameterized with <code>thread_idx = 0</code> and <code>thread_count = 1</code>.</p><p>In that case, one core necessarily receives all of the work. <code>BarrierSync</code>s turn into no-ops, since there is only one thread (there are no other threads to wait for). Thus, it is functionally equivalent to single-core code.</p><h3>Simpler Debugging</h3><p>This style of multi-core programming requires far less busywork and machinery in order to use multiple cores for some codepath. 
But one of the problems I mentioned with job systems and parallel <code>for</code>s earlier was not only that they require more busywork and machinery, but that they&#8217;re also more difficult to debug.</p><p>In this case, debugging is much simpler&#8212;in fact, it doesn&#8217;t look all that different from single-core debugging. At every point, you have access to a full call stack, along with all of the contextual data that led to whichever point you happen to be inspecting in a debugger.</p><p>Furthermore, because all threads involved are nearly homogeneous (unlike in a generic job system, where threads are heterogeneous at all times), debugging a <em>single thread</em> is a lot like debugging <em>all threads</em>. This is especially true because&#8212;between barrier synchronization points&#8212;the threads are all executing the same code. In other words, the context and state on one thread are likely to be highly informative about the context and state on <em>all</em> threads.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Q9d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Q9d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 424w, https://substackcdn.com/image/fetch/$s_!7Q9d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 848w, 
https://substackcdn.com/image/fetch/$s_!7Q9d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 1272w, https://substackcdn.com/image/fetch/$s_!7Q9d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Q9d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png" width="679" height="423" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:423,&quot;width&quot;:679,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98317,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Q9d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 424w, https://substackcdn.com/image/fetch/$s_!7Q9d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 848w, 
https://substackcdn.com/image/fetch/$s_!7Q9d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 1272w, https://substackcdn.com/image/fetch/$s_!7Q9d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06911f6-c9b7-46d8-befd-425bd831f74c_679x423.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Access To The Full Stack</h3><p>Because the context for some through line of computation frequently changes in traditional job systems, 
extra machinery must be involved to pipe data from one context to another&#8212;across jobs and threads&#8212;and maintain any associated allocations and lifetimes. But in this style, resources and lifetimes are kept as simple as they are in single-threaded code.</p><p>The stack, containing all contextual state at any point, becomes a single bucket for useful thread-local storage. In a job system, the stack is useful multi-core thread-local storage, but <em>only</em> for the duration of the job. The job is equivalent to the inner body of a <code>for</code>&#8212;this is a tiny, fragmentary scope. With this style, the entire stack is available, at any point.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m_bz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m_bz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 424w, https://substackcdn.com/image/fetch/$s_!m_bz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 848w, https://substackcdn.com/image/fetch/$s_!m_bz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 1272w, 
https://substackcdn.com/image/fetch/$s_!m_bz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m_bz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png" width="545" height="318.54052197802196" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1456,&quot;resizeWidth&quot;:545,&quot;bytes&quot;:328198,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/172146732?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m_bz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 424w, https://substackcdn.com/image/fetch/$s_!m_bz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 848w, 
https://substackcdn.com/image/fetch/$s_!m_bz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 1272w, https://substackcdn.com/image/fetch/$s_!m_bz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7910c19-974e-4a3a-a6b7-fbca4453e8a8_1893x1107.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>Codebase Support</h2><p>I&#8217;ve found some useful patterns which can be extracted and widely used in code which 
is multi-core by default. These patterns seem as widely applicable as <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">arenas</a>&#8212;as such, they can be a useful addition to a <a href="https://github.com/EpicGamesExt/raddebugger/blob/master/src/base/base_thread_context.h">codebase&#8217;s base layer</a>.</p><h3>Thread-Local Group Data</h3><h4><code>LaneIdx()</code>, <code>LaneCount()</code>, <code>LaneSync()</code></h4><p>The earlier example code frequently uses the <code>thread_idx</code>, <code>thread_count</code>, and <code>barrier</code> variables. Passing these to every codepath which might need them is redundant and cumbersome. As such, they are good candidates for thread-local storage.</p><p>In my code, I&#8217;ve bundled these into the base layer&#8217;s &#8220;thread context&#8221;, a universally accessible thread-local structure&#8212;it&#8217;s where, for example, <a href="https://www.rfleury.com/i/70173682/per-thread-scratch-arenas">thread-local scratch arenas</a> are stored.</p><p>This gives all code the ability to read its index within its thread group (<code>thread_idx</code>) and the number of threads in that group (<code>thread_count</code>), and to synchronize with the other threads in that group (<code>BarrierSync</code>).</p><p>As I suggested earlier, any code&#8217;s <em>caller</em> can choose &#8220;how wide&#8221;&#8212;how many cores&#8212;they&#8217;d like to execute that code, by configuring this per-thread storage. In general, <em>shallow</em> parts of a call stack can decide how wide <em>deeper</em> parts of a call stack are executed. 
If some work is expected to be small (to the point where it doesn&#8217;t benefit from being executed on many cores), and other cores can be doing other useful work, then before doing that work, the calling code can simply set <code>thread_idx = 0</code>, <code>thread_count = 1</code>, and <code>barrier = {0}</code>.</p><p>This means that a single thread may participate in many <em>different</em> thread groups&#8212;in other words, <code>thread_idx</code> and <code>thread_count</code> are not static within the execution of a single thread. Therefore, I found it appropriate to introduce another disambiguating term: <em>lane</em>. A <em>lane</em> is distinct from a thread in that a lane is simply one thread <em>within</em> a potentially-temporary group of threads, all executing the same code.</p><p>As such, in my terminology, <code>thread_idx</code> is exposed as <code>LaneIdx()</code>, and <code>thread_count</code> is exposed as <code>LaneCount()</code>. To synchronize with other lanes, a helper <code>LaneSync()</code> is available, which just waits on the thread context&#8217;s currently selected barrier.</p><h3>Uniformly Distributing Ranges Amongst Lanes</h3><h4><code>LaneRange(count)</code></h4><p>I&#8217;ve mentioned the following computation multiple times:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;5afdfee9-e69b-4413-ac33-be3b8a564bd9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">S64 values_per_thread = values_count / thread_count;
S64 leftover_values_count = values_count % thread_count;
B32 thread_has_leftover = (thread_idx &lt; leftover_values_count);
S64 leftovers_before_this_thread_idx = (thread_has_leftover
                                        ? thread_idx
                                        : leftover_values_count);
S64 thread_first_value_idx = (values_per_thread * thread_idx +
                              leftovers_before_this_thread_idx);
S64 thread_opl_value_idx = (thread_first_value_idx + values_per_thread + 
                            !!thread_has_leftover);</code></pre></div><p>This is useful whenever a uniformly distributed range corresponds to uniformly distributed work amongst cores. As I mentioned, that correspondence does not always hold. Nevertheless, it&#8217;s an extremely common case, so I found it useful to expose this computation as <code>LaneRange(count)</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;56187545-47fa-4ea5-ae23-2216965349a1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">Rng1U64 range = LaneRange(count);
for(U64 idx = range.min; idx &lt; range.max; idx += 1)
{
  // ...
}</code></pre></div><h3>Broadcasting Data Across Lanes</h3><h4><code>LaneSyncU64(value_ptr, source_lane_idx)</code></h4><p>Earlier, we saw that when a variable needs to be shared across lanes, it can simply be marked as <code>static</code>. I mentioned that this has the unfortunate downside that only a single group can be executing the code at one time, since one group of lanes could trample over the <code>static</code> variable while another group is still using it. This is sometimes not a concern (when only a single lane group should be executing that code anyway), but it silently makes the code unusable in some cases.</p><p>For example, suppose I have some code which is written to be multi-core by default. Depending on the inputs to this codepath, I may want to execute it on the same inputs with all of my cores. In other cases, I may want to execute it with only a single core, while other cores execute the same codepath on different inputs. 
That requires multiple lane groups to be executing the code at the same time, which rules out using <code>static</code> to share data amongst lanes within the same group.</p><p>To address this, I also created a simple mechanism to broadcast small amounts of data across lanes.</p><p>Each thread context also stores&#8212;in addition to a lane index, lane count, and lane group barrier&#8212;a pointer to a shared buffer. This pointer is the same for all lanes in the same group.</p><p>If one lane has a value which it needs to broadcast to other lanes&#8212;for instance, if it allocated a buffer that the other lanes are about to fill&#8212;then that value can be communicated in the following way:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;d98ccca3-e984-44de-958b-9e6f7a032cb0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">U64 broadcast_size = ...;         // the number of bytes to broadcast
U64 broadcast_src_lane_idx = ...; // the index of the broadcasting lane
void *lane_local_storage = ...;   // unique for each lane
void *lane_shared_storage = ...;  // same for all lanes

// copy from broadcaster -&gt; shared
if(LaneIdx() == broadcast_src_lane_idx)
{
  MemoryCopy(lane_shared_storage, lane_local_storage, broadcast_size);
}
LaneSync();

// copy from shared -&gt; broadcastees
if(LaneIdx() != broadcast_src_lane_idx)
{
  MemoryCopy(lane_local_storage, lane_shared_storage, broadcast_size);
}
LaneSync();</code></pre></div><p>The second <code>LaneSync()</code> ensures that every lane has finished reading the shared buffer before any lane can reuse it for another broadcast.</p><p>I&#8217;ve found that this shared buffer only needs to be big enough to broadcast 8 bytes: most small data can be broadcast with a few 8-byte broadcasts, and larger data can be broadcast with a single pointer broadcast.</p><p>I expose this mechanism with the following API:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;c9754240-af8f-4a88-9dcb-26cf64809061&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">U64 some_value = 0;
U64 src_lane_idx = 0;
LaneSyncU64(&amp;some_value, src_lane_idx);
// after this line, all lanes share the same value for `some_value`</code></pre></div><p>It might be used in the following way:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;474e8fe2-63fc-471e-8a51-10ce66ff28ef&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// set `values_count`, allocate for `values`, on lane 0, then
// broadcast their values to all other lanes:
S64 values_count = 0;
S64 *values = 0;
if(LaneIdx() == 0)
{
  values_count = ...;
  values = Allocate(sizeof(values[0]) * values_count);
}
LaneSyncU64(&amp;values_count, 0);
LaneSyncU64(&amp;values, 0);</code></pre></div><h3>Revisiting The Summation Example</h3><p>With the above mechanisms, we can program the original summation example with the following steps.</p><p>First, we load the values from the file:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;856fb5f2-2a95-4dab-8768-a5d9dca98d63&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">U64 values_count = 0;
S64 *values = 0;
{
  File file = FileOpen(input_path);
  values_count = SizeFromFile(file) / sizeof(values[0]);
  if(LaneIdx() == 0)
  {
    values = (S64 *)Allocate(values_count * sizeof(values[0]));
  }
  LaneSyncU64(&amp;values, 0);
  Rng1U64 value_range = LaneRange(values_count);
  Rng1U64 byte_range = R1U64(value_range.min * sizeof(values[0]),
                             value_range.max * sizeof(values[0]));
  FileRead(file, byte_range, values + value_range.min);
  FileClose(file);
}
LaneSync();</code></pre></div><p>Then, we perform the sum across all lanes:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;77657491-ac41-427c-80dd-ec8da6c9c134&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// grab the shared counter
S64 sum = 0;
S64 *sum_ptr = &amp;sum;
LaneSyncU64(&amp;sum_ptr, 0);

// calculate lane's sum
S64 lane_sum = 0;
Rng1U64 range = LaneRange(values_count);
for(U64 idx = range.min; idx &lt; range.max; idx += 1)
{
  lane_sum += values[idx];
}

// contribute this lane's sum to the total sum
AtomicAddEval64(sum_ptr, lane_sum);
LaneSync();
LaneSyncU64(&amp;sum, 0);</code></pre></div><p>And finally, we output the sum value:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;9a5fa6fb-44fc-4056-a348-a1686fc904c1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">if(LaneIdx() == 0)
{
  printf("Sum: %I64d\n", sum);
}</code></pre></div><div><hr></div><h2>Closing Thoughts</h2><p>The concepts I&#8217;ve shared in this post represent what I feel is a fundamental shift in how CPU code can be expressed, compared to the normal single-core code all programmers are familiar with. Through small, additional annotations to code&#8212;basic concepts like <code>LaneIdx()</code>, <code>LaneCount()</code>, and <code>LaneSync()</code>&#8212;all code can contain the information necessary to be executed wide, using multiple cores to better take advantage of serial independence.</p><p>The same exact code can <em>also</em> be executed on a single core. This means that, through these extra annotations, the code becomes strictly more flexible&#8212;at the low level&#8212;than its single-core equivalent which does <em>not</em> have these annotations.</p><p>Note that this is still not a comprehensive family of multithreading techniques, because it is strictly zooming in on one unique <em>timeline</em> of work, and how a single timeline can be accelerated using the fundamental multi-core reality of modern machines. But consider that programs often require <em>multiple</em> heterogeneous timelines of work, where one lane group is not in lockstep with others, and thus <em>should not</em> prohibit others from making progress.</p><p>What I appreciate about the ideas in this post is that they do not <em>unnecessarily</em> introduce extra timelines. Communication between two heterogeneous timelines has intrinsic, relativity-related complexity. Such timelines will always be necessary in some places. But why pay that complexity cost <em>everywhere</em>, to accomplish simple multi-core execution?</p><p>I&#8217;m aware that, for many, these ideas are old news&#8212;indeed, everyone learns different things at different times. 
But in my own past programming, and when I look at the programming of many others, it seems that there is an awful lot of overengineering to do what seems trivial, and indeed what <em>is</em> trivial in other domains (like shader programming). So, for at least many people, these concepts do not <em>seem</em> well-known or old (even if they are in some circles and domains).</p><p>In any case, the concepts I&#8217;ve shared in this post have been dramatically helpful in improving my ability to structure multi-core code without overcomplication, and it seemed like an important-enough shift to carefully document it here.</p><p>If you didn&#8217;t know these concepts, I hope this post was similarly helpful to you; if you did, I hope it was nonetheless interesting.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Multithreaded Radix Sort Implementation Walkthrough]]></title><description><![CDATA[A stream clip where I was asked about a multithreaded radix sort.]]></description><link>https://www.dgtlgrove.com/p/multithreaded-radix-sort-implementation</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/multithreaded-radix-sort-implementation</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Thu, 04 Sep 2025 05:18:10 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/172746936/854edeae1458f650f7f1b9b0d2a84d1c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>When <a href="https://twitch.tv/ryanfleury">streaming programming</a> recently, I was asked about a multithreaded <a 
href="https://en.wikipedia.org/wiki/Radix_sort">radix sort</a> that I had implemented for <a href="https://github.com/EpicGamesExt/raddebugger/blob/d3b394fe1884c580660ae78f12c5379a88313a35/src/rdi_make/rdi_make_local_2.c#L234">a part of the RAD Debugger</a> with the help of <a href="https://github.com/mmozeiko">M&#257;rti&#326;&#353; Mo&#382;eiko</a> and Nikita Smith (creator of the <a href="https://github.com/EpicGamesExt/raddebugger">RAD Linker</a>). In answering the question, I covered the basics of a radix sort, how it can be organized such that big portions of the sort can be easily pa&#8230;</p>
      <p>
          <a href="https://www.dgtlgrove.com/p/multithreaded-radix-sort-implementation">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Cracking the Code: Realtime Debugger Visualization Architecture – BSC 2025]]></title><description><![CDATA[My talk at the 2025 Better Software Conference.]]></description><link>https://www.dgtlgrove.com/p/cracking-the-code-realtime-debugger</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/cracking-the-code-realtime-debugger</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Wed, 23 Jul 2025 04:56:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/_9_bK_WjuYY" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I had the honor of speaking alongside many brilliant programmers at the <a href="https://bettersoftwareconference.com">Better Software Conference</a> this year. The recording for my talk is now available&#8212;you can watch it here:</p><div id="youtube2-_9_bK_WjuYY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;_9_bK_WjuYY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/_9_bK_WjuYY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>My talk focused on some of my work on the <a href="https://github.com/EpicGamesExt/raddebugger">RAD Debugger</a> over the past few years&#8212;specifically, my work on the architecture of the debugger&#8217;s evaluation and visualization systems.</p><p>In my view, visualization is near the top of the priority list for debuggers, but most debugger material focuses on the implementation of core building blocks of debugger-debuggee interaction, like process control, stepping, breakpoints, and so on. This is understandable, because debuggers are an enormous problem space. 
But, seeing this as one of a debugger&#8217;s topmost priorities, and as something unique about debuggers that I&#8217;d spent a lot of time on, I tried to cover as much as possible about debugger visualization engine architecture within a ~1.5 hour window.</p><p>As it turns out, there&#8217;s a <em>lot</em> of ground to cover, so in the talk I walk through the basics of process control, debug information, and the compilation and interpretation pipeline a debugger contains for evaluation, and I follow that by digging a bit more deeply into the pipeline for visualizing evaluations in a variety of ways, from building an &#8220;infinite&#8221; watch window tree, to graphical visualizers:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_XKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_XKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 424w, https://substackcdn.com/image/fetch/$s_!_XKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 848w, https://substackcdn.com/image/fetch/$s_!_XKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_XKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_XKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png" width="1456" height="854" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:854,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:201890,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.rfleury.com/i/169016813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_XKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 424w, https://substackcdn.com/image/fetch/$s_!_XKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 848w, 
https://substackcdn.com/image/fetch/$s_!_XKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!_XKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff960b2af-301c-406d-8ee1-1c3663baa07a_1842x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I hope everyone enjoys the talk, and finds it interesting or useful!</p><p>-Ryan</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Demystifying Debuggers, Part 3: Debugger-Kernel Interaction]]></title><description><![CDATA[On how kernels collect and expose information about program execution to debuggers.]]></description><link>https://www.dgtlgrove.com/p/demystifying-debuggers-part-3-kernel</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/demystifying-debuggers-part-3-kernel</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Sat, 28 Dec 2024 06:11:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3a3264b9-e7ca-4e55-bc43-c6d9f71a93e7_1660x1220.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.dgtlgrove.com/p/index#%C2%A7demystifying-debuggers-series">Part 3 in a series.</a></em></p><p>As I stated in <a href="https://www.rfleury.com/p/demystifying-debuggers-part-1-a-busy">part 1</a>, &#8220;debugger&#8221; wouldn&#8217;t have been my name choice for, well, debuggers, because it understates the full scope of their functionality. Debuggers are programs for interactive program runtime analysis. Now that I&#8217;ve <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-2-the">unpacked what a program actually is</a>, I can dig into what &#8220;analysis&#8221; means.</p><div><hr></div><h3>Debug Events</h3><p>Debuggers execute as processes which receive information about another executing <a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-2-the">process</a>, from an operating system&#8217;s kernel. In one way or another, a kernel will associate a debugger&#8217;s process with some other process. 
The debugger is said to be &#8220;attached&#8221; to this other process, which is called the &#8220;debuggee&#8221; or &#8220;target&#8221; process.</p><p>When a debugger is attached to a process, it can receive information about notable events in that process&#8217; execution, like:</p><ul><li><p><em><strong>When a process is created</strong></em>, and details about that process.</p></li><li><p><em><strong>When a thread is created</strong></em>, and details about that thread.</p></li><li><p><em><strong>When a module is loaded</strong></em>, and details about that module.</p></li><li><p><em><strong>When a thread is named</strong></em>, which thread was named, and the contents of that name.</p></li><li><p><em><strong>When a thread encounters an exception</strong></em> (like a &#8220;trap&#8221; or a memory access violation), which thread encountered it, and the instruction address at which it occurred.</p></li><li><p><em><strong>When a thread logs a debug string</strong></em>, which thread logged it, and the contents of that string.</p></li><li><p><em><strong>When a process exits</strong></em><strong>.</strong></p></li><li><p><em><strong>When a thread exits</strong></em><strong>.</strong></p></li><li><p><em><strong>When a module is unloaded</strong></em><strong>.</strong></p></li></ul><p>These are called <em>debug events</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vz51!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!vz51!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 424w, https://substackcdn.com/image/fetch/$s_!vz51!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 848w, https://substackcdn.com/image/fetch/$s_!vz51!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 1272w, https://substackcdn.com/image/fetch/$s_!vz51!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vz51!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png" width="580" height="290.2285263987392" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35ae503c-a624-4948-86a0-857bb2068831_1269x635.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:1269,&quot;resizeWidth&quot;:580,&quot;bytes&quot;:212699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!vz51!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 424w, https://substackcdn.com/image/fetch/$s_!vz51!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 848w, https://substackcdn.com/image/fetch/$s_!vz51!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 1272w, https://substackcdn.com/image/fetch/$s_!vz51!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ae503c-a624-4948-86a0-857bb2068831_1269x635.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The kernel is able to report these events to a debugger because they are first reported to the kernel itself. This happens either because <strong>(a)</strong> the events are caused by a program&#8217;s direct interaction with the kernel (for example, a call to <code>LoadLibrary</code> on Windows causing a module to be loaded), or because <strong>(b)</strong> the kernel configures the hardware to interrupt execution and report information to the kernel when certain events occur.</p><p>In the latter case, this is done through mechanisms like x86&#8217;s <em><a href="https://en.wikipedia.org/wiki/Interrupt_descriptor_table">interrupt descriptor table</a></em>, which encodes a table of code addresses: the entry points of a number of &#8220;interrupt handlers&#8221;. Upon encountering specific error conditions (or &#8220;exceptions&#8221;&#8212;not to be confused with exceptions in high-level languages), the CPU will execute code at one of these addresses. 
It selects an entry in the table using a numeric code (a &#8220;vector&#8221;) that identifies the error condition which was encountered.</p><p>This mechanism is also part of how <em>virtual address spaces</em> are implemented, as I <a href="https://www.rfleury.com/p/demystifying-debuggers-part-2-the">previously described</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oxQE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oxQE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oxQE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png" width="430" height="430" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:430,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oxQE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When a virtual address fails to map to a physical address through the page tables, a <em>page fault</em> is raised. On x86, the numeric code (vector) for this fault happens to be <code>0x0E</code>, which selects a specific interrupt handler from the interrupt descriptor table. The CPU transfers execution to that handler, which is supplied by the kernel. 
In this way, the fact that the code accessed an address with no physical mapping is first reported to the kernel.</p><p>One possibility is that the virtual address is completely legal for the program to access, but the kernel has swapped the physical storage for that address out to disk. In that case, the kernel can read the data back into physical memory, update the page table, and resume the faulting instruction, transparently to the program.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n1kw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n1kw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 424w, https://substackcdn.com/image/fetch/$s_!n1kw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 848w, https://substackcdn.com/image/fetch/$s_!n1kw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!n1kw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n1kw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png" width="1456" height="1021" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1021,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:521761,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n1kw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 424w, https://substackcdn.com/image/fetch/$s_!n1kw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 848w, https://substackcdn.com/image/fetch/$s_!n1kw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!n1kw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9873361-48be-4bbf-892f-dca43c013d23_1802x1264.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Another possibility is that the virtual address was <em>not</em> legal for the program to access. In this situation, the information would be reported to a debugger, if one is attached to the faulting thread&#8217;s process. If a debugger is <em>not</em> attached, the kernel has a number of other options&#8212;for example, it might simply kill the process, or it might launch a debugger and attach it to the process (a feature known as &#8220;Just-In-Time Debugging&#8221;), so that the state of the process can be inspected.</p><p>A kernel has several design decisions to make in how it reports this information to a debugger. For example, if one thread encounters an exception, do the process&#8217; other threads continue executing? 
Or, if that thread is simply reporting a debug event which doesn&#8217;t necessarily halt execution&#8212;like a new module being loaded, or a debug string being logged&#8212;does that thread (and, still, the others in the process) continue executing?</p><p>The model used by Windows is to let the debugger <em>interleave</em> its logic with the debuggee process: when an event is reported to a debugger, all threads in the associated process stop being scheduled until the debugger signals to the kernel that the process should resume.</p><p>The model used by Linux is similar, in that the debugger (in Linux terminology, the &#8220;tracer&#8221;) can interleave its work with the debuggee (the &#8220;tracee&#8221;), although attachment happens at <em>thread granularity</em> rather than <em>process granularity</em>.</p><p>While not literally the case, it&#8217;s as if the debuggee <em>calls into</em> the debugger to learn how to proceed. The kernel facilitates this by interrupting the debuggee and transferring execution to the debugger; when the debugger says the word, execution is transferred back to the debuggee.</p><p>There is a further layer of minutiae to this problem: the kernel can also decide whether or not to allow a single debugger to be attached to <em>multiple</em> processes at once. Windows <em>does</em> allow this, for example, so extra design decisions apply. If one thread in one process reports a debug event, we know <em>that process</em> halts, but do the threads within <em>other</em> debuggee processes also halt?</p><p>While working on the <a href="https://github.com/EpicGamesExt/raddebugger">RAD Debugger</a>, I found through experimentation that Windows <em>does not</em>, in fact, automatically suspend other debuggee processes when one debuggee reports a debug event. 
This is inconsistent with the behavior at thread granularity, where all threads in a process are suspended if one of them reports a debug event. For the RAD Debugger, I decided to implement the cross-process suspension myself, so that threads across processes behave identically to threads within a process. This also simplifies the debugger design (and thus the user interface and experience), because there is only a <em>single</em> state which determines whether or not debuggees are executing, rather than a per-process state. Other designs, however, are possible.</p><div><hr></div><h3>Debug Event APIs</h3><p>Because the information provided to the debugger about the debuggee(s) is, in essence, a <em>sequence of events</em>, it&#8217;s natural to expose it through a blocking event-loop API, as Windows does with its <code>WaitForDebugEvent</code> and <code>ContinueDebugEvent</code> functions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;edb1caf7-0782-4f83-85f2-2e7fe9f88637&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">for(DEBUG_EVENT evt = {0};
    WaitForDebugEvent(&amp;evt, INFINITE);
    ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
{
  // process `evt`
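  // Sketch (illustrative, not the RAD Debugger's code): dispatch on the event
  // kind; evt.u holds per-kind details. For EXCEPTION_DEBUG_EVENT, DBG_CONTINUE
  // marks the exception handled; DBG_EXCEPTION_NOT_HANDLED passes it on instead.
  switch(evt.dwDebugEventCode)
  {
    case EXCEPTION_DEBUG_EVENT:     /* evt.u.Exception */   break;
    case OUTPUT_DEBUG_STRING_EVENT: /* evt.u.DebugString */ break;
    default: /* process/thread creation and exit, DLL load/unload, RIP */ break;
  }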
}</code></pre></div><p><a href="https://man7.org/linux/man-pages/man2/ptrace.2.html">Linux&#8217;s primary debugger API</a>, <code>ptrace</code>, is more granular. It&#8217;s not built <em>directly</em> for this event-loop structure, and, as I mentioned earlier, it is a thread-granularity API rather than a process-granularity API, but it can be used to implement the same concept. It&#8217;s also used for a number of other operations; like many other Linux APIs, it&#8217;s something of a Swiss Army knife.</p><p>With the above Windows API usage, either the debugger&#8217;s event loop is executing, <em>or</em> the debuggee process is executing. They do not execute simultaneously.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!swra!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!swra!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 424w, https://substackcdn.com/image/fetch/$s_!swra!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 848w, https://substackcdn.com/image/fetch/$s_!swra!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 1272w, 
https://substackcdn.com/image/fetch/$s_!swra!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!swra!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png" width="524" height="434.3873626373626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1207,&quot;width&quot;:1456,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:723963,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!swra!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 424w, https://substackcdn.com/image/fetch/$s_!swra!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 848w, https://substackcdn.com/image/fetch/$s_!swra!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 1272w, 
https://substackcdn.com/image/fetch/$s_!swra!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc18a0b8d-dba8-4320-bb10-86048a1bdc49_1816x1506.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Note that, in a full debugger, there <em>are</em> codepaths which should execute simultaneously with the debuggee&#8212;for example, the code which repeatedly builds and renders the user interface, allowing visualization of previously received debug events, or of any other information which can be collected while the debuggee executes. 
In practice, these codepaths are moved to separate threads:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UwNG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UwNG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 424w, https://substackcdn.com/image/fetch/$s_!UwNG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 848w, https://substackcdn.com/image/fetch/$s_!UwNG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 1272w, https://substackcdn.com/image/fetch/$s_!UwNG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UwNG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png" width="434" height="417.9259259259259" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:988,&quot;width&quot;:1026,&quot;resizeWidth&quot;:434,&quot;bytes&quot;:373782,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UwNG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 424w, https://substackcdn.com/image/fetch/$s_!UwNG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 848w, https://substackcdn.com/image/fetch/$s_!UwNG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 1272w, https://substackcdn.com/image/fetch/$s_!UwNG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F833ccb68-6cfa-46ae-9d23-e4ef1d22c30d_1026x988.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But I&#8217;ll cover more about that in a later post.</p><div><hr></div><h3>Reading Non-Event Debuggee Information</h3><p>While debug events help inform the debugger of changes to a debuggee across time, they do not contain the full set of information that a debugger needs to read about a debuggee.</p><p>A debugger may also need to read memory from arbitrary addresses within a debuggee&#8217;s address space. 
Remember that virtual address spaces exist <em>per-process</em>&#8212;in other words, a debuggee&#8217;s address space is <em>not</em> a debugger&#8217;s address space, so it isn&#8217;t as simple as reading from an arbitrary pointer.</p><p>Given our understanding of virtual address spaces, it would be straightforward for an operating system to map <em>the same</em> physical storage into <em>different address spaces</em>: in this case, the debugger&#8217;s and the debuggee&#8217;s.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ja2t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ja2t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 424w, https://substackcdn.com/image/fetch/$s_!ja2t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 848w, https://substackcdn.com/image/fetch/$s_!ja2t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!ja2t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ja2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png" width="528" height="341.6043956043956" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:942,&quot;width&quot;:1456,&quot;resizeWidth&quot;:528,&quot;bytes&quot;:736129,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ja2t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 424w, https://substackcdn.com/image/fetch/$s_!ja2t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 848w, https://substackcdn.com/image/fetch/$s_!ja2t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!ja2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fcbdb6-9024-495d-8d90-f8b606ecf1fb_1738x1124.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In practice, this is a much heavier weight operation than necessary, as it requires a virtual address space allocation, whereas debugger&#8217;s often simply want to read sub-page-size regions of memory, at one moment in time (rather than holding a persistent mapping to a region of memory in a child process across stretches of time).</p><p>Windows provides the means to do this with its <code>ReadProcessMemory</code> function:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;a44ed2da-2a3e-43c3-8f03-0aaf16842e8e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">BOOL ReadProcessMemory(
  [in]  HANDLE  hProcess,
  [in]  LPCVOID lpBaseAddress,
  [out] LPVOID  lpBuffer,
  [in]  SIZE_T  nSize,
  [out] SIZE_T  *lpNumberOfBytesRead
);</code></pre></div><p>Notice that&#8212;confusingly&#8212;the <code>lpBaseAddress</code> parameter, which encodes the address within <code>hProcess</code> from which data should be copied into <code>lpBuffer</code>, is a <em>pointer</em>. But <em>pointers</em> only refer to addresses <em>within the virtual address space of the process in which they&#8217;re used</em>. So the debugger must never dereference this pointer&#8212;it is merely an address-sized package for encoding a virtual address in <code>hProcess</code>. Super weird! But, aside from that, this is fairly straightforward.</p><p>Linux provides the same functionality through <code>pread</code> on the debuggee&#8217;s <code>/proc/&lt;pid&gt;/mem</code> file, which, less confusingly, takes the address as an integer file offset.</p><p>A debugger may also need to read thread register values. On Windows, this is mainly provided through the <code>GetThreadContext</code> API, although the exact APIs and how they&#8217;re used vary by the underlying architecture. On Linux, this is also provided through the <code>ptrace</code> API (remember how I said it&#8217;s like a Swiss Army knife?).</p><div><hr></div><h3>A Simple Debugger Event Loop</h3><p>To make this concrete, let&#8217;s build a simple Windows debugger program which launches a single process, attaches to it, and logs information about the debug events that it sees.</p><p>We can do this with the following APIs:</p><ul><li><p><code>CreateProcessA</code> &#8212; to launch a process and automatically attach to it.</p></li><li><p><code>WaitForDebugEvent</code> &#8212; to put our debugger loop to sleep until a debug event occurs.</p></li><li><p><code>ContinueDebugEvent</code> &#8212; to continue the attached debuggee after we&#8217;re done processing a debug event&#8217;s data, and to signify whether or not the debugger has handled an exception.</p></li></ul><p>First, we can use <code>CreateProcessA</code> to launch a process&#8212;if we use the <code>DEBUG_PROCESS</code> flag in our argument for 
<code>dwCreationFlags</code>, Windows will automatically attach our process as the debugger for the process we create.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;60dc0e88-26e9-4f15-90af-6fa5e129dea1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// sample_debugger.c

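#include &lt;stdio.h&gt;   // printf, fflush (used by the logging code later on)
#include &lt;windows.h&gt;

int main(int argument_count, char **arguments)
{
  // the debuggee's command line is passed as our first argument
  if(argument_count &lt; 2)
  {
    printf("usage: sample_debugger &lt;debuggee command line&gt;\n");
    return 1;
  }

  // launch process, attach
  char *cmd_line = arguments[1];
  STARTUPINFOA startup_info = {sizeof(startup_info)};
  PROCESS_INFORMATION process_info = {0};
  if(!CreateProcessA(0, cmd_line, 0, 0, 0, DEBUG_PROCESS, 0, 0, &amp;startup_info,
                     &amp;process_info))
  {
    printf("CreateProcessA failed (error %lu)\n", GetLastError());
    return 1;
  }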
  // ...
  return 0;
}</code></pre></div><p>Then, using <code>WaitForDebugEvent</code> and <code>ContinueDebugEvent</code>, we can build our debug event loop:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;9c43bf83-13dd-4e10-be15-a68471d0125c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// loop for debug events
for(DEBUG_EVENT evt = {0};
    WaitForDebugEvent(&amp;evt, INFINITE);
    ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
{
  // ...
}</code></pre></div><p>Finally, if our debuggee process terminates, we&#8217;ll want to exit our loop early, before calling <code>ContinueDebugEvent</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;e3d52653-c5fc-4306-af74-dedd3a30501a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// loop for debug events
for(DEBUG_EVENT evt = {0};
    WaitForDebugEvent(&amp;evt, INFINITE);
    ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
{
  // ...
  if(evt.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
  {
    break;
  }
}</code></pre></div><p>Next, all we need to do is dig into the structure of <code>DEBUG_EVENT</code>, log some information for each <code>DEBUG_EVENT</code> our loop iterates over, and hook this up to an actual program.</p><p>Let&#8217;s begin by getting everything up and running. First, I&#8217;ll just stub out the logging code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;5c2d28fb-1aef-43f7-834f-0e37643d668a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">for(DEBUG_EVENT evt = {0};
    WaitForDebugEvent(&amp;evt, INFINITE);
    ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE))
{
  printf("Received a DEBUG_EVENT\n");
  fflush(stdout);
  if(evt.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
  {
    break;
  }
}</code></pre></div><p>And I&#8217;ll hook it up to a trivial program:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;7621c1bf-01b2-43d1-b394-b9ac37045f72&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// sample_debuggee.c

#include &lt;windows.h&gt;

int main(int argument_count, char **arguments)
{
  OutputDebugStringA("Hello, Debugger!\n");
  return 0;
}</code></pre></div><p>Both programs can be built with MSVC using the following commands:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;cf2b15d8-9362-4c9f-98e7-07e2ae4178ba&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cl /nologo /Zi sample_debugger.c
cl /nologo /Zi sample_debuggee.c</code></pre></div><p>I can then run <code>sample_debugger.exe</code>, passing it the sample debuggee program as its argument:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;63262663-b42e-415e-9f9f-bd556992b9d4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">sample_debugger.exe sample_debuggee.exe</code></pre></div><p>This produces the following output:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;89d5a670-ac7e-4853-9403-0bb026849482&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT
Received a DEBUG_EVENT</code></pre></div><p>Not very useful! But now we can fill in the details by reading through the <a href="https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-debug_event">documentation</a> for <code>DEBUG_EVENT</code>. I won&#8217;t go through it exhaustively in this post, but let&#8217;s fill in some basics. First, let&#8217;s log a string naming each event&#8217;s <code>dwDebugEventCode</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;b56f99e6-26f6-4b75-bcf4-5f39f088fb79&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// kind code -&gt; name
char *evt_kind_name = "[unknown event kind]";
switch(evt.dwDebugEventCode)
{
  default:{}break;
  case CREATE_PROCESS_DEBUG_EVENT:{evt_kind_name = "CREATE_PROCESS";}break;
  case EXIT_PROCESS_DEBUG_EVENT:  {evt_kind_name = "EXIT_PROCESS";}break;
  case CREATE_THREAD_DEBUG_EVENT: {evt_kind_name = "CREATE_THREAD";}break;
  case EXIT_THREAD_DEBUG_EVENT:   {evt_kind_name = "EXIT_THREAD";}break;
  case LOAD_DLL_DEBUG_EVENT:      {evt_kind_name = "LOAD_DLL";}break;
  case UNLOAD_DLL_DEBUG_EVENT:    {evt_kind_name = "UNLOAD_DLL";}break;
  case EXCEPTION_DEBUG_EVENT:     {evt_kind_name = "EXCEPTION";}break;
  case OUTPUT_DEBUG_STRING_EVENT: {evt_kind_name = "OUTPUT_DEBUG_STRING";}break;
}

// log
printf("Received a %s DEBUG_EVENT\n", evt_kind_name);
fflush(stdout);</code></pre></div><p>If we run this, we&#8217;ll see the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d917c7ed-ef87-4bd9-aa4b-671c15561568&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Received a CREATE_PROCESS DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a EXCEPTION DEBUG_EVENT
Received a OUTPUT_DEBUG_STRING DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a CREATE_THREAD DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a EXIT_THREAD DEBUG_EVENT
Received a EXIT_PROCESS DEBUG_EVENT</code></pre></div><p>Many of these are self-explanatory&#8212;the initial <code>CREATE_PROCESS</code>, the <code>EXIT_PROCESS</code>, and surely that <code>OUTPUT_DEBUG_STRING</code> corresponds with our sample debuggee&#8217;s call to <code>OutputDebugStringA</code>. But let&#8217;s clear up some potentially confusing details, to the degree that we can. My knowledge here is limited, since not all of this behavior is documented (or the documentation hasn&#8217;t been easy for me to find), so some of my answers may be unsatisfying, but I&#8217;ll do my best.</p><h4><em><strong>What is that </strong></em><code>EXCEPTION</code><em><strong> event?</strong></em></h4><p>This is a mystery to me, but I can speculate. Through experimentation, it seems that, no matter what, Windows generates this exception after a process is launched, always after all initial modules (those required by the executable&#8217;s imports) have been loaded. If I had to guess, this is to mark the end of the loader&#8217;s work&#8212;to signify to debuggers that no further modules will be loaded before the program&#8217;s actual code is executed. I&#8217;m not sure if this is the true explanation, but marking this exception as handled (by passing <code>DBG_CONTINUE</code> to <code>ContinueDebugEvent</code>) reliably works.</p><h4><em><strong>Why is the only </strong></em><code>CREATE_THREAD</code><em><strong> event seen after our </strong></em><code>OUTPUT_DEBUG_STRING</code><em><strong> event, which was caused by the main thread?</strong></em></h4><p>This is due to some unfortunate idiosyncrasies and asymmetries in the Windows debug event API. Almost all threads generate <code>CREATE_THREAD</code> events, and almost all modules generate <code>LOAD_DLL</code> events&#8212;<em>except</em> the main thread and main module. 
These&#8212;the creation of the main thread and the loading of the main module&#8212;are <em>implied</em> by the <code>CREATE_PROCESS</code> event. So, by the time the debugger event loop encounters the <code>OUTPUT_DEBUG_STRING</code> event, the main thread <em>has already been created</em>.</p><p>This API choice unnecessarily bifurcates codepaths that must apply to <em>all modules</em> or to <em>all threads</em>. This is why, in the RAD Debugger codebase, <a href="https://github.com/EpicGamesExt/raddebugger/blob/aa42d12d0fe58409d52cbc950cb5e44f3a668e29/src/demon/demon_core.h#L62">we designed our own event structure</a>, which we produce by <em>converting</em> the event information within <code>DEBUG_EVENT</code>s. At the cost of this small extra conversion step, we can synthesize extra module-load and thread-creation events from each <code>CREATE_PROCESS</code> event, such that <em>all</em> threads and <em>all</em> modules are represented by their own events. Thus, all per-module and all per-thread code can be trivially unified.</p><h4><em><strong>If that </strong></em><code>CREATE_THREAD</code><em><strong> event has nothing to do with the main thread, why is a thread created, then exited, before the program exits?</strong></em></h4><p>This seems to have something to do with the C runtime implementation. 
If we adjust our program to control its own low-level entry point, removing its dependency on the C runtime, then our debug event log becomes much simpler.</p><p>Let&#8217;s adjust our debuggee program to this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;21ecd87e-389b-47df-93b1-f850c87bb4ba&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">void WinMainCRTStartup(void){}</code></pre></div><p>Now our debug event log is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d1fee10a-2a9f-4d5c-b746-db4d263fbfe7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Received a CREATE_PROCESS DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT
Received a EXCEPTION DEBUG_EVENT
Received a EXIT_PROCESS DEBUG_EVENT</code></pre></div><p>This removes three of the <code>LOAD_DLL</code> events, as well as that strange thread spawned just before our debuggee exits. Thus, we can infer that the thread is introduced, for one reason or another, by the MSVC C runtime implementation.</p><h4><em><strong>Which DLLs are being loaded?</strong></em></h4><p>To find out, let&#8217;s continue inspecting the definition of <code>DEBUG_EVENT</code>. The <a href="https://learn.microsoft.com/en-us/windows/win32/debug/debugging-events">documentation</a> for <code>LOAD_DLL_DEBUG_EVENT</code> informs us that, if a <code>DEBUG_EVENT</code>&#8217;s <code>dwDebugEventCode</code> member matches <code>LOAD_DLL_DEBUG_EVENT</code>, then the union within <code>DEBUG_EVENT</code> is to be interpreted as a <code>LOAD_DLL_DEBUG_INFO</code> structure. That structure contains an <code>hFile</code> member, which is a <code>HANDLE</code> to the executable image file (it may be null, and when it isn&#8217;t, the debugger is responsible for closing it). We can use Windows&#8217; <code>GetFinalPathNameByHandleA</code> API to produce the executable image&#8217;s full path. If we log that, we&#8217;ll know which DLLs are being loaded.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;86056068-7cec-4c7c-8e15-c8720850e95d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// extract module path
char module_path_buffer[256] = {0};
char *module_path = 0;
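if(evt.dwDebugEventCode == LOAD_DLL_DEBUG_EVENT &amp;&amp; evt.u.LoadDll.hFile != 0)
{
  // 0 means failure; a result &gt;= the buffer size means the path didn't fit
  DWORD path_size = GetFinalPathNameByHandleA(evt.u.LoadDll.hFile, module_path_buffer, sizeof(module_path_buffer), 0);
  if(0 &lt; path_size &amp;&amp; path_size &lt; sizeof(module_path_buffer))
  {
    module_path = module_path_buffer;
  }
  // the debugger owns this file handle, so it must close it when done
  CloseHandle(evt.u.LoadDll.hFile);
}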

// log
printf("Received a %s DEBUG_EVENT", evt_kind_name);
if(module_path) { printf(" (%s)", module_path); }
printf("\n");
fflush(stdout);</code></pre></div><p>This produces:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;812297ba-6780-4f1e-b28d-804591b63f50&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Received a CREATE_PROCESS DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT (\\?\C:\Windows\System32\ntdll.dll)
Received a LOAD_DLL DEBUG_EVENT (\\?\C:\Windows\System32\kernel32.dll)
Received a LOAD_DLL DEBUG_EVENT (\\?\C:\Windows\System32\KernelBase.dll)
Received a EXCEPTION DEBUG_EVENT
Received a EXIT_PROCESS DEBUG_EVENT</code></pre></div><p>Recall that, in <a href="https://www.rfleury.com/p/demystifying-debuggers-part-2-the">part 2</a>, I covered the sort of work that an operating system&#8217;s <a href="https://www.rfleury.com/i/153235564/loaders-and-modules">loader</a> might do when loading modules:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rog_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rog_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!rog_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rog_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png" width="368" height="368" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:368,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rog_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!rog_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Every <code>LOAD_DLL</code> event also encodes <em>the address at which</em> its module was loaded. We can log that too:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;aa617043-012c-44e7-99c4-b9542da61cbc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// extract base address
DWORD64 base_vaddr = 0; // DWORD64 is windows.h's unsigned 64-bit integer type
if(evt.dwDebugEventCode == LOAD_DLL_DEBUG_EVENT)
{
  base_vaddr = (DWORD64)(ULONG_PTR)evt.u.LoadDll.lpBaseOfDll;
}

// log
printf("Received a %s DEBUG_EVENT", evt_kind_name);
if(module_path) { printf(" (%s)", module_path); }
if(base_vaddr)  { printf(" (0x%I64x)", base_vaddr); }
printf("\n");
fflush(stdout);</code></pre></div><p>This produces:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bf3d4b41-14eb-4bac-ac16-09967740d37f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Received a CREATE_PROCESS DEBUG_EVENT
Received a LOAD_DLL DEBUG_EVENT (\\?\C:\Windows\System32\ntdll.dll) (0x7ffff2af0000)
Received a LOAD_DLL DEBUG_EVENT (\\?\C:\Windows\System32\kernel32.dll) (0x7ffff1970000)
Received a LOAD_DLL DEBUG_EVENT (\\?\C:\Windows\System32\KernelBase.dll) (0x7ffff0520000)
Received a EXCEPTION DEBUG_EVENT
Received a EXIT_PROCESS DEBUG_EVENT</code></pre></div><p>I recommend studying the <a href="https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-debug_event">documentation</a> for <code>DEBUG_EVENT</code> to see what further information it contains and how it might be used. For instance, with this basic structure and very few extensions, we can start answering questions like:</p><ul><li><p><em>My program crashed while using a pointer&#8212;was it reading from or writing to that pointer? What was the pointer&#8217;s value?</em></p></li><li><p><em>A thread hit a trap instruction&#8212;at what address was that trap instruction?</em></p></li><li><p><em>What name has the program assigned to each of the threads it created?</em></p></li></ul><p>The next step in building useful debugger functionality lies in accumulating and using information from a history of received debug events. A debugger can treat debug events as <em>deltas</em> to apply to its own data structure, which mirrors a debuggee&#8217;s process structure. This way, a debugger can keep track of handles, names, or addresses for every thread and module.</p><p>Such a data structure comes in handy&#8212;for example, in implementing features like:</p><ul><li><p>Given an address referenced by an <code>EXCEPTION</code> debug event, and a history of <code>LOAD_DLL</code> events, determine which module that exception occurred within, and correlate it with that module&#8217;s name.</p></li><li><p>Since we know each loaded executable image&#8217;s address in the debuggee&#8217;s address space, use <code>ReadProcessMemory</code> to parse information out of it, like the path to its debug information file.</p></li><li><p>Given a thread, read the value of its instruction pointer register, and determine which module&#8217;s code it is executing.</p></li></ul><div><hr></div><h3>Debugger-to-Debuggee Interaction</h3><p>Debug events are used to send information from <em>debuggee</em> to <em>debugger</em>. 
But I opened this post by stating that debuggers are for <em>interactive runtime analysis</em>. And true <em>interactivity</em> requires <em>both</em> debuggee-to-debugger <em>and</em> debugger-to-debuggee information flow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HRTM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HRTM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 424w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 848w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1272w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HRTM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png" width="548" height="315.4010989010989" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:838,&quot;width&quot;:1456,&quot;resizeWidth&quot;:548,&quot;bytes&quot;:629922,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HRTM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 424w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 848w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1272w, https://substackcdn.com/image/fetch/$s_!HRTM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6cfa5cd-2b2b-4e09-b2c1-01311b25b868_1476x850.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Kernels provide a few extra mechanisms for this. I stated earlier that they support <em>reading memory</em> and <em>reading thread registers</em> from debuggees. They also provide mechanisms for <em>writing memory</em> into debuggees (<code>WriteProcessMemory</code> on Windows; <code>pwrite</code> on the debuggee&#8217;s <code>/proc/&lt;pid&gt;/mem</code> file on Linux), and for <em>writing thread registers</em> into debuggees (<code>SetThreadContext</code> on Windows; <code>ptrace</code> on Linux).</p><p>Another mechanism available to debuggers is control over <em>which debuggee threads</em> will be scheduled. 
This is often exposed through a &#8220;thread freezing&#8221; feature in debuggers:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OrFN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OrFN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 424w, https://substackcdn.com/image/fetch/$s_!OrFN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 848w, https://substackcdn.com/image/fetch/$s_!OrFN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 1272w, https://substackcdn.com/image/fetch/$s_!OrFN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OrFN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png" width="1200" height="347" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75305,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OrFN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 424w, https://substackcdn.com/image/fetch/$s_!OrFN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 848w, https://substackcdn.com/image/fetch/$s_!OrFN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 1272w, https://substackcdn.com/image/fetch/$s_!OrFN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4274a266-916f-4bfe-94c8-bd16fad26fc5_1200x347.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>On Linux, because <code>ptrace</code> is a thread-granularity API, this falls naturally out of <code>ptrace</code> functionality: the debugger can simply choose <em>not</em> to resume execution of specific threads. On Windows, this functionality is provided through <code>SuspendThread</code> and <code>ResumeThread</code>, which increment and decrement a per-thread suspend count. While this count is nonzero, the thread will not be scheduled. When it returns to zero, the thread is once again eligible to be scheduled by the kernel.</p><p>These few abilities alone (writing memory, writing registers, and suspending or resuming threads) open the door to a massive range of potential debugger features. 
But we&#8217;ll dig into exactly <em>how</em> a debugger can take advantage of these features in a later post.</p><div><hr></div><h3>On Current Operating System Debugger APIs</h3><p>The peculiarities of the Windows and Linux debugger APIs (of which there are many) are not <em>particularly</em> relevant, which is why I am not exhaustively exploring them in these posts. What matters more is the set of operations they facilitate: the <em>effects</em> they can be used to create.</p><p>I encourage readers not to be too concerned about these peculiarities. We built the <a href="https://github.com/EpicGamesExt/raddebugger">RAD Debugger</a> to be easily portable, so we did not want to tightly couple its functionality to the specific peculiarities of, say, the Windows debugging API. Instead, we built our own debugging API abstraction, which is much simpler for us to use when implementing the debugger, and which is implemented as needed on each target platform.</p><p>Furthermore, I encourage readers not to be too intimidated by these APIs. There is no magic here&#8212;at some level, the following API <em>must be implementable</em> on each platform in order for debuggers (as we know them) to work at all:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;daa005ae-bc6f-4cb2-832e-05493b0156d0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">DebugEvent GetNextDebugEvent(...)
U64 ReadProcessMemory(Handle process, U64 addr, void *out, U64 size)
U64 WriteProcessMemory(Handle process, U64 addr, void *in, U64 size)
U64 ReadThreadRegs(Handle thread, void *out)
U64 WriteThreadRegs(Handle thread, void *in)</code></pre></div><p>As a concrete example, here are the associated APIs in the RAD Debugger codebase:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;145638cd-c0d2-4ca1-a7d4-a28371f6291a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">DMN_EventList dmn_ctrl_run(Arena *arena, DMN_CtrlCtx *ctx, DMN_RunCtrls *ctrls)
U64 dmn_process_read(DMN_Handle process, Rng1U64 range, void *dst)
B32 dmn_process_write(DMN_Handle process, Rng1U64 range, void *src)
B32 dmn_thread_read_reg_block(DMN_Handle handle, void *reg_block)
B32 dmn_thread_write_reg_block(DMN_Handle handle, void *reg_block)</code></pre></div><p>The details can get a bit messy, but I encourage a spirit of resolve, because there are some amazing possibilities unlocked when the implementation is right!</p><div><hr></div><p>I&#8217;ve now covered all of the basic building blocks that kernels provide for debuggers. As surprising as it may seem, this is all of the machinery we need from the kernel for the vast majority of common debugger features. But we&#8217;ll dig more into how a debugger can take advantage of this machinery next time!</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Demystifying Debuggers, Part 2: The Anatomy Of A Running Program]]></title><description><![CDATA[On the concepts involved in a running program. 
What happens, exactly, when you double-click an executable file, or launch it from the command line, and it begins to execute?]]></description><link>https://www.dgtlgrove.com/p/demystifying-debuggers-part-2-the</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/demystifying-debuggers-part-2-the</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Mon, 23 Dec 2024 03:39:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/808c2a1f-73f3-4f53-8263-6261d6de9959_3192x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.dgtlgrove.com/p/index#%C2%A7demystifying-debuggers-series">Part 2 in a series.</a></em></p><p>From day one using a modern home computer, users are exposed to the concept of a program. Support for separate programs is, after all, the main value-add of multitasking operating systems. But&#8212;if we take a peek under the hood&#8212;&#8220;program&#8221; is a high-level term which refers to many lower-level mechanisms and concepts, and it isn&#8217;t obvious from the outset how they&#8217;re all arranged.</p><p>To unpack debuggers&#8212;programs which analyze the execution of other programs&#8212;it&#8217;s important that we first unpack the concept of a program, so that we&#8217;re familiar with the details of programs that a debugger must contend with.</p><p>Programs are the virtualized equivalent of cartridges for an old video game console, like the Nintendo Entertainment System. The NES didn&#8217;t have a multitasking operating system, and it only executed a single program while it was turned on&#8212;whichever one was stored on the cartridge that the player had inserted.</p><p>In this context, the program executing on the system had full access to all of the system&#8217;s resources. 
There was no code running of which the program couldn&#8217;t be aware.</p><p>Programs, in the context of a multitasking operating system, are a bundle of mechanisms used to provide, <em>virtually</em>, approximately what the NES provided <em>physically</em> to the program stored on its cartridge. Of course, multitasking operating systems also provide ways for these programs to communicate and interact (that is indeed the point), but at some level they must still exist independently, as different physical cartridges do.</p><p>Because programs, unlike cartridges, can execute on the same chip <em>at the same time</em>, and thus contend for the same resources, there are many additional <em>software</em> concepts that operating systems use to virtualize independent program execution:</p><ul><li><p>A <em><strong>virtual address space</strong></em> &#8212; A range of <em>virtual addresses</em>, for which the platform provides a mapping to <em>physical addresses</em>. Programs are built to interact with <em>virtual addresses</em>, which are entirely independent from addresses in other virtual address spaces. Virtual address spaces can be much <em>larger</em> than, for example, physical RAM limitations.</p></li><li><p>A <em><strong>thread of execution</strong></em> &#8212; A bundle of state which is used to initialize the CPU to coherently execute a sequence of instructions. Threads of execution are <em>scheduled</em> by the platform, such that many threads can execute on a small, fixed number of <em>cores</em>.</p></li><li><p>An <em><strong>executable image</strong></em> &#8212; A sequence of bytes, in a platform-defined format, encoding executable machine instructions, as well as relevant headers and metadata. 
An independent code package&#8217;s <em>non-live</em> representation&#8212;a blueprint for execution.</p></li><li><p>A <em><strong>loader</strong></em> &#8212; The part of an operating system responsible for parsing <em>executable images</em>&#8212;blueprints for execution&#8212;and instantiating them, so that the code encoded in the images can actually be executed.</p></li><li><p>A <em><strong>module</strong></em> &#8212; The loaded equivalent of an <em>executable image</em>. One <em>process</em> can load several <em>modules</em>, although a process is always initialized by the loading of <em>one specific module</em> (the initial executable image). Modules can be both dynamically <em>loaded</em> and <em>unloaded</em>.</p></li><li><p>A <em><strong>process</strong></em> &#8212; An instance of a live, running program. Instantiated by the platform&#8217;s <em>loader</em>, using the initial <em>executable image</em> to determine how it&#8217;s initialized and what code is initially loaded. The granularity at which operating systems assign <em>virtual address spaces</em>. The container of several <em>modules</em> and <em>threads of execution</em>.</p></li></ul><p>Let&#8217;s unpack all of this.</p><div><hr></div><h3>Virtual Address Spaces</h3><blockquote><p><em>A range of virtual addresses, for which the platform provides a mapping to physical addresses. Programs are built to interact with virtual addresses, which are entirely independent from addresses in other virtual address spaces. Virtual address spaces can be much larger than, for example, physical RAM limitations.</em></p></blockquote><p>Whether the easy way or the hard way, all programmers learn about pointers. When I first learned about pointers, I understood them as integers encoding addresses, each of which refers to a byte within memory, in linear order. 
Address <code>0</code> comes before <code>1</code>, which comes before <code>2</code>, and so on.</p><p>In other words, I was under the impression that physical memory, and its relationship to addresses, was structured like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bEyr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bEyr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 424w, https://substackcdn.com/image/fetch/$s_!bEyr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 848w, https://substackcdn.com/image/fetch/$s_!bEyr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 1272w, https://substackcdn.com/image/fetch/$s_!bEyr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bEyr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png" width="568" height="461.9570815450644" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1137,&quot;width&quot;:1398,&quot;resizeWidth&quot;:568,&quot;bytes&quot;:302713,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bEyr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 424w, https://substackcdn.com/image/fetch/$s_!bEyr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 848w, https://substackcdn.com/image/fetch/$s_!bEyr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 1272w, https://substackcdn.com/image/fetch/$s_!bEyr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf0d7080-b386-4391-95f7-4b7f3a98aa46_1398x1137.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This is a fine mental model to begin with. But it isn&#8217;t accurate.</p><p>When many independent programs execute on a single machine, it isn&#8217;t difficult to imagine one of them getting an address wrong. In fact, sometimes it feels like &#8220;getting addresses wrong&#8221; is the only thing anybody talks about these days. If all of these programs shared a single memory space, this could easily lead to one program stomping over data that another program is using. It could also lead to, for example, a <em>malicious program</em>&#8212;let&#8217;s call it <code>ryans_game.exe</code>&#8212;reading information from <code>chrome.exe</code> while it displays a page from <code>chase.com</code> with all of your sensitive information on it. This is purely hypothetical!</p><p><em>Virtual address spaces</em> are used to mediate between different programs accessing the same resource&#8212;physical memory. 
Addresses <em>can</em> be understood as integers, and as such, they <em>are</em> linearly ordered, and they <em>do</em> each refer to sequential bytes&#8212;but these bytes are sequential in <em>virtual address space</em>, not in <em>physical memory</em>.</p><p>Virtual address spaces are implemented with a mapping data structure known as a <em>page table</em>, which translates virtual addresses to physical addresses. Page tables are used directly by the CPU to perform this translation. For instance, if a CPU core were to execute a <code>mov</code> (move) instruction to load 8 bytes from address <code>0x1000</code> into a register, then before issuing a read from physical memory, the CPU would first treat <code>0x1000</code> as a <em>virtual address</em> and translate it into a <em>physical address</em>, which might be completely different (like <code>0x111000</code>).</p><p>&#8220;Page tables&#8221; are so named because they map from virtual to physical addresses at <em>page-size granularity</em>. A system&#8217;s page size varies: on an x64 Windows PC, it&#8217;ll be 4 kilobytes, and on an iPhone, it&#8217;ll be 16 kilobytes. 
Operating systems also expose larger page sizes under some circumstances.</p><p>This means the relationship between physical memory and an address&#8212;as used by a program, as a <em>virtual address</em>&#8212;looks more like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iSXi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iSXi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 424w, https://substackcdn.com/image/fetch/$s_!iSXi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 848w, https://substackcdn.com/image/fetch/$s_!iSXi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 1272w, https://substackcdn.com/image/fetch/$s_!iSXi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iSXi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png" width="602" height="540.8076923076923" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1308,&quot;width&quot;:1456,&quot;resizeWidth&quot;:602,&quot;bytes&quot;:509363,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iSXi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 424w, https://substackcdn.com/image/fetch/$s_!iSXi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 848w, https://substackcdn.com/image/fetch/$s_!iSXi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 1272w, https://substackcdn.com/image/fetch/$s_!iSXi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee85c0c-2624-4ffb-9810-633b75e933b6_1533x1377.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>If a virtual address <em>cannot be mapped</em> to a physical address, then the CPU raises a &#8220;page fault&#8221; exception, and execution is interrupted. If the faulting access came from a program&#8217;s code, then execution is transferred to the operating system&#8217;s code, which can take measures to address the cause of the exception and resume the program, or do whatever else it deems appropriate.</p><p>This provides a great deal of flexibility to operating systems. An operating system can move memory allocated by one program to disk&#8212;what&#8217;s known as &#8220;paging out&#8221;, or &#8220;swapping out&#8221;&#8212;if it expects that memory not to be accessed in the near future. It can then use that physical memory for more frequently accessed addresses, in any of the active virtual address spaces. If a page fault occurs when code attempts to access memory which has been paged out, then the operating system can simply page that memory back in and resume execution. 
Thus, even though hundreds&#8212;if not thousands&#8212;of programs can be executing at once, the operating system can make much more efficient use of physical memory, given its analysis of which addresses in which spaces are needed, and when. This is critical in building operating systems which can support the execution of many programs, where all programs are contending for the same physical hardware.</p><p>It also provides a great deal of flexibility to programs, as it can be used to implement virtual address spaces which are <em>much larger</em> than physical memory. Nowadays, nearly every consumer CPU&#8212;from phones, to game consoles, to PCs, to laptops&#8212;is a 64-bit processor. For PCs and laptops running on 64-bit CPUs, the CPU and operating system normally provide a 48-bit address space. On some server systems, it is larger, and on some mobile and console platforms, it is smaller.</p><p>Take a 48-bit address space as an example: 48 bits allow the representation of 2<sup>48</sup> different values (each bit multiplies the number of possible values by 2). Since each value refers to a different potential byte, that is enough address space to refer to 256 terabytes (2<sup>48</sup> = 2<sup>8</sup> &#215; 2<sup>40</sup>, and 2<sup>8</sup> = 256).</p><p>To understand this further, let&#8217;s dissect the &#8220;page table&#8221; data structure a bit more.</p><p>First, let&#8217;s assume a 48-bit address space and a 4-kilobyte page size (the usual configuration on x64 Windows systems). As I said, page tables map from virtual to physical addresses at <em>page-size granularity</em>. 
Because of our 4 kilobyte page size, we can infer that the bottom 12 bits of any address are <em>identical</em> for both a virtual address and a physical address (2<sup>12</sup> = 4096 = 4 kilobytes).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Iey!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Iey!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 424w, https://substackcdn.com/image/fetch/$s_!3Iey!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 848w, https://substackcdn.com/image/fetch/$s_!3Iey!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 1272w, https://substackcdn.com/image/fetch/$s_!3Iey!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3Iey!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png" width="1456" height="586" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:586,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293389,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Iey!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 424w, https://substackcdn.com/image/fetch/$s_!3Iey!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 848w, https://substackcdn.com/image/fetch/$s_!3Iey!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 1272w, https://substackcdn.com/image/fetch/$s_!3Iey!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d8c36d-3abc-4faf-9d4a-f525f7a8dc9d_1572x633.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This leaves the remaining 36 bits of each address to map from virtual to physical addresses. These bits are used to index into several levels within the page table, which, despite a name that implies a flat table, is actually a hierarchical data structure. To understand why, first imagine a na&#239;ve page table implementation, which simply stores a 64-bit physical address for each value in this 36-bit space. Unsurprisingly, this would require an unrealistically large amount of storage: 2<sup>36</sup> entries &#215; 8 bytes = 2<sup>39</sup> bytes, or 512 gigabytes, for every address space. Instead, we can notice that the page table need only map virtual addresses <em>which have actually been allocated</em>. At the outset, <em>none</em> are allocated. 
When a virtual address space allocation is made, a hierarchical data structure allows the page table implementation to allocate <em>only</em> the nodes in the hierarchy that are actually touched by that allocation.</p><p>Each node in the hierarchy can simply be a table of 64-bit addresses which point to child nodes (or, at the final level, store each page&#8217;s physical address). If each node is a 512-element table, and each element is a 64-bit address (8 bytes), then each node requires 512 &#215; 8 = 4096 bytes, which is exactly our page size!</p><p>Because 2<sup>9</sup> = 512, we can slice our 36 bits into <em>4</em> table indices of 9 bits each, and use them to traverse the page table. The topmost 9 bits index into the first level, the next 9 into the second, the next 9 into the third, and the last 9 into the fourth. The fourth-level entry provides the base physical address of the page containing our address, and the bottom 12 bits are then used as an offset from that base.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oxQE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oxQE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 848w, 
https://substackcdn.com/image/fetch/$s_!oxQE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oxQE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1042833,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oxQE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 848w, 
https://substackcdn.com/image/fetch/$s_!oxQE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!oxQE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f29652e-7a7a-4a0c-a6f3-8c90134f7976_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For each virtual address space, the operating system manages this page table structure. 
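</p><p>To make the traversal concrete, here is a toy software model of such a structure in C. It is only a sketch under stated assumptions, not how any real operating system or CPU implements paging. It assumes 48-bit virtual addresses, 4096-byte pages, and 512-entry nodes, and every name in it (<code>PTNode</code>, <code>pt_map</code>, <code>pt_translate</code>, and so on) is hypothetical. Nodes come from a fixed, zero-initialized pool, and intermediate nodes are created only when a mapping first touches them. Each level's index is extracted with division and remainder, which, for these power-of-two sizes, is equivalent to shifting and masking.</p><pre><code class="language-c">/* Toy software model of a 4-level page table. Hypothetical names; this is
   not an operating system or CPU interface. Assumes 48-bit virtual
   addresses, 4096-byte pages, and 512-entry nodes. An entry of 0 means
   "not present"; otherwise it stores (child node index + 1) or
   (physical page base + 1). Real x64 entries use a present bit instead. */

typedef unsigned long long u64;

enum { PT_PAGE_SIZE = 4096, PT_ENTRIES = 512, PT_LEVELS = 4, PT_MAX_NODES = 64 };

/* 512 entries x 8 bytes = 4096 bytes: each node fills exactly one page. */
typedef struct { u64 entries[PT_ENTRIES]; } PTNode;

static PTNode pt_nodes[PT_MAX_NODES]; /* zero-initialized; node 0 is the root */
static int    pt_nodes_used = 1;

static const u64 PT_UNMAPPED = ~0ULL;

/* Dividing by these moves each level's 9-bit field into the low bits. */
static const u64 pt_level_divisor[PT_LEVELS] = {
  4096ULL * 512 * 512 * 512, /* level 0: bits 39..47 */
  4096ULL * 512 * 512,       /* level 1: bits 30..38 */
  4096ULL * 512,             /* level 2: bits 21..29 */
  4096ULL,                   /* level 3: bits 12..20 */
};

/* 9-bit table index for the given level (0 = root, 3 = last level). */
static u64 pt_index(u64 va, int level)
{
  return va / pt_level_divisor[level] % PT_ENTRIES;
}

/* Map the page containing va to the physical page at pa_page (assumed
   4096-aligned). Intermediate nodes are allocated only when first touched.
   Returns 1 on success, 0 if the node pool is exhausted. */
static int pt_map(u64 va, u64 pa_page)
{
  int node = 0;
  for (int level = 0; level != PT_LEVELS - 1; level += 1)
  {
    u64 i = pt_index(va, level);
    if (pt_nodes[node].entries[i] == 0)
    {
      if (pt_nodes_used == PT_MAX_NODES) return 0;
      pt_nodes[node].entries[i] = (u64)pt_nodes_used + 1; /* new, still-zeroed child */
      pt_nodes_used += 1;
    }
    node = (int)(pt_nodes[node].entries[i] - 1);
  }
  pt_nodes[node].entries[pt_index(va, PT_LEVELS - 1)] = pa_page + 1;
  return 1;
}

/* Walk all four levels; the last one yields the page's physical base,
   and the bottom 12 bits of va are the offset within that page. */
static u64 pt_translate(u64 va)
{
  int node = 0;
  for (int level = 0; level != PT_LEVELS - 1; level += 1)
  {
    u64 entry = pt_nodes[node].entries[pt_index(va, level)];
    if (entry == 0) return PT_UNMAPPED;
    node = (int)(entry - 1);
  }
  u64 leaf = pt_nodes[node].entries[pt_index(va, PT_LEVELS - 1)];
  if (leaf == 0) return PT_UNMAPPED;
  return (leaf - 1) + va % PT_PAGE_SIZE;
}</code></pre><p>In this model, mapping one page into an empty table creates exactly one node at each of the three levels below the root. Mapping a neighboring page that shares those upper-level indices reuses the same nodes. That reuse is the sparseness that makes the hierarchical layout affordable.</p><p>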
Before the operating system prepares a CPU core to execute code for one program, it supplies this table, so that the CPU translates that program&#8217;s memory reads and writes into the <em>physical</em> addresses appropriate for its virtual address space. The end result is that each program can, in effect, live in its own universe of virtual addresses, as if it had access to the entire system&#8217;s memory, and as if that memory far exceeded the system&#8217;s random access memory (RAM) capacity.</p><div><hr></div><h3>Threads Of Execution</h3><blockquote><p><em>A bundle of state which is used to initialize the CPU to cohesively execute a sequence of instructions. Threads of execution are scheduled by the platform, such that many threads can execute on a small, fixed number of cores.</em></p></blockquote><p>Beyond a page table, a CPU core requires other information to coherently execute code. For instance, it requires the &#8220;instruction pointer&#8221; (or &#8220;program counter&#8221;), a register which stores the virtual address of the next instruction to execute in a given instruction stream. After each instruction executes, this register is updated to hold the address of the next instruction to run (which, after a jump or call, is not necessarily the instruction that follows it in memory). On x64, this is known as the <code>rip</code> register.</p><p>When using a debugger, you&#8217;ll often see golden arrows pointing to lines of source code or disassembly. 
This directly visualizes the location of the instruction pointer.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fHuU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fHuU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 424w, https://substackcdn.com/image/fetch/$s_!fHuU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 848w, https://substackcdn.com/image/fetch/$s_!fHuU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 1272w, https://substackcdn.com/image/fetch/$s_!fHuU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fHuU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png" width="1419" height="248" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:248,&quot;width&quot;:1419,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42198,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fHuU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 424w, https://substackcdn.com/image/fetch/$s_!fHuU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 848w, https://substackcdn.com/image/fetch/$s_!fHuU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 1272w, https://substackcdn.com/image/fetch/$s_!fHuU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe21e4ab3-7d54-4384-9722-9b4552b406da_1419x248.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There are several other registers, used for a variety of purposes, including general purpose slots for computations. The state of <em>all</em> such registers is called a &#8220;register state&#8221;, or &#8220;register file&#8221;. 
One register state is paired exclusively with one instruction stream from one program: a register state should change only as a result of work performed by that instruction stream.</p><p>But a CPU has only a fixed number of cores, be it 1, 2, 4, 8, 12, 16, 32, and so on&#8212;yet operating systems support a much larger number of programs executing simultaneously. Or, at least, it <em>seems</em> like they execute simultaneously.</p><p>The operating system implements this illusion&#8212;of hundreds if not thousands (if not more&#8212;unfortunately&#8230;) programs running simultaneously on a small, fixed number of cores&#8212;by <em>scheduling</em> <em>work</em> from these programs. One CPU core performs work for one program for some period of time; it is then interrupted, at which point the operating system may decide to schedule work from another program instead.</p><p>A <em>thread of execution</em> is the name given to the execution state for one instruction stream. Each contains one register state (which includes the instruction pointer, and thus identifies a stream of instructions), along with whatever other state each operating system deems appropriate.</p><p>In other words, operating systems do not just schedule <em>programs</em>&#8212;they schedule <em>threads</em>. When an operating system <em>schedules a thread</em>, it incurs a &#8220;context switch&#8221;: the process of saving the CPU core&#8217;s state for whichever thread <em>was executing</em> to memory, and initializing that core to execute work for the thread which <em>will execute</em>.</p><div><hr></div><h3>Executable Images</h3><blockquote><p><em>A sequence of bytes, in a platform-defined format, encoding executable machine instructions along with relevant headers and metadata. 
An independent code package&#8217;s non-live representation&#8212;a blueprint for execution.</em></p></blockquote><p>On Windows, you&#8217;ll find executable images stored on the filesystem with a <code>.exe</code> or a <code>.dll</code> extension. These files are stored in the <a href="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format">Portable Executable (PE) format</a>. The difference between <code>.exe</code> and <code>.dll</code> is that the former signifies an executable image which is a viable <em>initial module</em> for a process, whereas the latter signifies an executable image which is only to be loaded dynamically, as an <em>additional module</em> for a process.</p><p>On Linux systems, there is a similar structure: executable images are stored on the filesystem (the extension convention varies; the equivalent of Windows&#8217; <code>.exe</code> sometimes has no extension and sometimes has a <code>.elf</code> extension, while the equivalent of Windows&#8217; <code>.dll</code> generally has a <code>.so</code> extension). These files are stored in the <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">Executable and Linkable Format (ELF)</a>.</p><p>When I say &#8220;these files are stored in&#8221; a particular format, what I mean is that the associated operating system&#8217;s <em>loader</em> expects files in that format. In order to produce code which can be loaded on a platform out-of-the-box, one must package that code in the format which is expected by that platform.</p><p>It&#8217;s not in this series&#8217; scope to comprehensively dissect either the PE or the ELF format. 
But to justify the definition and concepts I&#8217;ve provided, let&#8217;s investigate the PE format using a simple example.</p><p>First, consider the following code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;b5716d58-08c2-4765-8eac-169fdcc67e9f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// sample.c

void WinMainCRTStartup(void)
{
  int x = 0;
}</code></pre></div><p>This can be built with the following command, using the Visual Studio Build Tools:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b3e09c48-66b6-4775-8e0d-ee04a2b7003b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cl /nologo /Zi sample.c /link /NODEFAULTLIB /INCREMENTAL:NO /SUBSYSTEM:WINDOWS</code></pre></div><p>This command will produce an executable image containing machine code. Disassembling that machine code (for instance, with a debugger) would show something like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;0a93a466-e83e-4753-9a52-326c2cc6a964&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">sub rsp, 0x18              ; - reserve 24 bytes of stack space for locals
mov dword ptr [rsp], 0x00  ; - set the 4 bytes of that space used
                           ;     for `x` to 0
add rsp, 0x18              ; - release the 24 bytes reserved above
ret                        ; - return to the caller of our entry point</code></pre></div><p>Even if we know nothing else about the PE format, we <em>do know</em> that these instructions need to be encoded <em>somewhere</em> in the file. We can also identify how they are encoded using a disassembler, which can usually display the machine code bytes that were parsed to form each instruction:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1wKy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1wKy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 424w, https://substackcdn.com/image/fetch/$s_!1wKy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 848w, https://substackcdn.com/image/fetch/$s_!1wKy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 1272w, https://substackcdn.com/image/fetch/$s_!1wKy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1wKy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png" width="849" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48721,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1wKy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 424w, https://substackcdn.com/image/fetch/$s_!1wKy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 848w, https://substackcdn.com/image/fetch/$s_!1wKy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 1272w, https://substackcdn.com/image/fetch/$s_!1wKy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa185fe0b-85c6-4e6a-a279-abf50f96727d_849x512.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The above image shows the disassembled instructions in the RAD Debugger, as well as the bytes from which they were parsed. If you looked at the disassembly yourself and were confused by the <code>add [rax], al</code> instructions surrounding the actual code, the code bytes clear up that mystery too: that is simply the instruction one obtains when decoding two sequential zero bytes.</p><p>Given the above, we know that the generated machine code is encoded in 16 bytes. 
Each byte can be represented with two hexadecimal digits:</p><pre><code>48 83 ec 18 c7 04 24 00 00 00 00 48 83 c4 18 c3</code></pre><p>If we look at the generated EXE with a memory viewer, we can, indeed, find this sequence of bytes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wJkk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wJkk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 424w, https://substackcdn.com/image/fetch/$s_!wJkk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 848w, https://substackcdn.com/image/fetch/$s_!wJkk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 1272w, https://substackcdn.com/image/fetch/$s_!wJkk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wJkk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png" width="870" height="610" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:870,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37748,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wJkk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 424w, https://substackcdn.com/image/fetch/$s_!wJkk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 848w, https://substackcdn.com/image/fetch/$s_!wJkk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 1272w, https://substackcdn.com/image/fetch/$s_!wJkk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa4cda8a-786d-494b-9b6d-bcb6ccc5d0bf_870x610.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>We know that this sequence of bytes is the primary &#8220;payload&#8221;&#8212;the actual program code. 
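</p><p>The search a memory viewer performs here can be sketched as a naive byte-pattern scan. <code>find_bytes</code> is a hypothetical helper, not the API of any particular tool. The sketch assumes the EXE&#8217;s contents have already been read into memory (for example, with <code>fopen</code> and <code>fread</code>; file I/O is omitted), and it embeds the 16 code bytes listed above as the pattern to look for.</p><pre><code class="language-c">/* Naive byte-pattern search (hypothetical helper, not any tool's API).
   Returns the offset of the first occurrence of needle within haystack,
   or -1 if it does not occur. Worst case is O(n * m). */
static long long find_bytes(const unsigned char *haystack, unsigned long long haystack_size,
                            const unsigned char *needle, unsigned long long needle_size)
{
  if (needle_size == 0) return 0;
  for (unsigned long long start = 0; start != haystack_size; start += 1)
  {
    unsigned long long remaining = haystack_size - start; /* bytes left from start */
    unsigned long long i = 0;
    while (i != needle_size)
    {
      if (i == remaining || haystack[start + i] != needle[i]) break;
      i += 1;
    }
    if (i == needle_size) return (long long)start;
  }
  return -1;
}

/* The 16 code bytes of WinMainCRTStartup, from the disassembly above. */
static const unsigned char sample_code[16] = {
  0x48, 0x83, 0xec, 0x18,                   /* sub rsp, 0x18 */
  0xc7, 0x04, 0x24, 0x00, 0x00, 0x00, 0x00, /* mov dword ptr [rsp], 0 */
  0x48, 0x83, 0xc4, 0x18,                   /* add rsp, 0x18 */
  0xc3,                                     /* ret */
};</code></pre><p>Calling <code>find_bytes(file_bytes, file_size, sample_code, sizeof sample_code)</code> would then report the file offset at which the code payload begins. A straightforward scan like this is adequate for a small executable; a dedicated tool may well use a faster string-search algorithm.</p><p>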
Everything else in the file is used to either instruct the loader how to correctly prepare a process for this code to execute, or to associate various metadata with the code.</p><p>For example, if you scan around the file, you&#8217;ll find the full path to the debug information file (PDB) for the executable image.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BwKz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BwKz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 424w, https://substackcdn.com/image/fetch/$s_!BwKz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 848w, https://substackcdn.com/image/fetch/$s_!BwKz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 1272w, https://substackcdn.com/image/fetch/$s_!BwKz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BwKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png" width="869" height="568" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86b44674-2b9d-44ad-9945-36916129bf95_869x568.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:568,&quot;width&quot;:869,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69255,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BwKz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 424w, https://substackcdn.com/image/fetch/$s_!BwKz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 848w, https://substackcdn.com/image/fetch/$s_!BwKz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 1272w, https://substackcdn.com/image/fetch/$s_!BwKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b44674-2b9d-44ad-9945-36916129bf95_869x568.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The executable image must also store the data to which its code refers. We can see this by placing a recognizable pattern in a global variable:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;84aec97c-e7f9-4b4a-8271-7aa4dc0e1345&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// sample.c

static char important_data[] = {0x12, 0x34, 0x56, 0x78, 0x90};

void WinMainCRTStartup(void)
{
  int x = important_data[0];
}</code></pre></div><p>We can also easily find the corresponding data in the PE file:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NuZM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NuZM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 424w, https://substackcdn.com/image/fetch/$s_!NuZM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 848w, https://substackcdn.com/image/fetch/$s_!NuZM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 1272w, https://substackcdn.com/image/fetch/$s_!NuZM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NuZM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png" width="896" height="310" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:310,&quot;width&quot;:896,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17650,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NuZM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 424w, https://substackcdn.com/image/fetch/$s_!NuZM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 848w, https://substackcdn.com/image/fetch/$s_!NuZM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 1272w, https://substackcdn.com/image/fetch/$s_!NuZM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f5a56f-c105-4e90-a39b-16f5fad3fddd_896x310.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>If you investigate formats like PE or ELF more closely, you&#8217;ll find that the various categories of data (code, initialized global variables, and so on) are separated into <em>sections</em>. Each section has a name, which is also encoded in the file.</p><p>In PE, for example, <code>.text</code> encodes all of the machine code (rather than, well, text&#8230;). <code>.data</code> stores the data for initialized global variables. <code>.rdata</code> stores initialized read-only data, such as constant globals and string literals. It is kept separate so that it can be placed in read-only pages, preventing code from modifying it.</p><p><code>.pdata</code> and <code>.xdata</code> encode information about how, given a procedure, one may <em>unwind</em> a thread. This is needed, for example, to produce a call stack: a reconstruction of which functions called which other functions to bring a thread of execution to its current point in a procedure. 
But we&#8217;ll dig into that topic in a later post.</p><p><code>.edata</code> and <code>.idata</code> encode information about <em>exports</em> and <em>imports</em>, respectively. Both associate strings (&#8220;symbol names&#8221;) with locations in the image. A DLL, for example, uses exports to expose functions that code in an executable or another DLL can look up by name at runtime and then call. Executables and DLLs alike use imports to name the functions from other modules that they must dynamically &#8220;link&#8221; against.</p><p>When implementing a debugger, the precise details of formats like PE and ELF become relevant, but this should be a sufficient introduction for those unfamiliar with the basics.</p><div><hr></div><h3>Loaders &amp; Modules</h3><blockquote><p><em>A <strong>loader</strong>: the part of an operating system responsible for parsing executable images (blueprints for execution) and instantiating them, so that the code encoded in the images can actually be executed.</em></p><p>A <em><strong>module</strong></em>: the loaded equivalent of an <em>executable image</em>. One <em>process</em> can load several <em>modules</em>, although a process is always initialized by loading <em>one specific module</em> (the initial executable image). 
Modules can be both dynamically <em>loaded</em> and <em>unloaded</em>.</p></blockquote><p>Even more than a debugger, a <em>loader</em> must be intimately aware of executable image format details. Its job is to <em>parse</em> those images and make whatever preparations are needed for the code they contain to be executed.</p><p>A loader runs when a program is first launched. It also runs when already-running code requests that another image be loaded dynamically, for instance via <code>LoadLibrary</code> (Windows) or <code>dlopen</code> (Linux).</p><p>To understand this, let&#8217;s build a toy executable image format and write our own loader, which parses <em>our</em> format rather than PE or ELF.</p><p>Consider the following code, from earlier:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;6bde3483-7f30-47c9-bec1-e6790d3fbb17&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// sample.c

void WinMainCRTStartup(void)
{
  int x = 0;
}</code></pre></div><p>And its disassembly:</p><pre><code>&gt; c:/devel/sample/sample.c
&gt; {
{48 83 ec 18}              sub rsp, 0x18
&gt; int x = 0;
{c7 04 24 00 00 00 00}     mov dword ptr [rsp], 0x00
&gt; }
{48 83 c4 18}              add rsp, 0x18
{c3}                       ret</code></pre><p>Our toy format can have a simple header at the beginning of the image, containing the following values, in order:</p><ul><li><p>An 8-byte signature denoting that the file is in our format. It must always be <code>54 4f 59 45 58 45 00 00</code>, which encodes the ASCII text <code>TOYEXE</code> followed by two zero bytes.</p></li><li><p>An 8-byte offset into the file, encoding where all readable-and-writable global data is stored (the &#8220;data section&#8221;).</p></li><li><p>An 8-byte offset into the file, encoding where all read-only global data is stored (the &#8220;read-only data section&#8221;).</p></li><li><p>An 8-byte offset into the file, encoding where all executable data is stored (the &#8220;code section&#8221;).</p></li></ul><p>A section&#8217;s size is the offset of the next section (or the file size, for the final section) minus the section&#8217;s own offset. A section that contains no data simply has the same offset as the section that follows it.</p><p>Given this simple format, our full executable file for the example program can be encoded with the following bytes:</p><pre><code>{54 4f 59 45 58 45 00 00} (magic)
{20 00 00 00 00 00 00 00} (read/write data offset)
{20 00 00 00 00 00 00 00} (read-only data offset)
{20 00 00 00 00 00 00 00} (executable data offset)
{48 83 ec 18 c7 04 24 00 00 00 00 48 83 c4 18 c3} (executable data)</code></pre><p>In this case, both data sections are empty, because the code uses no global data. Every section therefore begins at offset <code>0x20</code> (32 bytes) into the file, directly after the header. The executable data section, being the final section, occupies the remainder of the file.</p><p>Our &#8220;loader&#8221; can define the format&#8217;s header with the following structure:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;31c63a2c-ad8d-4928-870f-f6ca0731d386&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">typedef struct ToyExe_Header ToyExe_Header;
struct ToyExe_Header
{
  U64 magic;       // must be {54 4f 59 45 58 45 00 00}
  U64 rw_data_off; // read/write
  U64 r_data_off;  // read
  U64 x_data_off;  // executable
};</code></pre></div><p>It can begin by opening the file, mapping it into memory, and extracting the header:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;add5703a-8f01-4cfd-af7d-64a68cfd6a92&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// open file, map it into the process address space
// ('arguments' is assumed to hold the path of the image file to load)
HANDLE file = CreateFileA(arguments, GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0);
U64 file_size = 0;
if(file != INVALID_HANDLE_VALUE)
{
  DWORD file_size_hi = 0;
  DWORD file_size_lo = GetFileSize(file, &amp;file_size_hi);
  file_size = (((U64)file_size_hi) &lt;&lt; 32) | (U64)file_size_lo;
}
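// map read-only: the loader only copies bytes out of this view (a
// PAGE_EXECUTE_READ mapping would also need GENERIC_EXECUTE on the file)
HANDLE file_map = CreateFileMappingA(file, 0, PAGE_READONLY, 0, 0, 0);
void *file_base = MapViewOfFile(file_map, FILE_MAP_READ, 0, 0, 0);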

// extract the header
ToyExe_Header header_stub = {0};
ToyExe_Header *header = &amp;header_stub;
if(file_base &amp;&amp; file_size &gt;= sizeof(*header))
{
  header = (ToyExe_Header *)file_base;
}</code></pre></div><p>It can then allocate memory large enough for the image&#8217;s data and copy the file&#8217;s contents into that address range.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;20830b35-573a-4236-827b-b09d0f9857b9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// allocate memory for all executable data - ensure it is all
// writable, executable, and readable
void *exe_data = VirtualAlloc(0, file_size, MEM_RESERVE|MEM_COMMIT, 
                              PAGE_EXECUTE_READWRITE);

// copy file's data into memory
CopyMemory(exe_data, file_base, file_size);</code></pre></div><p>Since the header tells us where the code is stored within the image data, we can now call into that code directly:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;7e0c26a3-dd00-45ba-9f24-dbd9e2277ba9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// call the code
void *x_data = (U8 *)exe_data + header-&gt;x_data_off;
((void (*)())x_data)();</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KQH9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KQH9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 424w, https://substackcdn.com/image/fetch/$s_!KQH9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 848w, https://substackcdn.com/image/fetch/$s_!KQH9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 1272w, https://substackcdn.com/image/fetch/$s_!KQH9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KQH9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png" width="1350" height="726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:1350,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95139,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KQH9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 424w, https://substackcdn.com/image/fetch/$s_!KQH9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 848w, https://substackcdn.com/image/fetch/$s_!KQH9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 1272w, https://substackcdn.com/image/fetch/$s_!KQH9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13a85b2-3f22-4989-942b-521f5cb892dd_1350x726.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>And it actually works! But, as you might expect, there is more to this in practice.</p><h4>Per-Section Memory Protections</h4><p>In this example, I&#8217;ve allocated all of the executable&#8217;s data with identical <em>memory protections</em>: every byte of it is legal to read, write, <em>and</em> execute. The point of having different sections at all is to organize data by how it will be accessed and used. For instance, our &#8220;read-only data section&#8221; can then actually be read-only, so that any attempt by code to write to it would fault.</p><p>Because memory protections are assigned at <em>page granularity</em>, each section loaded by our toy loader must begin on a page boundary and be padded to a whole number of pages, so that we can assign each section its own protections. 
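</p><p>To make that constraint concrete, here is a minimal sketch, in portable C, of how a loader might compute such a layout. Each section&#8217;s size is rounded up to a whole number of pages, and the sections are placed back to back in memory. The helpers <code>round_up_to_page</code> and <code>layout_sections</code> are hypothetical and not part of the toy loader. The sketch also assumes the header is given the first page to itself.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// Hypothetical helpers, not part of the toy loader: compute a
// page-granular layout in which each loaded section starts on a page
// boundary and spans a whole number of pages.
typedef unsigned long long U64;

// Round 'size' up to the next multiple of 'page_size' (must be nonzero).
static U64 round_up_to_page(U64 size, U64 page_size)
{
  return (size + page_size - 1) / page_size * page_size;
}

// Given each section's size in the file, compute its virtual offset and
// its page-rounded virtual size, placing sections back to back.
// Assumes the header occupies the first page by itself.
// Returns the total virtual size needed for the header and all sections.
static U64 layout_sections(const U64 *file_sizes, U64 count, U64 page_size,
                           U64 *voffs, U64 *vsizes)
{
  U64 voff = page_size; // skip the header's page
  for(U64 idx = 0; idx != count; idx += 1)
  {
    voffs[idx]  = voff;
    vsizes[idx] = round_up_to_page(file_sizes[idx], page_size);
    voff       += vsizes[idx]; // an empty section adds no pages
  }
  return voff;
}</code></pre></div><p>Take the example program, whose two data sections are empty and whose code is 16 bytes. This places all three sections at virtual offset <code>0x1000</code>, with the code section rounded up to a single 4096-byte page, for 8192 bytes in total.</p><p>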
But if we were designing a real format, requiring every section to be padded to a full page (normally 4 kilobytes, if not larger) <em>in the executable image itself</em> (as it&#8217;s stored in the filesystem) would be fairly wasteful for smaller executables.</p><p>Instead of our loaded image being a flat copy of the image file:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AzV8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AzV8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 424w, https://substackcdn.com/image/fetch/$s_!AzV8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 848w, https://substackcdn.com/image/fetch/$s_!AzV8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 1272w, https://substackcdn.com/image/fetch/$s_!AzV8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AzV8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png" width="364" height="411.84976525821594" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1446,&quot;width&quot;:1278,&quot;resizeWidth&quot;:364,&quot;bytes&quot;:812416,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AzV8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 424w, https://substackcdn.com/image/fetch/$s_!AzV8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 848w, https://substackcdn.com/image/fetch/$s_!AzV8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 1272w, https://substackcdn.com/image/fetch/$s_!AzV8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1a1b1e4-4a94-4d55-9775-ce0fc7c84407_1278x1446.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>We can make it an <em>expansionary</em> copy, in which each section is expanded to page granularity:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rog_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rog_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!rog_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 
848w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rog_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png" width="518" height="518" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:518,&quot;bytes&quot;:1320933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rog_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!rog_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 848w, 
https://substackcdn.com/image/fetch/$s_!rog_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!rog_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13ac2a39-66e1-4568-a052-b70d2ff91dc5_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To do this, we can introduce a distinction between <em>unloaded</em> sections (as stored in an executable image) and 
<em>loaded</em> sections (as laid out in memory when a process executes). So far, our toy format has one notion of &#8220;offset&#8221;. We can break that down into two notions of offset, in two separate spaces: &#8220;unloaded space&#8221; and &#8220;loaded space&#8221;. These are generally called &#8220;file space&#8221; and &#8220;virtual space&#8221; (where &#8220;virtual&#8221; refers to a process&#8217; &#8220;virtual address space&#8221;). Thus, instead of one type of offset, we have <em>file offsets</em> and <em>virtual offsets</em>. In code, rather than naming offsets <code>off</code>, we can explicitly encode which space we&#8217;re working in by prefixing names with either <code>f</code> or <code>v</code>: <code>foff</code> for file offsets, and <code>voff</code> for virtual offsets.</p><p>This distinction between <em>unloaded</em> and <em>loaded</em> images is the reason for the separate terms <em>image</em> and <em>module</em>. The <em>image</em> is the &#8220;cold&#8221; form of the data, and the <em>module</em> is the &#8220;hot&#8221; (loaded) form of the data.</p><p>We can rewrite our header structure as follows, to encode both where each section&#8217;s data is located within the image <em>and</em> where that data should be placed <em>within memory</em> before execution:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;e139a0c6-60c9-4f55-9afe-b83fc3f6ae30&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">typedef struct ToyExe_Header ToyExe_Header;
struct ToyExe_Header
{
  U64 magic;   // must be {54 4f 59 45 58 45 00 00}
  U64 padding; // (round up to 64 bytes)
  U64 rw_foff; // read/write (file)
  U64 r_foff;  // read (file)
  U64 x_foff;  // executable (file)
  U64 rw_voff; // read/write (virtual)
  U64 r_voff;  // read (virtual)
  U64 x_voff;  // executable (virtual)
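};</code></pre></div><p>The header is now 64 bytes (<code>0x40</code>), so section data in the file begins at file offset <code>0x40</code>. In memory, assuming 4 kilobyte pages, the header occupies the first page, so every section begins at virtual offset <code>0x1000</code>. Our test program can then be encoded with the following bytes:</p><pre><code><code>{54 4f 59 45 58 45 00 00} (magic)
{00 00 00 00 00 00 00 00} (padding)
{40 00 00 00 00 00 00 00} (read/write data file offset)
{40 00 00 00 00 00 00 00} (read-only data file offset)
{40 00 00 00 00 00 00 00} (executable data file offset)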
{00 10 00 00 00 00 00 00} (read/write data virtual offset)
{00 10 00 00 00 00 00 00} (read-only data virtual offset)
{00 10 00 00 00 00 00 00} (executable data virtual offset)
{48 83 ec 18 c7 04 24 00 00 00 00 48 83 c4 18 c3} (executable data)</code></code></pre><p>And our loader can be adjusted to perform the &#8220;expansionary copy&#8221;:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;936febf6-dca1-4a9d-a91e-7aa07b2dcea9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// unpack f/v dimensions of each section (and header)
U64 fdata_hdr_size = sizeof(*header);
U64 fdata_rw_size  = header-&gt;r_foff - header-&gt;rw_foff;
U64 fdata_r_size   = header-&gt;x_foff - header-&gt;r_foff;
U64 fdata_x_size   = file_size - header-&gt;x_foff;
U64 vdata_hdr_size = fdata_hdr_size;
U64 vdata_rw_size  = header-&gt;r_voff - header-&gt;rw_voff;
U64 vdata_r_size   = header-&gt;x_voff - header-&gt;r_voff;
U64 vdata_x_size   = fdata_x_size;

// round up virtual sizes to 4K boundaries
vdata_hdr_size+= 4095;
vdata_rw_size += 4095;
vdata_r_size  += 4095;
vdata_x_size  += 4095;
vdata_hdr_size-= vdata_hdr_size%4096;
vdata_rw_size -= vdata_rw_size%4096;
vdata_r_size  -= vdata_r_size%4096;
vdata_x_size  -= vdata_x_size%4096;

// calculate total needed virtual size, allocate
U64 vdata_size = (vdata_hdr_size + vdata_rw_size + vdata_r_size + vdata_x_size);
U8 *vdata = (U8 *)VirtualAlloc(0, vdata_size, MEM_RESERVE|MEM_COMMIT, 
                               PAGE_READWRITE);

// unpack parts of virtual data
U8 *vdata_hdr     = vdata + 0;
U8 *vdata_rw      = vdata + header-&gt;rw_voff;
U8 *vdata_r       = vdata + header-&gt;r_voff;
U8 *vdata_x       = vdata + header-&gt;x_voff;

// unpack parts of file data
U8 *fdata         = (U8 *)file_base;
U8 *fdata_hdr     = fdata + 0;
U8 *fdata_rw      = fdata + header-&gt;rw_foff;
U8 *fdata_r       = fdata + header-&gt;r_foff;
U8 *fdata_x       = fdata + header-&gt;x_foff;

// copy &amp; protect
CopyMemory(vdata_hdr, fdata_hdr, fdata_hdr_size);
CopyMemory(vdata_rw, fdata_rw, fdata_rw_size);
CopyMemory(vdata_r, fdata_r, fdata_r_size);
CopyMemory(vdata_x, fdata_x, fdata_x_size);
DWORD old_protect = 0;
VirtualProtect(vdata_hdr, vdata_hdr_size, PAGE_READONLY, &amp;old_protect);
VirtualProtect(vdata_rw, vdata_rw_size, PAGE_READWRITE, &amp;old_protect);
VirtualProtect(vdata_r, vdata_r_size, PAGE_READONLY, &amp;old_protect);
VirtualProtect(vdata_x, vdata_x_size, PAGE_EXECUTE, &amp;old_protect);</code></pre></div><p>And&#8212;since it&#8217;s easy to notice that this is getting rather repetitive for each section&#8212;we can table-drive this &#8220;expansionary copy&#8221;. Doing so will eliminate most of the per-section duplication:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;0123681a-43d9-41c8-bdcf-ee0f6ba35297&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// gather all information for all boundaries between all sections (&amp; header)
struct
{
  U64 foff;
  U64 voff;
  DWORD protect_flags;
}
boundaries[] =
{
  {0,               0,               PAGE_READONLY},
  {header-&gt;rw_foff, header-&gt;rw_voff, PAGE_READWRITE},
  {header-&gt;r_foff,  header-&gt;r_voff,  PAGE_READONLY},
  {header-&gt;x_foff,  header-&gt;x_voff,  PAGE_EXECUTE},
  {file_size,       0,               PAGE_READONLY},
};
U64 region_count = (sizeof(boundaries)/sizeof(boundaries[0]) - 1);

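// calculate total needed virtual size: each region spans up to the next
// region's virtual offset, except the last, whose virtual size is its file
// size rounded up to a 4K boundary
U64 last_vsize = (boundaries[region_count].foff - boundaries[region_count-1].foff);
last_vsize += 4095;
last_vsize -= last_vsize%4096;
U64 vdata_size = boundaries[region_count-1].voff + last_vsize;
boundaries[region_count].voff = vdata_size;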

// allocate; iterate regions, do copy &amp; protect
U8 *vdata = (U8 *)VirtualAlloc(0, vdata_size, MEM_RESERVE|MEM_COMMIT, 
                               PAGE_READWRITE);
U8 *fdata = (U8 *)file_data;
DWORD old_protect = 0;
for(U64 idx = 0; idx &lt; region_count; idx += 1)
{
  CopyMemory(vdata + boundaries[idx].voff,
             fdata + boundaries[idx].foff,
             (boundaries[idx+1].foff - boundaries[idx].foff));
  VirtualProtect(vdata + boundaries[idx].voff,
                 (boundaries[idx+1].voff - boundaries[idx].voff),
                 boundaries[idx].protect_flags, &amp;old_protect);
}</code></pre></div><p>It&#8217;s not much shorter, but the work for all expanded and copied sections has been deduplicated. To adjust this for a larger number of sections, only the boundary table must change.</p><h4>Imports &amp; Exports</h4><p>Now that we have a basic structure for loading our image format, let&#8217;s consider how a loader is used. As I&#8217;ve stated, an executable image is loaded whenever a program is <em>launched</em>, or when an executing program requests to dynamically load another image (for instance, through <code>LoadLibrary</code> or <code>dlopen</code>).</p><p>We&#8217;ve already explored the first case&#8212;program launching&#8212;as it simply consists of beginning execution of the program. With our toy loader, we can just immediately execute the code after we&#8217;ve loaded it:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;8380b7da-48ef-42b6-8996-6e1621d4bed4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">// call the code
void *vdata_x = (U8 *)vdata + header-&gt;x_voff;
((void (*)())vdata_x)();</code></pre></div><p>This assumes that the first instruction stored is our entry point. If we ever wanted that to <em>not be the case</em>&#8212;as &#8220;real&#8221; executable image formats support&#8212;then we could simply store a virtual offset for the desired entry point within the image&#8217;s header.</p><p>But in the case of <em>dynamic loading</em>, our loader&#8217;s job is not merely to begin executing at a single point in some code. Our loader must instead load the image, and prepare for dynamic lookups of potentially many named entry points. On Windows, the usage code for this looks something like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;9632c83e-5302-43a3-b4ea-7fc0f0158f71&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">HMODULE foo_library = LoadLibraryA("foo.dll");
void (*foo_function)(void) = (void (*)(void))GetProcAddress(foo_library, "foo_function");
foo_function();</code></pre></div><p>To facilitate this path, our executable image format must associate a number of names&#8212;like <code>foo_function</code>&#8212;with specific virtual offsets in the executable data section. This concept is known as an executable image&#8217;s <em>exports</em>, and it can be straightforwardly encoded as a set of pairs of names and virtual offsets.</p><p>There&#8217;s a symmetric concept known as <em>imports</em>, which act as a shortcut for manually looking up functions in a loaded executable image, as the above code does. On Windows, the usage code for <em>that</em> looks something like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;70e555e7-2139-4616-b68e-5fb34da7aa52&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">__declspec(dllimport) void foo_function(void);
#pragma comment(lib, "foo.lib")
foo_function();</code></pre></div><p>Both <em>explicitly</em> loaded (via <code>GetProcAddress</code> on Windows, or <code>dlsym</code> on Linux) and <em>implicitly</em> loaded (via <code>__declspec(dllimport)</code> on Windows, which is more automatic on Linux toolchains) functions are called through a double indirection. To perform an actual function call, the CPU must first follow the address of the pointer in which the loaded address is stored, <em>then</em> use that pointer&#8217;s value as the address of the function to call.</p><p>In the above example with <em>explicit</em> loading, the address of the loaded function is stored in the explicit <code>foo_function</code> function pointer variable. In the above example with <em>implicit</em> loading, the address of the loaded function is implicitly stored in hidden state, as an implementation detail (on Windows, in an entry of the image&#8217;s <em>import address table</em>, which the loader fills in).</p><p>When an executable image has imports, the associated imports must be dynamically linked in order for the loading of that image to succeed. This is done automatically by the loader, as opposed to the program code manually calling&#8212;for instance&#8212;<code>LoadLibrary</code> or <code>GetProcAddress</code>.</p><h4>Address Stability &amp; Relocations</h4><p>Machine code contained in an executable image can be hardcoded to refer to specific addresses. But as you&#8217;ll notice, in our toy loader, we don&#8217;t control the address at which our <em>module</em> for an image is placed in memory. We call <code>VirtualAlloc</code> to allocate memory for our module data, and we use whatever address it returns. Of course, we can <em>request</em> that <code>VirtualAlloc</code> place our allocation at a specific address, but that request is not guaranteed to succeed.</p><p>This means that if we, for instance, had an image with instructions which referred to a global variable&#8217;s absolute address, those instructions would only be valid if the image were loaded at one particular address. 
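</p><p>A common remedy, discussed further below, is for the image to carry <em>relocations</em>, which the loader applies once it knows where the module was placed. As a minimal sketch, assume a hypothetical extension of our toy format: an array of relocation entries, each storing the virtual offset of an 8-byte slot which the linker filled in as if the module were loaded at virtual address 0 (so each slot initially holds a virtual offset). The names <code>ToyExe_Reloc</code>, <code>toyexe_apply_relocs</code>, and the little-endian read/write helpers are illustrative only, and the sketch assumes 64-bit addresses:</p>

```c
// Hypothetical sketch: applying relocations in a toy loader. Assumes 64-bit
// addresses; U8/U64 mirror the type names used in the post.
typedef unsigned char      U8;
typedef unsigned long long U64;

// Hypothetical relocation entry: virtual offset of an 8-byte slot which the
// linker filled with an address computed as if the module were loaded at
// virtual address 0.
typedef struct ToyExe_Reloc ToyExe_Reloc;
struct ToyExe_Reloc
{
  U64 voff;
};

// Read an 8-byte little-endian value, byte-wise (slots need not be aligned).
static U64
toyexe_read_u64_le(const U8 *p)
{
  U64 v = 0;
  for(int i = 0; i < 8; i += 1)
  {
    v |= ((U64)p[i]) << (8*i);
  }
  return v;
}

// Write an 8-byte little-endian value, byte-wise.
static void
toyexe_write_u64_le(U8 *p, U64 v)
{
  for(int i = 0; i < 8; i += 1)
  {
    p[i] = (U8)(v >> (8*i));
  }
}

// Patch every relocated slot by adding the module's actual base address,
// turning each stored virtual offset into an absolute virtual address.
static void
toyexe_apply_relocs(U8 *vdata, const ToyExe_Reloc *relocs, U64 reloc_count)
{
  U64 base = (U64)vdata;
  for(U64 idx = 0; idx < reloc_count; idx += 1)
  {
    U8 *slot = vdata + relocs[idx].voff;
    toyexe_write_u64_le(slot, toyexe_read_u64_le(slot) + base);
  }
}
```

<p>A call like this would fit into either loader above between the <code>CopyMemory</code> calls and the <code>VirtualProtect</code> calls, since a relocated slot may live in a section which ends up read-only or execute-only. Real formats differ in the details; PE images, for example, store addresses computed against a preferred image base, group relocation entries by page, and have the loader add the difference between the actual and preferred bases.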
</p><p>In principle, a loader could guarantee a fixed virtual address <em>for a program&#8217;s initially loaded executable image. </em><a href="https://en.wikipedia.org/wiki/Address_space_layout_randomization">They don&#8217;t</a>. But in any case, that cannot be guaranteed in general, because images can be loaded or unloaded dynamically, and they are not built to be aware of which other images are loaded simultaneously. Thus, they must be dynamically arranged&#8212;each image&#8217;s code should be able to operate correctly, irrespective of where its loaded module equivalent is placed in memory.</p><p>In many cases, especially nowadays, addresses are encoded <em>relative to the instruction referring to them</em> (for example, through RIP-relative addressing on x64), in which case they&#8217;re always valid, irrespective of the runtime address at which the module is loaded. Nevertheless, mechanisms still exist for code to be hardwired to refer to specific addresses. In such cases, an executable image also contains <em>relocations</em>, which encode locations within the executable image which must be <em>reencoded</em> after the base address of the loaded image is determined at runtime.</p><p>It is the loader&#8217;s job to iterate these relocations, and patch in the appropriate addresses, given knowledge (only available at that point) of where the image&#8217;s loaded data is actually stored.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wjuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Wjuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Wjuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Wjuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Wjuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wjuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png" width="496" height="496" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:496,&quot;bytes&quot;:689644,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Wjuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Wjuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Wjuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Wjuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e7b8be-92da-4f66-9800-cb0686184931_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><h3>And Finally, A Process</h3><blockquote><p><em>An instance of a live, running program. Instantiated by the platform&#8217;s loader using the initial executable image to determine how it&#8217;s initialized, and what code is initially loaded. The granularity at which operating systems assign virtual address spaces. The container of several modules, and threads of execution.</em></p></blockquote><p>We&#8217;ve covered everything we need to sketch out a definition of a <em>process</em>&#8212;a running program.</p><p>Each <em>process</em> is the owner of some number of <em>threads</em>, and some number of <em>modules</em>. It is the owner of a single <em>virtual address space</em>.</p><p>Threads and virtual address spaces are, in a sense, <em>orthogonal concepts</em>&#8212;threads are used to virtualize CPU cores, virtual address spaces are used to virtualize physical storage&#8212;the <em>process</em> is the concept which binds them together.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9wvV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9wvV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 424w, 
https://substackcdn.com/image/fetch/$s_!9wvV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!9wvV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!9wvV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9wvV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png" width="474" height="474" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:928209,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9wvV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 424w, 
https://substackcdn.com/image/fetch/$s_!9wvV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!9wvV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!9wvV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65f796a4-42e3-48da-9d54-f9757e8017af_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>When a program is launched, a process is created, an executable image is loaded to produce an initial module, and an initial thread is spawned.</p><p>An operating system&#8217;s scheduler then considers that process&#8217; main thread as a viable candidate for scheduling. When it&#8217;s scheduled, the program executes.</p><p>When a debugger is used to analyze another program, it does so at process granularity. It is registered by the operating system as being <em>attached to a process</em>. When a debugger is attached to a process, the operating system enables additional codepaths, which report information about that process&#8217; execution to the debugger&#8217;s process. If that information includes addresses, they&#8217;re reported as <em>virtual addresses</em>, within the address space of the process to which the debugger is attached.</p><p>But that&#8217;s enough for now! We&#8217;ll dig into exactly what kind of information an operating system reports to a debugger process, how it can do so, and how the debugger can interact with the debugged process, next time.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. 
Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Demystifying Debuggers, Part 1: A Busy Intersection]]></title><description><![CDATA[An introduction to a new post series covering debugger basics.]]></description><link>https://www.dgtlgrove.com/p/demystifying-debuggers-part-1-a-busy</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/demystifying-debuggers-part-1-a-busy</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Mon, 16 Dec 2024 21:45:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1960a5b1-bafa-433f-8529-8051ae2ae8c6_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.dgtlgrove.com/p/index#%C2%A7demystifying-debuggers-series">Part 1 in a series.</a></em></p><p>Debuggers exist at the intersection of many parts of the computing ecosystem&#8212;they must contend with intricate details of kernels, compilers, linkers, programming languages, and instruction set architectures.</p><p>My familiarity with debuggers has improved my programming abilities, the utility of debuggers in my day-to-day programming, and my general knowledge of computing. Back in January, <a href="https://github.com/EpicGamesExt/raddebugger">the RAD Debugger</a>&#8212;the project I work on full-time&#8212;was open sourced to the public, to mark the start of its open alpha phase. I&#8217;ve been working on the debugger, or the technology on which it depends, for almost four years full-time now. The project has taught me an enormous number of lessons, through exposure to an enormous number of problems. 
There is still a lot of work to do, and so I expect it will continue to do so, for many years to come.</p><p>But perhaps most importantly, debuggers are an intricate piece of the puzzle of the design of a development platform&#8212;a future I become more interested in every day, given the undeniable decay infecting modern computing devices and their software ecosystems.</p><p>To emphasize their importance, I&#8217;d like to reflect on the name &#8220;debugger&#8221;. It is not a name I would&#8217;ve chosen, because it can give the impression that a debugger is an auxiliary, only-relevant-when-things-break tool. Of course, a debugger is <em>used to debug</em>&#8212;which is why it was named as such&#8212;but it is also enormously useful to analyze <em>working</em> code&#8217;s behavior, and to verify code&#8217;s correctness, with respect to the expectations of the code.</p><p>Good debuggers provide clear and insightful visualizations into what code is doing. As such, they are also enormously useful educational tools&#8212;for beginners and experts alike&#8212;because they make what is normally opaque, visible. They provide these features by dynamically interacting with running programs&#8212;as such, they can also dynamically modify code. At the limit, this approximates (or employs) JIT-compilation and hot-reloading, giving traditional compiled toolchains much more runtime flexibility for developers.</p><p>For these reasons, &#8220;debugger&#8221; is much too special-purpose a name for the full set of capabilities that debuggers actually provide&#8212;they offer glimpses into the lower level inner-workings of a computer. If one designed a computing system from scratch, debuggers might ideally not be independent from the operating system itself. Instead, perhaps the same capabilities could simply be provided through first-class visualization and dynamic execution adjustment features that the operating system naturally exposes. 
But that is a topic for another day.</p><p>I hope this sheds light on the imbecility of Internet debates about the utility of debuggers&#8212;for example, where one might find comments like, &#8220;I don&#8217;t need debuggers, because I can just use <code>printf</code>&#8221;, or &#8220;I don&#8217;t need debuggers if I can statically guarantee correctness&#8221;. It&#8217;s akin to suggesting that someone does not benefit from vision, because they can feel their way around with a mobility cane, or read text through Braille. Even though mobility canes and Braille are obviously good inventions for people who can&#8217;t have vision, that doesn&#8217;t somehow imply that vision isn&#8217;t an obvious benefit, or that it isn&#8217;t obviously preferable. Similarly, even though logging and static verification are obviously good inventions for programs or circumstances which cannot be easily debugged at runtime, or when those things are simply preferable in context, that doesn&#8217;t somehow imply that actively visualizing the runtime execution of programs through a debugger isn&#8217;t an obvious net benefit, or that it isn&#8217;t obviously preferable in many cases. To suggest otherwise in either case is absurd. The more useful debuggers become, the shorter the iteration loop of the programmer, the more efficient software production becomes, and the more trivially that programmers can obtain true from-first-principles reasoning about their code.</p><div><hr></div><p>Given their importance for both the present and future, and their utility to myself (and thus perhaps readers), I&#8217;m writing a series explaining and documenting debugger architecture.</p><p>In this series of posts, I&#8217;ll cover the following topics:</p><ul><li><p><em><strong><a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-2-the">The Anatomy Of A Running Program</a> </strong></em>&#8212; On the concepts involved in a running program. 
What happens, exactly, when you double click an executable file, or launch it from the command line, and it begins to execute?</p></li><li><p><em><strong><a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-3-kernel">Debugger-Kernel Interaction</a> </strong></em>&#8212; On how kernels collect and expose information about program execution to debuggers, like &#8220;debug events&#8221;, encoding changes like thread creation &amp; destruction, dynamic module loading &amp; unloading, low level exceptions being hit by threads, and more; or like the reading &amp; writing of memory &amp; thread registers, or like the suspension and resumption of threads.</p></li><li><p><em><strong><a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-4-cpu">CPU Features &amp; Debuggers</a> </strong></em>&#8212; On the features that CPUs commonly expose for debuggers, like interruption instructions, debug registers, single-stepping mode, and more.</p></li><li><p><em><a href="https://www.dgtlgrove.com/p/demystifying-debuggers-part-5-instruction">Instruction-Level Stepping &amp; Breakpoints</a></em> &#8212; How a debugger can use kernel and CPU features to implement instruction-level stepping and breakpoints.</p></li><li><p><em><strong>Debug Info &amp; Toolchains</strong></em> &#8212; On the traditional compilation and linking pipeline, how &#8220;debug info&#8221; is produced, what it contains, and how it helps debuggers implement higher level features, which can correlate a program&#8217;s state with source code or language constructs.</p></li><li><p><em><strong>Evaluation </strong></em>&#8212; On evaluating expressions using an expression language and &#8220;location info&#8221; and &#8220;type info&#8221;&#8212;two parts of &#8220;debug info&#8221;.</p></li><li><p><em><strong>Breakpoints</strong></em> &#8212; On how &#8220;breakpoints&#8221; are implemented, from address breakpoints, symbol breakpoints, source code location breakpoints, to conditional breakpoints 
and processor (or data) breakpoints.</p></li><li><p><em><strong>Stepping</strong></em> &#8212; On the various &#8220;stepping&#8221; features in debuggers, from the barebones single-instruction stepping, to disassembly stepping, to source line stepping, all with variants like &#8220;step into&#8221;, &#8220;step over&#8221;, and &#8220;step out&#8221;, all while correctly handling multithreaded programs.</p></li><li><p><em><strong>Unwinding</strong></em> &#8212; On &#8220;unwinding&#8221;, which is how a debugger determines a thread&#8217;s current &#8220;call stack&#8221;, and is able to correctly evaluate values from all scopes in a call stack.</p></li><li><p><em><strong>Graphical Debugger Multithreaded Architecture</strong></em> &#8212; On the structure of a graphical debugger, which employs the aforementioned features and concepts, and exposes them through a real-time interactive interface.</p></li><li><p><em><strong>The Watch Window, &amp; General-Purpose Data Visualization</strong></em> &#8212; On the traditional &#8220;watch window&#8221; graphical debugger interface, and how it may be extended to support general-purpose data visualization.</p></li><li><p>&#8230;and anything else I stumble across while writing that I think would be appropriate to cover!</p></li></ul><p>In discussing these topics, I&#8217;ll try to abstract over platform and architectural details when possible, but I&#8217;ll base my writing on my experience from working on <a href="https://github.com/EpicGamesExt/raddebugger">the RAD Debugger</a>, which has begun its journey as a Windows, user-mode, x64 debugger (although it&#8217;s not <em>finishing</em> its journey as merely that). 
I&#8217;ll also use the RAD Debugger to demonstrate certain concepts and features concretely.</p><p>When I am explicitly relying on that context, I&#8217;ll do my best to state so, but I&#8217;ll also do my best to extrapolate to more generalized information when appropriate, as many of the concepts have similar if not identical analogs on other platforms, and so I feel the knowledge is quite generalizable.</p><p>I hope you&#8217;re excited to come along for the ride, and demystify debuggers for yourself!</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Programmers Are Users (Bad Performance Makes Everyone Less Efficient)]]></title><description><![CDATA[On the often-referenced notion of saving &#8220;programmer cycles&#8221; at the expense of CPU cycles.]]></description><link>https://www.dgtlgrove.com/p/programmers-are-users-bad-performance</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/programmers-are-users-bad-performance</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Fri, 06 Dec 2024 00:04:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/35905c93-a18a-45c3-bbd8-63985e35d8c5_4096x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!xhP8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xhP8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!xhP8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!xhP8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!xhP8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xhP8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:239948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dgtlgrove.com/i/152627335?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xhP8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!xhP8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!xhP8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!xhP8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18d96670-8897-40da-80be-e21e5c9a6efd_4096x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div class="pullquote"><p>I want the act of programming to be easier, more pleasant, and more productive for the vast majority of programmers; and I am willing to sacrifice performance in order to achieve that.</p><p>&#8212;Robert &#8220;Uncle Bob&#8221; Martin</p></div><p>There&#8217;s a pervasive idea in the software industry that is often invoked to defend abstractions and software layers which are easily (and often) used to generate tremendous amounts of bad code. The defense rests on the programmer efficiency gained in generating that bad code.</p><p>When I say &#8220;bad code&#8221;, I don&#8217;t refer to what the programmer sees in his text editor. 
I am talking about the <em>actual code</em>&#8212;that is, what the user&#8217;s CPU executes.</p><p>The argument goes: given these abstractions or layers&#8212;maybe they&#8217;re language features, or programming paradigms&#8212;programmers can be <em>more efficient</em> in generating <em>bad code</em> which accomplishes the same <em>meaningful work</em> as the alternative case, where the programmer <em>does not</em> use these abstractions or layers, and perhaps produces <em>good code</em>, but is less efficient in doing so. The claim&#8212;the <em>hypothesis</em>, or as it is often used, the <em>assertion</em>&#8212;is that these paradigms, abstractions, language features, and so on, are more efficient <em>from a business perspective</em>, at the expense of efficiency <em>from a software quality perspective</em>.</p><p>I recently saw this argument manifest in practice when I (finally) watched <a href="https://www.youtube.com/watch?v=ZLxazlP7Ppo">gingerBill&#8217;s video breakdown</a> of the <a href="https://github.com/cmuratori/misc/tree/main">written discussion</a> between Casey Muratori and Robert &#8220;Uncle Bob&#8221; Martin on the technical merits of Clean Code. It&#8217;s important to understand that &#8220;Clean Code&#8221; doesn&#8217;t refer to &#8220;clean code&#8221;. Capital-C &#8220;Clean&#8221; describes code in the same way that &#8220;Democratic People&#8217;s Republic&#8221; describes North Korea. It is a label. It has as much information content as a person&#8217;s name. Bill correctly identifies this and says that it would more appropriately be called &#8220;Mr. Martin Style&#8221;.</p><p>In the video (which I recommend watching), Bill mostly doesn&#8217;t focus on the <em>technical side</em> of the discussion. And with good reason&#8212;as Bill demonstrates, Robert Martin didn&#8217;t focus on the technical side of the discussion either. Mr. Martin instead opted to debate <em>rhetorically</em>, which Bill correctly likens to a form of political maneuvering. Throughout the debate, no technical questions are satisfactorily answered by Mr. Martin&#8212;if that is what you&#8217;re looking for, you&#8217;ll be disappointed. It isn&#8217;t named &#8220;Clean&#8221; (rather than Mr. Martin Style) by accident&#8212;this convenient word choice is often used by &#8220;Clean Code&#8221; proponents to mask unfalsifiable assertions that justify certain &#8220;Clean Code&#8221; guidelines, by subtly shifting between &#8220;clean&#8221; and &#8220;Clean&#8221; throughout a discussion.</p><p>Casey and Bill both did an excellent job, but throughout both the written discussion and the video breakdown, I kept returning to a thought that I didn&#8217;t see anyone express directly, which is why I&#8217;m writing this post.</p><div><hr></div><p>You could spend several careers researching programmer productivity. The interface of programming is, of course, tightly intertwined with programmer psychology, the programmer&#8217;s ability to reason, and the programmer&#8217;s ability to easily express computational effects. Not all interfaces are equally efficient for programmers to use. <a href="https://www.rfleury.com/i/70173682/the-link-between-interface-implementation-and-usage">Not all interfaces can have equally efficient implementations</a>. Even a cursory investigation of this subject leads to the conclusion that some interfaces are <em>simultaneously</em> better for the programmer, <em>and</em> better for the CPU, than others. Programmer efficiency and CPU efficiency are not always mutually exclusive&#8212;one does not <em>always</em> need to pick one over the other.</p><p>Unfortunately, serious investigations of these topics are seemingly never carried out by &#8220;Clean Code&#8221; proponents. 
Their claims of improved programmer efficiency at the expense of CPU efficiency remain mere assertions.</p><p>But in this post, I want to zoom out from those investigations. I want to give &#8220;Clean Coders&#8221; the benefit of the doubt. Let&#8217;s say that we <em>are</em> operating in a space where we must trade CPU productivity for programmer productivity. Let&#8217;s say that we even have tools which we <em>know</em> trade CPU productivity for programmer productivity. Never mind that these premises are never proven to hold&#8212;let&#8217;s say we find a case where they <em>do hold</em>.</p><p>I&#8217;d like to explain why, in that context, it <em>still</em> doesn&#8217;t make sense to overwhelmingly prefer the comfort and generative productivity of the programmer writing the code. Even when I&#8217;ve given &#8220;Clean Coders&#8221; everything they could possibly ask for, everything they attempt to assert, there is <em>still</em> a strong argument in favor of software performance, at the expense of initial programmer performance. Part of this argument has to do with the overall quality of the computing system, but it is not strictly altruistic&#8212;there is also a strong <em>business case</em> for software performance, even with a higher upfront programming cost. As such, this argument requires neither business inclinations nor altruistic ones&#8212;you could love, <em>or</em> hate, the concept of a free market with private property. 
In all such cases, this argument still applies.</p><div><hr></div><p>The fundamental mistake that, as I see it, entirely destabilizes the argument in favor of programmer code-writing efficiency (at the expense of software efficiency) is an incorrect model of the software ecosystem.</p><h2>Programmers Run Their Own Code</h2><p>The implicit model used to justify the tradeoff assumes a single software producer, which produces the code, and a single customer, which executes the code:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5dt1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5dt1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 424w, https://substackcdn.com/image/fetch/$s_!5dt1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 848w, https://substackcdn.com/image/fetch/$s_!5dt1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 1272w, https://substackcdn.com/image/fetch/$s_!5dt1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5dt1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png" width="1456" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2229442,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5dt1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 424w, https://substackcdn.com/image/fetch/$s_!5dt1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 848w, https://substackcdn.com/image/fetch/$s_!5dt1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 1272w, https://substackcdn.com/image/fetch/$s_!5dt1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514b447a-30f7-4af1-a29c-ef7e87af8e53_2103x739.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But this model ignores the everyday reality of programming. The reality is that the <em>producer</em> must also execute the code, as they develop, debug, and test it. 
Only after they&#8217;ve gone through that process innumerable times do they submit the software to the user to be executed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GF--!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GF--!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 424w, https://substackcdn.com/image/fetch/$s_!GF--!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 848w, https://substackcdn.com/image/fetch/$s_!GF--!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 1272w, https://substackcdn.com/image/fetch/$s_!GF--!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GF--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png" width="1456" height="913" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:913,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3087699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GF--!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 424w, https://substackcdn.com/image/fetch/$s_!GF--!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 848w, https://substackcdn.com/image/fetch/$s_!GF--!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 1272w, https://substackcdn.com/image/fetch/$s_!GF--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54c158e-191b-4cce-a45a-b69c72d8cf6f_1873x1175.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Even in this model, bad software <em>execution time</em> has already slowed programmer productivity. Some of the programmer&#8217;s efficiency in <em>reading or writing the code</em> has already been counterbalanced by the software&#8217;s inefficiency, because the programmer is blocked on executing his own software. Slow execution has thus lengthened the programmer&#8217;s iteration cycle time.</p><p>To continue justifying the choice to prefer programming methodologies which promote <em>fast generation</em> of <em>bad code</em>, one must prove that this degradation of programmer iteration cycle time does not outweigh the improvement in programmer efficiency in generating the code. Given that code is often written once, but tested, debugged, and tweaked hundreds if not thousands of times, this is important to demonstrate. 
But it is never demonstrated.</p><h2>Customers Are Programmers</h2><p>But this problem gets worse. Who is the software producer? What are they making? Who is the software consumer? What are they using the software for? &#8220;Clean Code&#8221; arguments often seem to assume a very narrow definition of these two entities&#8212;for example, the one used in the original written discussion: payroll software, used by a business to pay its employees. But what is the <em>common reality</em> in the software world?</p><p>Let&#8217;s take an example from the real world&#8212;from my own life, in fact. I work on a <a href="https://github.com/EpicGamesExt/raddebugger">debugger project</a> full-time. As part of my work on that project, I need to use a file explorer. The built-in Microsoft Windows file explorer provides an incredibly poor-quality experience for a large number of reasons&#8212;probably related to &#8220;Clean Code&#8221;, in fact&#8212;which is why I&#8217;m excited for my friend Vjekoslav&#8217;s file explorer project, <a href="https://filepilot.tech/">File Pilot</a>. Vjekoslav is a programmer too, so as part of his work on his file explorer, he needs to use a debugger.</p><p>Instead of some arbitrary payroll software producer and some arbitrary business, let&#8217;s say the entity on the left is me, writing the debugger, and the entity on the right is Vjekoslav, writing his file explorer. This is a slight extension of the initial oversimplified model&#8212;it demonstrates one simple case of <em>cyclical</em> dependencies within the software ecosystem. 
This is an overwhelmingly <em>common</em> relationship&#8212;it regularly manifests within a single company or organization, but it extends beyond those boundaries as well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E33-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E33-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 424w, https://substackcdn.com/image/fetch/$s_!E33-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 848w, https://substackcdn.com/image/fetch/$s_!E33-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 1272w, https://substackcdn.com/image/fetch/$s_!E33-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E33-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png" width="1456" height="758" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4725148,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E33-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 424w, https://substackcdn.com/image/fetch/$s_!E33-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 848w, https://substackcdn.com/image/fetch/$s_!E33-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 1272w, https://substackcdn.com/image/fetch/$s_!E33-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04135bd3-4e73-4350-96bc-dadd216144ce_2429x1265.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In this context, if my software is unnecessarily slow, <em>not only</em> does it slow me down before I release it, it <em>also</em> slows Vjekoslav down in his development. This degradation in <em>his</em> programming performance manifests as fewer iteration cycles for <em>his</em> software, which <em>I</em> also depend on. Because I began by making <em>my</em> software worse than it could&#8217;ve been, that effect has propagated outward into the computing ecosystem, and because the computing ecosystem is a cyclical graph of dependencies, that effect propagates <em>back to me</em>.</p><h2>Slowness Propagates, Then Compounds</h2><p>But this problem is worse still than I&#8217;ve described. A more detailed graph of the software ecosystem is more complex than the earlier examples. 
The software ecosystem is a vast cyclic network of independent entities that produce software and rely on other software.</p><p>Poor performance, introduced at any point in this network, propagates outward, because &#8220;depending on&#8221; software means <em>executing</em> that software. The execution time of an entity&#8217;s dependency is, one way or another, factored into the production of the dependent itself. This may literally be execution time in the dependent software, if it relies on a poorly constructed library, or it may be a cost in programming resources, if the programmer relies on a poorly constructed tool. Whether it blocks the <em>programmer</em> or the <em>software</em>, it has a cost.</p><p>The longer the dependency chain becomes, the more likely these effects will be multiplied or otherwise duplicated&#8212;and the more likely that these effects will back-propagate to the dependent software. Not to mention that the longer this dependency chain becomes, the less likely it is that anyone will understand what the problem truly is, or how to fix it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cg73!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cg73!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 424w, https://substackcdn.com/image/fetch/$s_!Cg73!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 848w, 
https://substackcdn.com/image/fetch/$s_!Cg73!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 1272w, https://substackcdn.com/image/fetch/$s_!Cg73!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cg73!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png" width="436" height="523.1401098901099" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1747,&quot;width&quot;:1456,&quot;resizeWidth&quot;:436,&quot;bytes&quot;:13620838,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cg73!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 424w, https://substackcdn.com/image/fetch/$s_!Cg73!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 848w, 
https://substackcdn.com/image/fetch/$s_!Cg73!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 1272w, https://substackcdn.com/image/fetch/$s_!Cg73!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8787d75b-17aa-4f70-9ade-e9b940dc4910_2721x3265.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This reality has explanatory power in describing the state of modern software. 
Despite vast improvements in computing hardware, some software performs dramatically <em>worse</em> than equivalent software did several decades ago, on hardware that was sometimes a <em>thousand times slower</em>, with <em>a thousand times fewer resources</em>. This is because poor performance does not stop at one edge in the network&#8212;it is multiplied several times over.</p><p>During the initial discussion, Mr. Martin introduced several time domains. He described some problems as being within the &#8220;nanoseconds domain&#8221;, where wasting nanoseconds matters. He described the &#8220;microseconds domain&#8221;, where wasting microseconds matters. He described the &#8220;milliseconds domain&#8221;, where wasting milliseconds matters.</p><p>The problem, however, is that there is only one time domain <em>in reality</em>. And wasted nanoseconds add up to wasted microseconds. And those add up to wasted milliseconds. And those add up to wasted seconds. This can happen simply within one piece of software, <em>before</em> network effects in the software ecosystem are even considered.</p><p>After network effects are considered, it is perhaps no surprise that programs now accomplish in thirty seconds what they used to accomplish in less than a second <em>on hardware which was thousands of times less capable</em>.</p><div><hr></div><p>Spending software efficiency to obtain programmer efficiency may seem like an appealing idea. But I hope this post helped clarify why it isn&#8217;t that simple. The acceptance of software production methodologies like the one Mr. Martin himself promotes is in part responsible for a vast degradation in software quality and, ironically enough, for <em>decreases</em> in programmer efficiency.</p><p>I opened this post with a quote from Mr. 
Martin himself: &#8220;<em>I want the act of programming to be easier, more pleasant, and more productive for the vast majority of programmers</em>&#8221;.</p><p>Unfortunately, the act of programming is <em>harder</em>, <em>less pleasant</em>, and <em>less productive</em> than ever, because of systemic excuse-making for mediocrity. Sacrificing software efficiency for programmer efficiency seems to simply result in losing both.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Codebase Walkthrough: Multi-Window, Panel-Tree UI]]></title><description><![CDATA[Building a multi-window, panel-tree UI sample in the codebase.]]></description><link>https://www.dgtlgrove.com/p/codebase-walkthrough-multi-window</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/codebase-walkthrough-multi-window</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Mon, 08 Jul 2024 06:05:29 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/146386391/c84f4ad6-b8cb-47ad-a399-d6bfbbe7cfaa/transcoded-126950.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this video, I write a sample program from scratch for the codebase, intended to demonstrate how the codebase layers (notably the UI layer, built in the style described in the <a href="https://www.rfleury.com/p/ui-series-table-of-contents">UI series</a>) can be used to set up a multi-window, per-window-panel-tree user interface.</p><p>This is a very flexible high-level interface design. 
It&#8217;s a useful way to structure the user interface for programs which require a large number of integrated, but varied, user interface designs, such as the <a href="https://github.com/EpicGamesExt/raddebugger">program</a> pictured below, for which I used this basic design:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MuN0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MuN0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 424w, https://substackcdn.com/image/fetch/$s_!MuN0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 848w, https://substackcdn.com/image/fetch/$s_!MuN0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!MuN0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MuN0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png" width="1456" height="847" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:847,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:372951,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MuN0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 424w, https://substackcdn.com/image/fetch/$s_!MuN0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 848w, https://substackcdn.com/image/fetch/$s_!MuN0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!MuN0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c300069-47bf-453a-a1ee-babc0771f760_1856x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
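<p>As a rough illustration of the multi-window, per-window panel tree idea, here is a minimal C sketch. All names (<code>Panel</code>, <code>Window</code>, <code>panel_push_child</code>, <code>panel_layout</code>, <code>split_axis</code>, <code>pct_of_parent</code>) are hypothetical and are not taken from the codebase. It assumes that each window owns the root of its own tree, that an interior panel lays its children out along one axis, and that each child claims a fraction of its parent's extent along that axis.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the codebase's actual API. */
typedef enum Axis2 { Axis2_X, Axis2_Y } Axis2;

typedef struct Rect { float x0, y0, x1, y1; } Rect;

typedef struct Panel Panel;
struct Panel
{
  Panel *first;        /* first child (NULL for a leaf panel) */
  Panel *next;         /* next sibling */
  Panel *parent;
  Axis2 split_axis;    /* axis along which this panel's children are laid out */
  float pct_of_parent; /* fraction of the parent's extent along the parent's split axis */
  Rect rect;           /* output: computed by panel_layout */
};

/* A window owns the root of its own panel tree. */
typedef struct Window { Panel root; Rect client_rect; } Window;

/* Append child to the end of parent's child list. */
static void panel_push_child(Panel *parent, Panel *child)
{
  child->parent = parent;
  child->next = NULL;
  Panel **slot = &parent->first;
  while(*slot != NULL) { slot = &(*slot)->next; }
  *slot = child;
}

/* Recursively assign rectangles: each child takes pct_of_parent of the
   parent's extent along the parent's split axis, and the parent's full
   extent along the other axis. */
static void panel_layout(Panel *panel, Rect rect)
{
  panel->rect = rect;
  int along_x = (panel->split_axis == Axis2_X);
  float extent = along_x ? (rect.x1 - rect.x0) : (rect.y1 - rect.y0);
  float cursor = along_x ? rect.x0 : rect.y0;
  for(Panel *child = panel->first; child != NULL; child = child->next)
  {
    assert(child->pct_of_parent >= 0.f && child->pct_of_parent <= 1.f);
    float size = extent * child->pct_of_parent;
    Rect child_rect = rect;
    if(along_x) { child_rect.x0 = cursor; child_rect.x1 = cursor + size; }
    else        { child_rect.y0 = cursor; child_rect.y1 = cursor + size; }
    cursor += size;
    panel_layout(child, child_rect);
  }
}
```

<p>In this sketch, layout is a pure function of the tree and the window's client rectangle, so it can simply be recomputed every frame. The tree itself is the retained, user-controlled state: panels are created, split, resized, and closed by the user, and each window carries its own independent tree.</p>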
      <p>
          <a href="https://www.dgtlgrove.com/p/codebase-walkthrough-multi-window">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Codebase Walkthrough: Using The Metaprogram]]></title><description><![CDATA[Walking through the structure and basic usage of the codebase's metaprogram, which allows for arbitrary compile-time execution, code generation, and code introspection.]]></description><link>https://www.dgtlgrove.com/p/using-the-codebase-metaprogram</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/using-the-codebase-metaprogram</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Sat, 06 Jul 2024 07:56:07 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/146333167/220f1f79-e64b-4314-af5b-14451269c745/transcoded-58274.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this video, I walk through the structure and features of the codebase metaprogram, located in the <code>metagen</code> layer. This simple program, together with its trivial usage in the codebase <code>build.bat</code> file, allows for arbitrary compile-time execution, introspection over the codebase, and compile-time code or data generation. It&#8217;s a simple setup, but it provides capabilities that many assert can only be provided by a compiler or language designer.</p><p>Because the metaprogram inserts itself into the build, the codebase is not dependent on a compiler or language designer providing the right features&#8212;it simply has the space of compile-time execution available for its own computation.</p><p>This model has many other notable benefits which compiler-executed metaprograms lack. The metaprogram can be debugged like a normal executable with normal debug information, and it runs as a native program at full speed. It does not need to execute within a bizarre pure-functional type language, or within a virtual execution environment; it is just regular code, which analyzes and produces text.</p>
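<p>To make &#8220;regular code which analyzes and produces text&#8221; concrete, here is a minimal C sketch of one kind of work a metaprogram can do: turn a list of names into C source for an enum plus a parallel string table. This is an illustration under stated assumptions, not the <code>metagen</code> layer's actual API. The names <code>Out</code>, <code>out_str</code>, and <code>gen_enum_and_strings</code> are hypothetical, and a real metaprogram would more likely parse such tables out of source or data files than receive them as arguments.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-capacity text buffer that generated code is appended to. */
typedef struct Out { char *buf; size_t cap; size_t len; } Out;

/* Append a NUL-terminated string. A sketch: no growth, it just fails loudly. */
static void out_str(Out *o, const char *s)
{
  size_t n = strlen(s);
  assert(o->len + n < o->cap); /* leave room for the terminator */
  memcpy(o->buf + o->len, s, n);
  o->len += n;
  o->buf[o->len] = 0;
}

/* Emit C source for an enum named `type` with one member per name, a
   trailing _COUNT member, and a string table indexed by that enum.
   Lines longer than 255 characters would be truncated by snprintf. */
static void gen_enum_and_strings(Out *o, const char *type, const char **names, int count)
{
  char line[256];
  snprintf(line, sizeof(line), "typedef enum %s\n{\n", type);
  out_str(o, line);
  for(int i = 0; i < count; i += 1)
  {
    snprintf(line, sizeof(line), "  %s_%s,\n", type, names[i]);
    out_str(o, line);
  }
  snprintf(line, sizeof(line), "  %s_COUNT\n} %s;\n\n", type, type);
  out_str(o, line);
  snprintf(line, sizeof(line), "static const char *%s_strings[] =\n{\n", type);
  out_str(o, line);
  for(int i = 0; i < count; i += 1)
  {
    snprintf(line, sizeof(line), "  \"%s\",\n", names[i]);
    out_str(o, line);
  }
  out_str(o, "};\n");
}
```

<p>In a setup like this, the metaprogram's <code>main</code> would call such a function and write the resulting text to a generated header. The build would run the metaprogram before compiling the main program, which then includes that header. Because the generator is ordinary C, it can be stepped through in a debugger like any other executable.</p>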
      <p>
          <a href="https://www.dgtlgrove.com/p/using-the-codebase-metaprogram">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Enter The Arena: Simplifying Memory Management (Talk)]]></title><description><![CDATA[A talk on arena-based memory management.]]></description><link>https://www.dgtlgrove.com/p/enter-the-arena-talk</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/enter-the-arena-talk</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Mon, 13 May 2024 16:34:21 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/144590119/275387c22eca828e09146819201bfeb2.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>This is a video of a talk I did in August 2023, aiming to teach the concepts described in <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">Untangling Lifetimes: The Arena Allocator</a>, in a different format, with greater and more concrete detail.</p><p>The small sample program&#8212;along with many other projects and examples, all built using arenas as the core memory management primitive&#8212;is included in <a href="https://git.rfleury.com/community/root_basic">the Digital Grove codebase</a>, which is available to paid subscribers. You can subscribe below:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p><p>I hope it helps to better communicate some of the ideas I&#8217;ve written about. If you enjoy the talk, please consider subscribing. 
Thanks for watching.</p><p>-Ryan</p>]]></content:encoded></item><item><title><![CDATA[Upstream & Downstream]]></title><description><![CDATA[Separating computational cause from effect.]]></description><link>https://www.dgtlgrove.com/p/upstream-and-downstream</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/upstream-and-downstream</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Sat, 11 May 2024 17:38:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fe8ee021-f0e3-405b-9af0-78951c6f6ef3_4096x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xljl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xljl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Xljl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Xljl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Xljl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Xljl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:288991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dgtlgrove.com/i/141493705?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xljl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Xljl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Xljl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Xljl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27a84a8d-22ef-4dc3-b0e7-0fd1d627d982_4096x2048.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In <a href="https://www.rfleury.com/p/ui-part-2-build-it-every-frame-immediate">my UI series</a>, I wrote about my preference for an &#8220;immediate mode&#8221; interface for building UI. This preference came from years of attempting to build UI from scratch in various ways. 
When I&#8217;d use a retained mode interface, it would always feel like I was pushing sand around.</p><p>When I <a href="https://www.youtube.com/watch?v=Z1qyvQsjK5Y">discovered</a> the immediate mode approach to UI&#8212;similar in spirit to the immediate mode approach to rendering&#8212;it felt like a breath of fresh air. My UIs became not only easier to build&#8212;which is in itself not enough justification for the preference, although still important in improving iteration time&#8212;but also more robust, more easily made dynamic, and more naturally reactive to state changes in my programs.</p><p>There is a fair amount of rhetoric that suggests immediate mode interfaces are less capable than retained mode interfaces. The most common arguments are: <strong>(a)</strong> <a href="https://www.rfleury.com/i/54057344/immediate-mode-build-cache">there are caching opportunities</a> that immediate mode interfaces cannot take advantage of; <strong>(b) </strong>immediate mode interfaces must update all state for all changes, <a href="https://www.forrestthewoods.com/blog/proving-immediate-mode-guis-are-performant/">thus causing worse performance and higher battery usage</a>; <strong>(c) </strong>immediate mode interfaces prohibit the use of user interfaces <em>as a data structure</em>, thus prohibiting integration with operating system accessibility features, <a href="https://www.rfleury.com/p/ui-part-9-keyboard-and-gamepad-navigation">keyboard navigation</a>, <a href="https://www.rfleury.com/i/54057344/a-sample-offline-autolayout-algorithm">autolayout</a>, and so on.</p><p>Statements <strong>(a)</strong>, <strong>(b)</strong>, and <strong>(c) </strong>are all false. The embedded links provide compelling counterarguments to each. 
It is a bit tiresome that these arguments are still continuously made&#8212;<em>asserted&#8212;</em>considering the lack of investigation and thought behind them.</p><p>But despite that, there is still a point at which immediate mode interfaces lose their utility, and where retained mode interfaces become preferable, if not required.</p><p><a href="https://www.rfleury.com/p/ui-part-7-where-imgui-ends">As I&#8217;ve written about before</a>, one place where this rings true is in controlling higher-level, user-controlled &#8220;interface instantiation entities&#8221;&#8212;think windows, tabs, panels, and so on. In other words, things that the user might explicitly create with a &#8220;+&#8221; button, and destroy with an &#8220;x&#8221; button.</p><p>But the same is true, for example, in <em>the implementation of</em> an immediate mode interface with persistent &#8220;key-based&#8221; caching behavior. The <em>implementation of</em> the cache cannot be immediate mode&#8212;the cache is implemented with &#8220;retained mode pieces&#8221;; they are just localized to the implementation of the immediate mode interface.</p><p>Another example would be state for entities in a game level. In a usual game scenario, entities are dynamically created and destroyed depending on various gameplay conditions. Usage code of an immediate mode interface, however, is static, because code is immutable:</p><pre><code>// every frame:
Entity("player", player_sprite, player_pos, ...);
Entity("goblin", goblin_sprite, goblin_pos, ...);
Entity("chest", chest_sprite, chest_pos, ...);</code></pre><p>In the above example, let&#8217;s say that the gameplay involves the <code>player</code> defeating the <code>goblin</code>, in order to obtain valuables from the <code>chest</code>.</p><p>There are countless possible combinations of state in this scenario, accounting for all the various entity positions, entity health levels, and so on. But one important possibility is: is the <code>goblin</code> <em>alive</em>, or is the <code>goblin</code> <em>dead</em>?</p><p>Imagine that you need to encode this possibility into usage of the &#8220;immediate mode entity interface&#8221;, used above as <code>Entity</code>. You&#8217;d need something like the following:</p><pre><code>// every frame:
Entity("player", player_sprite, player_pos, ...);
if(is_goblin_alive)
{
  Entity("goblin", goblin_sprite, goblin_pos, ...);
}
Entity("chest", chest_sprite, chest_pos, ...);</code></pre><p>And what is <code>is_goblin_alive</code>? It&#8217;s <em>usage-code-side state</em>. And usage-code-side state requires a usage-code-side <em>state machine</em>. This state machine requires retained-mode-like mutations.</p><p>In other words, it&#8217;s impossible to <em>escape</em> state machines&#8212;or &#8220;retained mode interfaces&#8221;&#8212;they are exactly the correct choice in many places.</p><p>But &#8220;many&#8221; is not &#8220;all&#8221;&#8212;and the reason immediate mode interfaces become <em>preferable</em> in some places is that the introduction of <em>yet another state machine</em> is actually a <em>burden</em> on usage code, rather than a source of necessary functionality.</p><p>Thinking about this led me to a useful analogy&#8212;is the system I&#8217;m writing <em>upstream</em>? Or is it <em>downstream</em>?</p><p>Downstream of <em>what?</em> Downstream of <em>state machines</em>.</p><p><em>Upstream</em> systems are <em>shallow layers</em> in a call stack. They need control over state, and they want to describe how many <em>computational effects flow </em>from that state.</p><p><em>Downstream</em> systems are <em>deeper layers</em> in a call stack. They are called into to produce <em>computational effects</em> from user-provided state. Their purpose is to organize the particulars of how some system <em>functionally derives</em> from another.</p><p>When a <em>downstream system</em> introduces <em>new state machines</em> for usage code, and those state machines merely <em>mirror</em> an <em>upstream state machine</em>, the result is additional cruft, additional busywork for usage code, more opportunities for bugs, worse iteration time, and a substantially worse programming experience. 
This is what happens when a &#8220;retained mode system&#8221; is inappropriately introduced.</p><p>One important reason is that there are <em>far</em> <em>more</em>&#8212;exponentially more&#8212;<em>downstream</em> system entry points than <em>upstream</em> ones, because every entry point can itself call into N other entry points. Thus, unnecessarily introducing usage-code-controlled state machines in downstream systems <em>explodes</em> the number of possibilities the usage code author must actually be concerned about.</p><p>This explains why, over time, it has become obvious that, for rendering, immediate mode interfaces are dramatically better than retained mode interfaces. This was not always obvious, but this model makes it clear why&#8212;on-screen artifacts produced by rendering are exactly that: artifacts. They&#8217;re a function of some invisible state&#8212;the entire purpose of rendering is to take some invisible state in the machine, and turn it into something visible.</p><p>UI is only one step further&#8212;instead of merely visual artifacts as &#8220;output&#8221; of a system, it is also concerned with user inputs as &#8220;input&#8221; to a system. But it nevertheless remains largely <em>downstream</em> of that system.</p><p>But using immediate mode interfaces to specify &#8220;downstream effects&#8221; is a technique far from localized to rendering and UI.</p><p>And when truly downstream systems are correctly organized as such&#8212;providing an immediate mode interface to usage code&#8212;it becomes trivial to dynamically produce a multitude of possible <em>downstream</em> effects, with few <em>upstream </em>state changes. Every upstream state change becomes much more powerful and meaningful&#8212;it costs little code, little computation, and little work to produce a new world of effects. 
There is no need to <em>inform</em> every downstream system of the state change.</p><p>These reasons are why I&#8217;ve found the analogy useful, and why I wanted to share it. When writing a system, ask yourself&#8212;are you <em>downstream</em>, or are you <em>upstream</em>? The answer to that question can help inform you about the appropriate design for that system.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. Thanks for reading.</p><p>-Ryan</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.dgtlgrove.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.dgtlgrove.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Multi-Threading & Mutation ]]></title><description><![CDATA[On mutation, how it subtly occurs in single-threaded code, and how it can disrupt the process of upgrading single-threaded code to being multi-threaded.]]></description><link>https://www.dgtlgrove.com/p/multi-threading-and-mutation</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/multi-threading-and-mutation</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Tue, 12 Mar 2024 01:40:49 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/96e409ed-280f-4871-9c5f-94a2dd8d02ab_4096x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NzX8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!NzX8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!NzX8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!NzX8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!NzX8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NzX8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:245081,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dgtlgrove.com/i/142014307?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NzX8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!NzX8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!NzX8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!NzX8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a893197-5018-4bf5-91a0-07bbf4ededf0_4096x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On day one, to even begin programming anything, programmers must understand how to write text files and use an interpreter, compiler, or assembler to turn them into executing programs. Soon after, they must roughly understand what memory is, and how to use addresses and pointers (even if they&#8217;re working in various high level languages in which those concepts masquerade under other names). This naturally leads to a basic understanding of memory allocation. An important next step is learning to write code that operates on batches of data, rather than single elements of data, for both performance and simplicity. They can then understand how to <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">more simply manage memory</a>, by being organized about these batches, and by performing the allocations themselves as a batch, upfront. They can become more familiar with the underlying mechanisms of the CPU: how it obtains memory, performs computations, and writes memory.</p><p>At each of these milestones, the programmer&#8217;s understanding of the underlying hardware, and how to gracefully use it, broadens. And because modern hardware offers several independently-executing processor units, one of the next most crucial milestones in this journey is writing code to execute <em>in</em> <em>parallel</em>, in multiple simultaneously-executing instruction streams, rather than just a single instruction stream. This is critical for both software performance <em>and</em> utility. 
For user-mode code on a multitasking operating system, which is what I and most developers write, this is done with multi-threading.</p><p>Many of these milestones introduce new constraints which apply pressure on code to be shaped in a particular way. If a programmer solves for only <em>some of these constraints</em>, it&#8217;s like building a piece of furniture and tightening one particular screw fully, before even placing all other screws. Code, much like <a href="https://www.rfleury.com/p/you-get-what-you-measure">many other things</a>, is partly a function of its environment&#8212;that is, the constraints to which it&#8217;s regularly exposed.</p><p>Multi-threading is one such milestone. Single-threaded code can be built under a different set of constraints than multi-threaded code, and as such, it is often extremely difficult to later add the &#8220;screw&#8221; of multi-threading to a mostly-fully-built piece of &#8220;furniture&#8221;. In other words, if code needs to be multi-threaded, then it should be built under those constraints <em>as early as possible</em>; the <em>later those constraints are introduced</em>, the more of it will need to be rewritten.</p><p>Sometimes, it makes more sense to willingly accept the cost of rewriting, especially if it&#8217;s difficult to predict what the correct multi-threaded design for a problem will be, and a single-threaded version must first be written to explore the problem space. I think there is often too much reluctance toward rewriting systems, and too many resources put into &#8220;reusability&#8221; (before the exact circumstances of that reuse are fully understood), but I digress.</p><p>For me, multi-threading was initially an extremely intimidating milestone to tackle, because it introduces <em>so many</em> of these additional constraints.
It seems to add an entirely new dimension to programming, and many techniques that work adequately in single-threaded space suddenly become invalid in multi-threaded space.</p><p>But as I&#8217;ve learned over the past couple of years, this doesn&#8217;t mean that, at the end of the day, it always needs to be <em>difficult</em>. It can instead be quite easy, once these additional constraints are internalized and systems are designed accordingly. Designing such systems can obviously be tricky and dependent on one&#8217;s own ingenuity, but multi-threading was intimidating more because of its <em>opacity</em> than because of its <em>intrinsic difficulty</em>. In other words, hard problems are still hard, but when I was first thinking about multi-threading, many problems that <em>seemed like they should be easy</em> were hard too.</p><p>Any time someone writes a shader, for example, they&#8217;re doing multi-threaded programming. <a href="https://www.shadertoy.com/">Shadertoy</a> proves that this can be simple for something like tens (if not hundreds) of thousands of programmers, and it mostly happens invisibly, because the underlying primitives are well-designed. I learned that I can also design underlying primitives, for a given problem, to make multi-threaded programming simple.</p><p>I&#8217;ve <a href="https://www.rfleury.com/p/a-taxonomy-of-computation-shapes">written before</a> about some lessons I&#8217;ve learned which have helped me internalize these additional constraints, and design multi-threaded systems.
Recently, I&#8217;ve been doing much more multi-threaded programming, and I want to share more of those lessons in this post.</p><div><hr></div><h2>It&#8217;s Not How You Synchronize&#8212;It&#8217;s How You Don&#8217;t</h2><p>Generally, when someone is taught multi-threaded programming, they learn about synchronization mechanisms&#8212;for instance, atomic CPU operations, and the higher-level abstractions that they are often used to implement, like mutexes, condition variables, semaphores, and so on. And while understanding synchronization mechanisms is important, the true task of writing multi-threaded systems is not figuring out how to synchronize, but how to <em>not synchronize</em>. One of the major benefits of multi-threaded code is, obviously, that it allows multiple tasks to happen simultaneously. The more synchronization a multi-threaded system requires, the more this benefit dissolves. Of course, <em>some amount</em> of synchronization must exist in order for two systems to work together, and <em>at those synchronization points</em>, getting the tricky details of synchronization mechanisms right is critical&#8212;but ideally, the vast majority of multi-threaded code executes <em>without synchronization at all</em>.</p><p>One intrinsic characteristic of multi-threaded code is that simultaneous reads can happen without any synchronization. Writes, on the other hand, can require synchronization (with both other writes <em>and</em> reads)&#8212;for any one region of memory, only a single write at a time is well-defined, so there must be some synchronized order of writes, and reads cannot occur <em>while</em> a write occurs.
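</p>
<p>The read/write distinction can be made concrete with a minimal sketch, assuming POSIX threads (chosen here only for illustration). Several threads read one shared array without any locks, which is safe because nothing writes to it while they run, and each thread writes only to its own result slot, so those writes never conflict. <code>SumTask</code>, <code>SumThreadEntryPoint</code>, and <code>ParallelSum</code> are hypothetical names, and <code>U64</code> is assumed to be a 64-bit unsigned integer.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uint64_t U64; // assumed to match the post's U64

#define THREAD_COUNT 4
#define VALUE_COUNT  1000

// Filled in before any thread starts; afterwards it is only ever read.
static U64 shared_values[VALUE_COUNT];

typedef struct SumTask SumTask;
struct SumTask
{
  U64 first;  // first index (inclusive) this thread reads
  U64 opl;    // one past the last index this thread reads
  U64 result; // written only by the thread that owns this task
};

static void *SumThreadEntryPoint(void *param)
{
  SumTask *task = (SumTask *)param;
  U64 sum = 0;
  for(U64 idx = task->first; idx < task->opl; idx += 1)
  {
    sum += shared_values[idx]; // shared read: no synchronization needed
  }
  task->result = sum; // non-conflicting write: no other thread touches this task
  return 0;
}

static U64 ParallelSum(void)
{
  for(U64 idx = 0; idx < VALUE_COUNT; idx += 1)
  {
    shared_values[idx] = idx; // every write to shared data happens before threads start
  }
  pthread_t threads[THREAD_COUNT];
  SumTask tasks[THREAD_COUNT];
  U64 per_thread = VALUE_COUNT / THREAD_COUNT;
  for(U64 t = 0; t < THREAD_COUNT; t += 1)
  {
    tasks[t].first = t * per_thread;
    tasks[t].opl = (t == THREAD_COUNT - 1) ? VALUE_COUNT : (t + 1) * per_thread;
    tasks[t].result = 0;
    int rc = pthread_create(&threads[t], 0, SumThreadEntryPoint, &tasks[t]);
    assert(rc == 0);
    (void)rc;
  }
  U64 total = 0;
  for(U64 t = 0; t < THREAD_COUNT; t += 1)
  {
    pthread_join(threads[t], 0); // the join is the only synchronization point
    total += tasks[t].result;
  }
  return total;
}
```

<p>Here, <code>ParallelSum()</code> returns 499500 (the sum of 0 through 999) on every run. Adjacent <code>SumTask</code> results may share a cache line, which can cost some performance (false sharing) but does not affect correctness.</p>
<p>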
This means <em>reads</em> and <em>non-conflicting writes</em> lend themselves better to multi-threaded code than <em>potentially-conflicting writes</em>.</p><p>Thus, designing multi-threaded systems which execute cleanly in parallel requires careful attention to where <em>reads</em> occur, where <em>writes</em>&#8212;<em>mutations&#8212;</em>occur, and <em>what is mutated</em>. As much as possible, a thread&#8217;s writes should not conflict with other threads&#8217; writes, lest the system require additional synchronization. In other words, the mutations performed by each thread should be <em>bucketed</em> whenever possible.</p><p>A simple example of <em>bucketed mutations</em> is many threads each allocating and mutating a local variable on the stack. This is trivially conflict-free and thus requires no synchronization, because the term &#8220;the stack&#8221; actually refers to <em>a per-thread stack</em>, and thus each thread mutates a completely different region in memory:</p><pre><code>void ThreadEntryPoint(...)
{
  U64 x = 0;
  for(U64 idx = 0; idx &lt; 1000; idx += 1)
  {
    x += idx; // will *never* conflict with other threads
  }
}

void EntryPoint(...)
{
  LaunchThread(ThreadEntryPoint, ...);
  LaunchThread(ThreadEntryPoint, ...);
}</code></pre><p>And a simple example of <em>non-bucketed mutations</em> would be when many threads write into the same fixed address in memory, e.g. through a <code>static</code> variable:</p><pre><code><code>void ThreadEntryPoint(...)
{
  static U64 x = 0;
  for(U64 idx = 0; idx &lt; 1000; idx += 1)
  {
    x += idx; // will *always* conflict with other threads
  }
}

void EntryPoint(...)
{
  LaunchThread(ThreadEntryPoint, ...);
  LaunchThread(ThreadEntryPoint, ...);
}</code></code></pre><p>In the above example, the conflicting write&#8212;the non-bucketed mutation&#8212;is not synchronized, so the final value of <code>x</code> is not reliably well-defined (in C, such a data race is undefined behavior), and thus not of much use.</p><div><hr></div><h2>Bucketing Allocations &amp; Arenas</h2><p>A common kind of mutation is the one performed to allocate memory. Because allocations are a subset of mutations, they have the same characteristics in multi-threaded systems that other mutations do. So the rule to <em>bucket mutations</em> wherever possible implies a rule to <em>bucket allocations</em> wherever possible.</p><p>Bucketing allocations for multi-threaded purposes often leads naturally to many opportunities to bucket allocations <em>by lifetime</em>. As such, the <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">arena allocator</a> is an excellent tool for bucketing allocations.</p><p>Many people have asked me about synchronization mechanisms being baked into arena allocators, such that the lowest-level arena-based allocation operation&#8212;<code>ArenaPush</code>&#8212;is thread-safe. This is an understandable question, because other allocators&#8212;e.g. <code>malloc</code>&#8212;are designed to be callable from several threads at once.</p><p>But making the basic arena operations thread-safe is putting the cart before the horse, and my reasoning for preferring <em>read-only access</em> and <em>bucketed mutations</em> in multi-threaded systems hopefully clears up why: it <em>assumes</em> that the correct architecture requires synchronized access to the same arena.
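</p>
<p>For contrast, here is a minimal sketch of what baking a lock into the allocator looks like, next to a per-thread alternative. The toy <code>Arena</code> (a single fixed-size block with a bump offset), <code>LockedArena</code>, <code>Worker</code>, and the POSIX-threads driver are hypothetical stand-ins written for illustration, not the codebase&#8217;s actual arena implementation. Every <code>LockedArenaPush</code> call serializes on one mutex, even when the calling threads never share the memory they allocate, while each <code>Worker</code> pushes onto an arena that no other thread touches.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t U64; // assumed to match the post's U64

// Toy arena: one fixed-size block plus a bump offset (no growth, no chaining).
typedef struct Arena Arena;
struct Arena
{
  unsigned char *base;
  U64 cap;
  U64 pos;
};

static Arena ArenaAlloc(U64 cap)
{
  Arena arena = {0};
  arena.base = (unsigned char *)malloc(cap);
  assert(arena.base != 0);
  arena.cap = cap;
  return arena;
}

static void *ArenaPush(Arena *arena, U64 size)
{
  U64 start = (arena->pos + 7) & ~(U64)7; // align each push to 8 bytes
  assert(start + size <= arena->cap);     // a real arena would grow instead
  arena->pos = start + size;
  return arena->base + start;
}

// Lock baked in at the lowest level: every push from every thread serializes.
typedef struct LockedArena LockedArena;
struct LockedArena
{
  pthread_mutex_t mutex;
  Arena arena;
};

static void LockedArenaInit(LockedArena *locked, U64 cap)
{
  pthread_mutex_init(&locked->mutex, 0);
  locked->arena = ArenaAlloc(cap);
}

static void *LockedArenaPush(LockedArena *locked, U64 size)
{
  pthread_mutex_lock(&locked->mutex);
  void *result = ArenaPush(&locked->arena, size);
  pthread_mutex_unlock(&locked->mutex);
  return result;
}

// Per-thread alternative: each worker owns its arena, so pushes need no lock.
typedef struct Worker Worker;
struct Worker
{
  Arena arena;
  U64 *values;
  U64 count;
};

static void *WorkerEntryPoint(void *param)
{
  Worker *worker = (Worker *)param;
  worker->values = (U64 *)ArenaPush(&worker->arena, sizeof(U64) * worker->count);
  for(U64 idx = 0; idx < worker->count; idx += 1) { worker->values[idx] = idx; }
  return 0;
}

static U64 RunPerThreadArenas(void)
{
  enum { WORKER_COUNT = 2 };
  pthread_t threads[WORKER_COUNT];
  Worker workers[WORKER_COUNT];
  for(U64 w = 0; w < WORKER_COUNT; w += 1)
  {
    workers[w].arena = ArenaAlloc(1024);
    workers[w].values = 0;
    workers[w].count = 10;
    pthread_create(&threads[w], 0, WorkerEntryPoint, &workers[w]);
  }
  U64 total = 0;
  for(U64 w = 0; w < WORKER_COUNT; w += 1)
  {
    pthread_join(threads[w], 0); // the worker's writes are visible after the join
    for(U64 idx = 0; idx < workers[w].count; idx += 1) { total += workers[w].values[idx]; }
    free(workers[w].arena.base);
  }
  return total;
}
```

<p>In this sketch, <code>RunPerThreadArenas()</code> returns 90: each of the two workers fills ten values (0 through 9), whose sum is 45.</p>
<p>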
But the far better alternative, when a given problem allows it, is to use each arena <em>as a per-thread bucketed allocation</em> (and thus <em>bucketed mutation</em>) mechanism, such that only one thread ever accesses a given arena, and no synchronization is needed at all.</p><p>An arena can, of course, be used <em>along with</em> synchronization mechanisms, such that an arena is not per-thread but rather per-data-structure, or per-hash-table-stripe, and so on, but baking in that synchronization <em>at the lowest level</em> reflects a misunderstanding of the ideal multi-threaded system, where threads execute <em>almost entirely</em> without synchronization (and thus almost entirely with read-only access to shared regions, or non-conflicting reads and writes). Using synchronization mechanisms with an arena is a compromise on that ideal. This is perfectly acceptable in many circumstances, since the compromise may serve a <em>different</em> ideal (e.g. storing a heavy resource once, saving both memory and compute time, thus requiring synchronized access to a shared cache), but either way, the arena implementation is <em>more</em> <em>flexible</em>, and works well for <em>both cases</em> when synchronization is left to the user, which often results in no synchronization whatsoever.</p><div><hr></div><h2>Implicit Mutations &amp; Work Independence</h2><p>In single-threaded space, it is often convenient or useful to combine <em>several codepaths</em> and produce <em><a href="https://www.rfleury.com/i/112467756/effective-codepaths-vs-codepaths">one effective codepath</a></em>. But sometimes, this is done such that <em>one low-level codepath</em> in an <em>effective codepath</em> contains a mutation, and <em>another low-level codepath</em> in the same <em>effective codepath</em> contains only a read. Take the following example, from a previous post:</p><pre><code>// retrieves value associated with `key` - if it does not
// exist inside `table`, then inserts it with initial value
// of `default_val`
Val ValFromKey(Table *table, Key *key, Val default_val);

Key key = ...;
Val val = ValFromKey(table, &amp;key, default_val);</code></pre><p>In that previous post, I wrote about how this kind of immediate-mode API can be extremely useful in collapsing the number of low-level codepaths required in a specific problem, particularly the number of low-level codepaths responsible for maintaining complex state machines.</p><p>But in multi-threaded systems, this kind of API requires extra analysis. The <em>effective codepath</em> must be treated as having the mutational properties of <em>all</em> of its potential <em>low-level codepaths</em>. Thus, the effective codepath which calls <code>ValFromKey</code> can only be understood <em>as mutating</em> its <code>table</code> argument. If that table is shared across threads (as a shared cache, for instance), then calls to <code>ValFromKey</code> may perform conflicting mutations, which require synchronization.</p><p>As I discussed earlier, this may be a perfectly suitable design. For example, suppose I replaced <code>Val</code> with <code>TextureHandle</code> and <code>ValFromKey</code> with <code>TextureFromKey</code>, used this API to do quick lookups into a shared texture cache given a key, and expected that usage to consist <em>mostly</em> of several in-flight reads after textures are loaded (and thus mostly not needing to synchronize with writes). In that case, synchronization is a perfectly reasonable trade.</p><p>But the details can change. Suppose that this effective codepath is used as a helper mechanism in <em>two</em> other overarching codepaths. In the first, the call to <code>ValFromKey</code> is <em>mutational</em> 99% of the time.
In the second, the call to <code>ValFromKey</code> is never mutational; it only ever performs a read-only lookup.</p><p>This manifested recently on a project I work on, where a <code>ValFromKey</code> mechanism was being used to build a large deduplicated hash table of strings, and the same mechanism was later being used to look up nodes in that hash table and gather information from them. The system I was working on was originally written as single-threaded code, and I was doing a pass over the system to improve its performance, particularly by moving independent streams of work to execute simultaneously, with no synchronization, on multiple threads.</p><p>Of course, in the single-threaded context, this code functions perfectly correctly&#8212;but by blurring the line between mutational writes and read-only access, it gave both a 99% mutational low-level codepath and a 100% read-only codepath the same name. Admittedly, if I verified that the latter case was indeed 100% read-only, I could simply <em>call it anyway</em> as a read-only effective codepath, but this is awfully close to &#8220;cheating&#8221;: it&#8217;d only take a small number of completely innocent changes to invisibly break the system with new race conditions. So I found it much more reasonable to explicitly separate the mutating and read-only codepaths, and use each accordingly.</p><p>After doing so, and because all allocations in the system are made explicit to callers via <code>Arena</code> parameters, I now have an API like the following:</p><pre><code><code>void TableInsert(Arena *arena, Table *table, Key *key, Val *val);
Val *TableLookup(Table *table, Key *key);</code></code></pre><p>With this style of API, the caller is in control of the allocation bucket via the <code>Arena</code> parameter. As such, the fact that <code>TableLookup</code> is read-only is explicit (as there is no such parameter), so it can safely run in parallel with other codepaths also executing <code>TableLookup</code>, as long as nothing inserts into the same table at the same time.</p><p>In the aforementioned problem, that allowed the <em>second</em> codepath to be massively improved in performance&#8212;the previously single-threaded code doing 100% read-only lookups into a table could now be reorganized to execute completely in parallel.</p><div><hr></div><h2>Making Conflicting Writes Non-Conflicting, With Join Operations</h2><p>But what about the case of <code>TableInsert</code>? It&#8217;s easy to look at the codepath responsible for building the table and conclude that each call to <code>TableInsert</code> mutates <code>arena</code> and <code>table</code>, which conflicts with other calls to <code>TableInsert</code>, and therefore that all <code>TableInsert</code>s must be synchronized.</p><p>And again, in some cases this may be completely reasonable, and as I&#8217;ve found, it isn&#8217;t necessarily the end of the world. It&#8217;s often easy to implement a much cleverer synchronization mechanism than, for instance, just throwing a lock around <code>table</code> and <code>arena</code>. For example, if <code>table</code> is a hash table, then instead of taking an <code>Arena</code>, <code>TableInsert</code> can take a different mechanism&#8212;I&#8217;ll call it a <code>Guard</code>&#8212;such that the insertion mechanism can map a <code>(Guard, Hash)</code> pair to an <code>Arena</code> and a read-write lock, where the <code>Guard</code> can have several <code>Arena</code>s and locks, subdividing the hash table.
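</p>
<p>A minimal sketch of that striped design follows, assuming POSIX read-write locks and a fixed-size table of <code>U64</code> keys and values. <code>Guard</code>, <code>Stripe</code>, <code>GuardedTableInsert</code>, and <code>GuardedTableLookup</code> are hypothetical names, and each stripe&#8217;s &#8220;arena&#8221; is a toy fixed-size block rather than a real growable arena. The key property is that a slot&#8217;s stripe is a pure function of its slot index, so each slot is always protected by exactly one lock, and each node is allocated from the arena of the stripe whose lock is held.</p>

```c
#define _POSIX_C_SOURCE 200809L // exposes pthread_rwlock_t under strict -std modes
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t U64; // assumed to match the post's U64

#define SLOT_COUNT   256
#define STRIPE_COUNT 8

typedef struct Node Node;
struct Node { Node *next; U64 key; U64 val; };
typedef struct Table Table;
struct Table { Node *slots[SLOT_COUNT]; };

// A stripe owns one read-write lock and one toy arena (fixed block + bump offset).
// It covers every slot whose index % STRIPE_COUNT equals the stripe's index.
typedef struct Stripe Stripe;
struct Stripe { pthread_rwlock_t rw; unsigned char *base; U64 pos; U64 cap; };
typedef struct Guard Guard;
struct Guard { Stripe stripes[STRIPE_COUNT]; };

static void GuardInit(Guard *guard, U64 cap_per_stripe)
{
  for(U64 idx = 0; idx < STRIPE_COUNT; idx += 1)
  {
    Stripe *stripe = &guard->stripes[idx];
    pthread_rwlock_init(&stripe->rw, 0);
    stripe->base = (unsigned char *)malloc(cap_per_stripe);
    assert(stripe->base != 0);
    stripe->pos = 0; stripe->cap = cap_per_stripe;
  }
}

static U64 SlotFromKey(U64 key)
{
  key ^= key >> 33; key *= 0xff51afd7ed558ccdull; key ^= key >> 33; // mix the bits
  return key % SLOT_COUNT;
}

static void GuardedTableInsert(Guard *guard, Table *table, U64 key, U64 val)
{
  U64 slot = SlotFromKey(key);
  Stripe *stripe = &guard->stripes[slot % STRIPE_COUNT];
  pthread_rwlock_wrlock(&stripe->rw); // excludes only users of this one stripe
  assert(stripe->pos + sizeof(Node) <= stripe->cap);
  Node *node = (Node *)(stripe->base + stripe->pos); // allocate from the stripe's arena
  stripe->pos += sizeof(Node);
  node->key = key;
  node->val = val;
  node->next = table->slots[slot];
  table->slots[slot] = node;
  pthread_rwlock_unlock(&stripe->rw);
}

static int GuardedTableLookup(Guard *guard, Table *table, U64 key, U64 *val_out)
{
  U64 slot = SlotFromKey(key);
  Stripe *stripe = &guard->stripes[slot % STRIPE_COUNT];
  int found = 0;
  pthread_rwlock_rdlock(&stripe->rw); // concurrent readers do not block each other
  for(Node *node = table->slots[slot]; node != 0 && !found; node = node->next)
  {
    if(node->key == key) { *val_out = node->val; found = 1; }
  }
  pthread_rwlock_unlock(&stripe->rw);
  return found;
}

// Demo: four threads insert disjoint key ranges concurrently; then every key is checked.
static Guard demo_guard;
static Table demo_table; // static storage, so every slot starts out null

static void *InsertThreadEntryPoint(void *param)
{
  U64 first_key = *(U64 *)param;
  for(U64 key = first_key; key < first_key + 100; key += 1)
  {
    GuardedTableInsert(&demo_guard, &demo_table, key, key * 2);
  }
  return 0;
}

static U64 RunStripedDemo(void)
{
  GuardInit(&demo_guard, 400 * sizeof(Node)); // room even if all keys land in one stripe
  pthread_t threads[4];
  U64 first_keys[4] = {0, 100, 200, 300};
  for(U64 t = 0; t < 4; t += 1) { pthread_create(&threads[t], 0, InsertThreadEntryPoint, &first_keys[t]); }
  for(U64 t = 0; t < 4; t += 1) { pthread_join(threads[t], 0); }
  U64 matches = 0;
  for(U64 key = 0; key < 400; key += 1)
  {
    U64 val = 0;
    if(GuardedTableLookup(&demo_guard, &demo_table, key, &val) && val == key * 2) { matches += 1; }
  }
  return matches;
}
```

<p>In this sketch, <code>RunStripedDemo()</code> returns 400, since every key inserted by the four threads is found with its expected value after the joins.</p>
<p>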
Thus, for threads to actually contend for a write lock, they must be writing not only to the same table, but to the same &#8220;stripe&#8221; of that table.</p><p>This already improves on the single lock and arena, but beyond this, it&#8217;s often well within reason to do <em>much better</em>, with carefully-designed atomic locking operations rather than heavier-handed locking abstractions. A full investigation of such techniques deserves its own post, though. From what I can tell, clever locking mechanisms are useful for getting another 20%, 30%, or 40% out of code which already has synchronization built in, but they&#8217;re still much slower than code which requires <em>no synchronization whatsoever</em>.</p><p>But a simple tweak to the problem allows all calls to <code>TableInsert</code> to be bucketed, and thus non-conflicting. Instead of having a <em>single table</em>, I can simply have <em>many tables</em> (for instance, one per thread), and then a separate <em>&#8220;join&#8221; step</em>, which combines <em>the many tables</em> into a <em>single joined table</em>. In many cases, a multi-threaded &#8220;build&#8221; step, with each thread operating without synchronization (and on separate arenas), followed by a &#8220;join&#8221; step, may be much more efficient than a multi-threaded &#8220;build&#8221; step which requires synchronized access to a single data structure.</p><p>Assume <code>Table</code> is a simple hash table which chains the entries within each slot as a linked list:</p><pre><code>struct Node
{
  Node *next;
  Key key;
  Val val;
};

struct Slot
{
  Node *first;
  Node *last;
};

struct Table
{
  U64 slots_count;
  Slot *slots;
};</code></pre><p>A &#8220;join&#8221; operation for two <code>Table</code>s with the same <code>slots_count</code> can then be easily written as a linked-list concatenation for each slot.</p><pre><code>Table *dst = ...;
Table *src = ...;
for(U64 idx = 0; idx &lt; dst-&gt;slots_count; idx += 1)
{
  if(dst-&gt;slots[idx].last &amp;&amp; src-&gt;slots[idx].first)
  {
    dst-&gt;slots[idx].last-&gt;next = src-&gt;slots[idx].first;
    dst-&gt;slots[idx].last = src-&gt;slots[idx].last;
  }
  else if(src-&gt;slots[idx].first)
  {
    MemoryCopyStruct(&amp;dst-&gt;slots[idx], &amp;src-&gt;slots[idx]);
  }
}</code></pre><p>If type-system enforcement of the same <code>slots_count</code> is desired, then the API can be slightly tweaked to the following:</p><pre><code>struct Table
{
  // `slots_count` is omitted - chosen/passed by user
  Slot *slots;
};

void TableInsert(Arena *arena, Table *table, U64 slots_count, Key *key, Val *val);
Val *TableLookup(Table *table, U64 slots_count, Key *key);
void TableJoin(Table *dst, Table *src, U64 slots_count);</code></pre><p>A &#8220;join&#8221; operation can also be extended with a number of other features, like sorting each bucket to ensure deterministic results. It can also be easily parallelized, as each slot&#8217;s &#8220;join&#8221; operation is entirely independent of the joining work for all other slots. In other words, the mutations for the &#8220;join&#8221; operation are themselves bucketed, by slot.</p><div><hr></div><h2>Closing Thoughts</h2><p>Similar to my experience after learning simpler <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">memory management</a> techniques, I&#8217;ve learned that multi-threaded programming becomes significantly easier when I adopt a careful, grounded, and organized approach: when I&#8217;m closely familiar with a given problem&#8217;s details, and meticulously untangle operations by their lower-level properties, rather than relying purely on stories told at higher levels of abstraction. In this case, those details include <em>reads</em>, <em>conflicting writes</em>, and <em>non-conflicting writes</em>.</p><p>I hope this post provided similar insight for you.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing.
Thanks for reading.</p><p>-Ryan</p>]]></content:encoded></item><item><title><![CDATA[Codebase Walkthrough: From-Scratch Data Structures]]></title><description><![CDATA[Walking through techniques & helpers for building data structures in the Digital Grove codebase.]]></description><link>https://www.dgtlgrove.com/p/from-scratch-data-structures-in-the</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/from-scratch-data-structures-in-the</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Mon, 01 Jan 2024 00:29:04 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/140233072/bfa920f8-d49c-478d-ba02-5f384a3280b1/transcoded-101659.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this video, I walk through some techniques and helper code within the <a href="https://git.rfleury.com">Digital Grove codebase</a> for building various common data structures, including: arrays, singly-linked stacks, singly-linked queues, doubly-linked lists, chunked linked lists, n-ary trees, hash tables, and simultaneous combinations of them. All memory management for the data structures is arena-based.</p><p>This concretizes many of the ideas I&#8217;ve previously written about, and walks through the codebase&#8217;s basic building blocks for hand-rolling custom data structures which are better suited for one&#8217;s problem than&#8212;for instance&#8212;a generic data structure object in a language standard library.</p>
      <p>
          <a href="https://www.dgtlgrove.com/p/from-scratch-data-structures-in-the">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Codebase Walkthrough: Strings]]></title><description><![CDATA[Outlining and walking through the base layer's string code, which has helped me write much simpler, more flexible, more dynamic, and more robust string processing code in C.]]></description><link>https://www.dgtlgrove.com/p/intro-to-the-codebase-part-v-strings</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/intro-to-the-codebase-part-v-strings</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Sun, 31 Dec 2023 00:28:11 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/140210231/22baf36a-ffba-4a56-9367-4e5ae2b59d73/transcoded-51870.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this video, I walk through the <em>string processing</em> portion of the Digital Grove codebase base layer. This string code is built around the same arena-centric memory management ideas as the rest of the codebase&#8212;it also dispenses with some frustrating legacy string design decisions (null termination and &#8220;object oriented&#8221; string objects), preferring immutable, explicit-length-delimited &#8220;string views&#8221;.</p>
      <p>
          <a href="https://www.dgtlgrove.com/p/intro-to-the-codebase-part-v-strings">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Easiest Way To Handle Errors Is To Not Have Them]]></title><description><![CDATA[On structuring code in an "error-free" way.]]></description><link>https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors</link><guid isPermaLink="false">https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors</guid><dc:creator><![CDATA[Ryan Fleury]]></dc:creator><pubDate>Fri, 29 Dec 2023 06:20:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3cc05ff7-14cb-4b4a-9254-5909470a1c1c_4096x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!My-g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!My-g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!My-g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!My-g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!My-g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!My-g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:290659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dgtlgrove.com/i/137840302?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!My-g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 424w, https://substackcdn.com/image/fetch/$s_!My-g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 848w, https://substackcdn.com/image/fetch/$s_!My-g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 1272w, 
https://substackcdn.com/image/fetch/$s_!My-g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2989992b-3c6c-487c-a4e9-47d7d55bf63b_4096x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Programmers are taught early to spend a great deal of time considering &#8220;errors&#8221;, and how to &#8220;handle&#8221; them. But one of the most important programming lessons I&#8217;ve learned over the past several years is to dismiss the idea that errors are special. At the bottom, the computer is a computer.
It&#8217;s a data transformation machine. An error case is simply a <em>case</em>. Data encoding an error is simply another form of data. Irrespective of how many language features and type variants one decides to layer on top of the computer, nothing changes this fact.</p><p>In the usual educational programming journey today, this underlying reality is ignored if not quickly dismissed, and the programmer is taught to consider errors as higher level, abstract concepts, which are distinct from non-errors. But regardless of the time spent learning to &#8220;handle errors&#8221; in this way, <em>actually handling errors</em> never seems to become completely painless&#8212;in the best (for the programmer&#8217;s comfort&#8212;but perhaps worse for the software) case, the programmer is taught to remain blissfully unaware of the holes in their &#8220;error handling&#8221; strategies, and the consequences of that are, well, all around the modern software world, in overly-rigid or legalistic designs, unexplained failures, invisible rules which dictate successful or unsuccessful software usage, and so on.</p><p>One of the important design problems in any computing system is in robustness to a variety of conditions&#8212;including error conditions. Computers, at the bottom, are precise physical machines. 
If one hopes for them to interact seamlessly with other computers, or with humans, then they must fluidly adapt to error conditions, and provide pathways to gracefully recover and continue operating, such that a human could learn, adapt, and try again.</p><p>At some levels of abstraction, the concept of an &#8220;error&#8221; does arise&#8212;but as I&#8217;ve <a href="https://www.rfleury.com/p/emergence-and-composition">discussed before</a>, it&#8217;s much more productive (both for the programmer <em>and</em> the computer) to consider the higher level manifestation of an idea as <em>emerging from </em>some underlying computational machinery, rather than architecting computational machinery <em>according to</em> some abstract idea. Things which are concretely identical ought to be organized as such. Systems which respect concrete reality will perform better, and result in fewer headaches, than those which don&#8217;t.</p><p>Once error cases are understood to be no different from normal cases, they may be understood with normal analysis. In past posts, I&#8217;ve written about <em><a href="https://www.rfleury.com/p/a-taxonomy-of-computation-shapes">codepaths</a>&#8212;</em>one possible trace of execution on a computer&#8212;and how code can be architected such that it produces a large <a href="https://www.rfleury.com/p/the-codepath-combinatoric-explosion">combinatoric explosion of codepaths</a>, or&#8212;preferably&#8212;such that it collapses all desired effects into as few codepaths as possible. 
Fewer codepaths means those codepaths receive more programming time, they&#8217;re exercised more regularly, and tuning those codepaths (for performance, for instance) will have a much larger impact.</p><p>In other words, because <em>every new codepath</em> represents a new possibility of code execution, it also represents a new possibility of code <em>failure</em>, and thus a new demand for testing and further development work.</p><p><em>Error cases</em> being simply <em>cases</em> is another way of saying they correspond to a subset of all possible codepaths. If error cases can gracefully <em>flow</em> through the same <em><a href="https://www.rfleury.com/i/112467756/effective-codepaths-vs-codepaths">effective codepaths</a></em> through which <em>other</em> cases flow, then <em>handling those errors</em> becomes nearly <em>free</em>, rather than an annoying extra constraint which must be &#8220;handled&#8221; by the &#8220;real&#8221; code.</p><p>It&#8217;s <em>all</em> real code, because they&#8217;re <em>all</em> real cases.</p><p>In this post, I&#8217;ve gathered a set of principles and techniques which have helped me use this reality to my advantage, and to write much more robust and failure-resistant code, which more gracefully adapts to changing conditions.</p><div><hr></div><h2>Guarantee Valid Reads (Nil Pointers)</h2><p>One way code may fail is by reading from or writing to a virtual address to which it doesn&#8217;t have access, causing an access violation; dereferencing a null pointer is the classic example. Because of this, code is often written in an extremely paranoid fashion, littered with <code>if</code>s or <code>assert</code>s, in an attempt to catch invalid pointers (and ideally deduce how they arose), hopefully before software runs on a user&#8217;s machine:</p><pre><code>Foo *foo = malloc(sizeof(Foo));
assert(foo != 0);
InitializeFoo(foo);
Bar *bar = foo-&gt;bar;
if(bar)
{
  // use `bar`
}</code></pre><p>In simple cases this is not a major concern, but the complexity compounds for each pointer which, when read or written to, would cause an access violation. Each <code>if</code>, <code>assert</code>, or early <code>return</code> potentially doubles the number of possible codepaths. This leads to situations like those I&#8217;ve <a href="https://www.rfleury.com/i/112467756/questions-vs-answers">described in past writing</a>, where the programmer must explicitly add code per codepath. When the set of codepaths grows exponentially with each new possible failure point, this results in slow, long, noisy, and difficult-to-predict code.</p><p>Now, some of these possible codepaths correspond to genuine failure cases. If code attempts to allocate a buffer to which it then writes, but the allocation fails, there is not really much that the rest of the code can validly do. I&#8217;ll speak more on that case later. But first, I&#8217;d like to focus on the case in which a pointer is only being <em>read from</em>.</p><p>The primary reason <em>reading</em> from pointers (prepared by other codepaths) may cause an access violation is that the code preparing the pointers often makes the decision to <strong>(a)</strong> return a pointer to some valid and accessible spot, or <strong>(b)</strong> return a null pointer. In the case of <strong>(a)</strong>, the pointer can be read from without an access violation. In the case of <strong>(b)</strong>, it cannot.</p><p>But a simple solution is to prefer option <strong>(c) </strong>to <strong>(b)</strong>: if <strong>(a)</strong> fails, return a pointer to a &#8220;<em>nil struct</em>&#8221; rather than a null pointer. This <em>nil struct</em> is allocated up-front and may be stored in read-only memory. Any pointers contained in the nil struct point to other nil structs of the associated type. 
If there are pointers to the same type, they point at the same nil struct in which they&#8217;re contained (in other words, the pointers are self-referential).</p><pre><code>typedef struct Node Node;
struct Node
{
 Node *first;
 Node *last;
 Node *next;
 Node *prev;
 Node *parent;
 Payload v;
};

read_only Node nil_node = {&amp;nil_node, &amp;nil_node, &amp;nil_node, &amp;nil_node, &amp;nil_node};</code></pre><p><em><strong>Note: </strong></em><code>read_only</code><em>, a macro I define in the Digital</em> <em>Grove codebase, can be easily implemented as expanding to compiler-specific allocation attributes&#8212;e.g. on MSVC, </em><code>__declspec(allocate(".roglob"))</code><em>, where </em><code>.roglob</code> <em>is a section earlier defined with </em><code>#pragma section(".roglob", read)</code></p><p>If this pattern is adopted, then irrespective of <em>failure</em> or <em>success </em>of the codepath preparing a pointer for later use, it is <em>guaranteed</em> in all cases&#8212;given completion of the codepath&#8212;that the resultant pointer is readable.</p><p>This <em>collapses</em> all cases in which later code required <em>two</em> codepaths&#8212;one for a valid pointer, one for a null pointer&#8212;down into only requiring a <em>single</em> codepath. This simplifies the <em>multiplicative</em> effect that each pointer brought, down to a multiplication of <em>one</em>.</p><p>In other words, this (an example from a <a href="https://www.rfleury.com/p/the-codepath-combinatoric-explosion#%C2%A7questions-vs-answers">previous post</a>):</p><pre><code>Node *SearchTreeForInterestingChain(Node *root)
{
  // assuming `ChildFromValue` derefs its parameter.
  // obviously these `if`s can be hidden inside the
  // API too, but that doesn't change the fact that
  // they happen, nor remove the burden from *someone*
  // to check...

  Node *result = 0;
  if(root)
  {
    Node *n1 = ChildFromValue(root, 1);
    if(n1)
    {
      Node *n2 = ChildFromValue(n1, 2);
      if(n2)
      {
        Node *n3 = ChildFromValue(n2, 3);
        if(n3)
        {
          result = ChildFromValue(n3, 4);
        }
      }
    }
  }
  return result;
}</code></pre><p>&#8230;turns into&#8230;</p><pre><code>Node *SearchTreeForInterestingChain(Node *root)
{
  Node *n1 = ChildFromValue(root, 1);
  Node *n2 = ChildFromValue(n1, 2);
  Node *n3 = ChildFromValue(n2, 3);
  Node *n4 = ChildFromValue(n3, 4);

  // we will necessarily get here - and we're also
  // guaranteeing for all of *our* callers that they
  // can dereference this result, even if it's 'invalid'
  return n4;
}</code></pre><p>In my experience this dramatically simplifies a lot of code which never needed the &#8220;validity&#8221; check, but could gracefully work with the &#8220;empty&#8221; values of a nil struct.</p><p>This technique is further helped if the nil struct contains <em>useful default values</em>, and ensuring this is especially simple (and performant) when <em>zero</em> is a useful default value in all cases.</p><div><hr></div><h2>Make Zero Values Valid (Zero-Is-Initialization)</h2><p>Many modern programming styles are obsessed with <em>default values</em> and <em>initialization</em>, as a property of type information. C++ added &#8220;constructors&#8221; to what otherwise would be plain-old-data structs: arbitrary functions which run implicitly whenever the language can statically recognize that a type is being instantiated (e.g. via stack allocation or the <code>new</code> operator). The entire concept of RAII (Resource Acquisition Is Initialization)&#8212;one of the goals of C++&#8217;s constructors&#8212;is that the presence of some type instantiation <em>offers the guarantee</em> that the instance is <em>initialized</em>.</p><p>But the best&#8212;simplest, fastest, and most maintainable&#8212;code is that which never needed to be written, or to be executed (as its intended effects are already gracefully accomplished through other means). In the case of initialization, this is trivially attainable if all initialization can be simplified to <em>zero initialization</em>.</p><p>If zero initialization is sufficient for memory initialization, several benefits follow. First, freshly committed memory returned by modern operating system allocation APIs (e.g. via <code>VirtualAlloc</code>) is always already zeroed entirely, and so in that case, no additional work needs to happen. Second, the code for zero-initializing memory (if unable to rely on default zero-allocation of freshly allocated pages&#8212;e.g. 
if reallocating a previously allocated chunk) is trivial, being a single <code>memset</code>. It&#8217;s completely generic and independent of what the underlying type is, which members it has, its size, and so on. Ultimately, zero initialization is trivial to write, trivial to execute, and trivial to maintain.</p><p>Within the context of error condition robustness, these benefits go further. When this pattern is adopted codebase-wide, such that most data is zero-initialized <em>and</em> most codepaths accept zero values, any data which fails to be constructed (due to what might be considered an &#8220;error&#8221;) will remain zero-initialized, and thus will gracefully work with the codepaths that later consume it. This applies to <em>nil structs</em> (which, with the exception of pointers to other nil structs, are completely zero-initialized), but also to other cases, like a struct allocated on an <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">arena</a> or the stack and returned from some API.</p><p>Zero initialization is subtle because, despite its name, it describes only a small part of &#8220;initialization&#8221; code; after all, almost all initialization code disappears. It instead mainly describes a <em>general property</em> of all <em>non-initialization </em>code: any codepath which reads from some input could have that input zeroed, and it would still behave as expected, as sensibly as possible for whatever the context is.</p><p>A trivial example of this is the <code>String8</code> type, and related string processing code, in the <a href="https://git.rfleury.com/community/root_basic">Digital Grove codebase</a>:</p><pre><code>struct String8
{
  U8 *str;
  U64 size;
};</code></pre><p>This is the &#8220;string view&#8221;, or &#8220;slice&#8221; type, used for all codepaths which read from strings. If an instance of <code>String8</code> is zero-initialized, all code into which that instance is fed will interpret the instance as encoding an empty string&#8212;the code will gracefully allow <code>str</code> to be <code>0</code>, as all operations are delimited by <code>size</code> (also, in this context, obviously zero).</p><p>Another, slightly more complex example is found in the <a href="https://git.rfleury.com/community/root_basic/src/branch/master/code/mdesk/mdesk.h">Metadesk text format parser in the Digital Grove codebase</a>. This layer contains a lexing API (which takes care of all tokenization work before the primary tree parsing pass):</p><pre><code>struct MD_TokenArray
{
 MD_Token *v;
 U64 count;
};

struct MD_MsgList
{
 MD_Msg *first;
 MD_Msg *last;
 U64 count;
 MD_MsgKind worst_message_kind;
};

struct MD_TokenizeResult
{
 MD_TokenArray tokens;
 MD_MsgList msgs;
};

MD_TokenizeResult MD_TokenizeFromText(Arena *arena, String8 text);</code></pre><p>The types which are returned from <code>MD_TokenizeFromText</code>&#8212;<code>MD_TokenArray</code> and <code>MD_MsgList</code>&#8212;are both initialized to zero. Both are processed by code which looks like the following:</p><pre><code>MD_TokenizeResult tokenize = MD_TokenizeFromText(arena, text);
for(MD_Msg *msg = tokenize.msgs.first; msg != 0; msg = msg-&gt;next)
{
  // print out `msg` info
}
for(U64 token_idx = 0; token_idx &lt; tokenize.tokens.count; token_idx += 1)
{
  // use `tokenize.tokens.v[token_idx]`
}</code></pre><p>Similar to <code>String8</code>, this code gracefully works with zero initialization, as both <code>for</code>-loops above are delimited by either a null <code>first</code> pointer in the <code>MD_MsgList</code>, or by a zero <code>count</code> in the <code>MD_TokenArray</code>. So zero-initialized <code>String8</code>, <code>MD_TokenArray</code>, and <code>MD_MsgList</code> instances are all naturally consumed as &#8220;empty&#8221;.</p><p>One of the major exceptions to zero initialization as a rule is that it&#8217;s sometimes worthwhile to compromise it for the purpose of providing <em>nil struct pointers</em>. But whether or not this is worthwhile depends on the case. In the above case (the <code>MD_MsgList</code> pointers), I don&#8217;t use nil struct pointers, because the list is processed as a batch, and as such is naturally terminated by null pointers. If I wanted code to be able to, for example, gracefully read the first node&#8217;s value (or a nil struct, if no such node existed), then nil struct pointers might be worthwhile. But for simple list types like this, my feeling is that the tradeoff weighs more strongly in the &#8220;don&#8217;t add more global nil structs, keep loops terminating at <code>0</code>&#8221; direction, since such a list is almost always processed entirely at once.</p><p>One way in which both zero initialization <em>and</em> nil struct pointers can work together without conflict is when <em>pointers</em> are not <em>stored</em> in any stateful, mutable data structures, but only constructed on-the-fly (in an immediate-mode fashion) from&#8212;for instance&#8212;indices or handles stored in mutable data structures. 
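</p><p>A minimal sketch of that arrangement in C follows. It rests on stated assumptions: every name in it (<code>Entity</code>, <code>entity_pool</code>, <code>nil_entity</code>, <code>EntityFromIndex</code>, <code>EntityAlloc</code>, <code>GrandparentHP</code>) is hypothetical, and a fixed-size global pool stands in for whatever storage a real system would use. Parent links are stored as 32-bit indices rather than pointers. Pointers are only produced on demand by a resolver, which hands out <code>const</code> pointers, so code holding a resolved pointer cannot write through it (nil entity included) without a cast.</p><pre><code>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical example: every name below is illustrative. */
typedef struct Entity Entity;
struct Entity
{
  uint32_t parent; /* index into entity_pool; 0 means "no parent" */
  int      hp;
};

#define ENTITY_CAP 64

/* Mutable storage holds indices, never pointers. Slot 0 is never
   handed out, so a zeroed index can always mean "nil". */
static Entity   entity_pool[ENTITY_CAP];
static uint32_t entity_count = 1;

/* Read-only nil entity: all zeroes, so its parent resolves to itself. */
static const Entity nil_entity = {0};

/* Resolve an index to a pointer on the fly. Index 0, and any index not
   yet allocated, resolves to the nil entity, so every result is readable. */
static const Entity *EntityFromIndex(uint32_t idx)
{
  if(idx == 0 || idx &gt;= entity_count)
  {
    return &amp;nil_entity;
  }
  return &amp;entity_pool[idx];
}

/* Returns the new index, or 0 (which resolves to nil) if the pool is full. */
static uint32_t EntityAlloc(int hp, uint32_t parent)
{
  assert(parent &lt; entity_count); /* parent must be 0 or already allocated */
  if(entity_count &gt;= ENTITY_CAP)
  {
    return 0;
  }
  uint32_t idx = entity_count;
  entity_count += 1;
  entity_pool[idx].parent = parent;
  entity_pool[idx].hp     = hp;
  return idx;
}

/* Chained reads with no null checks, even when links are missing. */
static int GrandparentHP(uint32_t idx)
{
  return EntityFromIndex(EntityFromIndex(EntityFromIndex(idx)-&gt;parent)-&gt;parent)-&gt;hp;
}</code></pre><p>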
In that case, zero indices or handles can be treated as mapping to a nil struct pointer, such that when user code does eventually resolve those indices or handles to pointers, it&#8217;ll resolve zeroes to nil struct pointers.</p><div><hr></div><h2>If You&#8217;re Going To Fail, Fail Early</h2><p>I&#8217;ll now return to the &#8220;allocate buffer, write into the buffer&#8221; case. I already covered what might happen when a pointer is being <em>read from</em>&#8212;an API implementation can <em>guarantee</em> for its users that pointer <em>reads</em> (with some caveats&#8212;e.g. delimited by a size) are always valid. But nil structs are allocated in pages marked as read-only, and for good reason&#8212;if they <em>weren&#8217;t</em>, code which received a nil struct pointer and <em>wrote to it</em> could compromise <em>other</em> code which expected reads from nil structs to contain useful and empty defaults.</p><p>So, receiving a nil struct pointer and <em>writing to it</em> is a bug. What approach should one take, then, when using pointers for the purpose of <em>writing</em>?</p><p>To dig into this problem, consider the following code:</p><pre><code>U64 *buffer = ...;
buffer[0] = 123; // ???</code></pre><p>On the second line, is accessing <code>buffer[0]</code> &#8220;unsafe&#8221;?</p><p>The answer, of course, depends on what the &#8220;<code>...</code>&#8221; is replaced with. Contrary to popular belief, using a normal pointer isn&#8217;t a game of Russian Roulette for a program&#8212;it all depends on <em>what guarantees</em> have been made at this point in the code. Also contrary to popular belief, special language features do not magically make this problem go away, because depending on said <em>guarantees</em>, there may be a genuinely possible failure point, or there may not be.</p><p>Now suppose that &#8220;<code>...</code>&#8221; is replaced by <code>malloc(64ull*1024*1024*1024)</code>:</p><pre><code>U64 *buffer = malloc(64ull*1024*1024*1024); // ull: with plain int literals, this product overflows
buffer[0] = 123; // ???</code></pre><p>There is no way around the fact that this represents a genuinely possible failure case: the software was produced with the expectation that the user&#8217;s machine would allow a 64 gigabyte buffer to be <code>malloc</code>&#8217;d. The software is running on a real machine, with real, fixed resources, and is using an implementation of <code>malloc</code> which was built with various constraints. Such an allocation may not succeed, depending on many factors.</p><p>Let&#8217;s also suppose that this 64 gigabyte buffer is completely necessary for the problem. This means that the programmer has no choice with respect to <em>whether or not he allocates the buffer</em>&#8212;he knows he needs it. But he <em>does</em> have a choice about <em>when </em>the buffer is allocated, and what to do if the allocation fails.</p><p>Furthermore, if the allocation fails, the user <em>cares</em> when it happens. The user also cares about not losing progress, or state. For the user, if the allocation is going to fail, the earlier the better. This is for a fundamental design reason: human-computer interaction is cyclic. 
The user supplies input to a compute system, the compute system produces output, and the user interprets that output before feeding more input into the compute system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RInr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RInr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 424w, https://substackcdn.com/image/fetch/$s_!RInr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 848w, https://substackcdn.com/image/fetch/$s_!RInr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 1272w, https://substackcdn.com/image/fetch/$s_!RInr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RInr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png" width="541" height="417.0512129380054" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3639865-f59d-4aee-a85d-18e520af881e_742x572.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:572,&quot;width&quot;:742,&quot;resizeWidth&quot;:541,&quot;bytes&quot;:47852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RInr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 424w, https://substackcdn.com/image/fetch/$s_!RInr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 848w, https://substackcdn.com/image/fetch/$s_!RInr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 1272w, https://substackcdn.com/image/fetch/$s_!RInr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3639865-f59d-4aee-a85d-18e520af881e_742x572.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If the relevant <em>output</em> of some system is that &#8220;the operation cannot possibly succeed&#8221;, then forcing the user to jump through several hoops, prepare state, and <em>then</em> telling them the operation cannot succeed is far more wasteful of the user&#8217;s time (and a hell of a lot more frustrating) than simply telling them up-front, immediately after they trigger the operation.</p><p>Communicating such information to the user as early as possible is paramount: the software must not arbitrarily prolong the &#8220;compute system cycle&#8221;.</p><p>A good rule-of-thumb I&#8217;ve found is that these <em>potential failure points</em> should occur in <em>frames as shallow in the program&#8217;s call stack as possible</em>, and that I ought to exert pressure to <em>eliminate many of these potential failure points</em>, because having them around is not <em>free</em>.</p><p>Take the following call stack:</p><div
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4zYP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4zYP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 424w, https://substackcdn.com/image/fetch/$s_!4zYP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 848w, https://substackcdn.com/image/fetch/$s_!4zYP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 1272w, https://substackcdn.com/image/fetch/$s_!4zYP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4zYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png" width="628" height="379" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4daff58-a826-431e-b083-1ae2159eee13_628x379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:628,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29087,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4zYP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 424w, https://substackcdn.com/image/fetch/$s_!4zYP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 848w, https://substackcdn.com/image/fetch/$s_!4zYP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 1272w, https://substackcdn.com/image/fetch/$s_!4zYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4daff58-a826-431e-b083-1ae2159eee13_628x379.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If I force these <em>failure points</em> to occur, say, within <code>U_EntryPoint</code>, rather than <code>UI_SignalFromBox</code>, I force <em>more pre-allocation</em>: before the software gets into the weeds of its operations, it guarantees that it can successfully perform those operations and that it has the required resources. This way, I <em>guarantee more sooner</em>, and <em>learn more about failure points earlier</em>. This allows me to inform the user much earlier, which keeps the compute system cycle short.</p><p><a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">Arena allocators</a> are a beautiful example of abiding by this rule-of-thumb without sacrificing the upsides of dynamism. Arenas are capable of dynamic growth (and each dynamic growth represents a possible failure point), but the shape of their allocation interface is geared toward lightning-fast allocations off a pre-allocated block of memory. 
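</p><p>A minimal sketch of that interface shape in C follows. It assumes a simple chain of <code>malloc</code>&#8217;d blocks. The names (<code>Arena</code>, <code>ArenaBlock</code>, <code>ArenaInit</code>, <code>ArenaPush</code>, <code>ArenaRelease</code>, <code>can_grow</code>) are hypothetical, and this is not the Digital Grove implementation. The common case is a bump of <code>pos</code>. Failure can only happen in two places: the up-front reservation in <code>ArenaInit</code>, and the optional growth path.</p><pre><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical arena sketch; names and layout are illustrative. */
#define ARENA_ALIGN    ((size_t)_Alignof(max_align_t))
#define ARENA_ROUND(n) (((n) + ARENA_ALIGN - 1) &amp; ~(ARENA_ALIGN - 1))

typedef struct ArenaBlock ArenaBlock;
struct ArenaBlock
{
  ArenaBlock *prev; /* previously filled block, kept for release */
  size_t      pos;  /* bytes used in this block */
  size_t      cap;  /* usable bytes after the (rounded) header */
};

typedef struct Arena Arena;
struct Arena
{
  ArenaBlock *current;
  size_t      block_size;
  int         can_grow; /* 0: work only off the up-front reservation */
};

static ArenaBlock *ArenaBlockAlloc(ArenaBlock *prev, size_t cap)
{
  ArenaBlock *block = (ArenaBlock *)malloc(ARENA_ROUND(sizeof(ArenaBlock)) + cap);
  if(block != 0)
  {
    block-&gt;prev = prev;
    block-&gt;pos  = 0;
    block-&gt;cap  = cap;
  }
  return block;
}

/* The up-front reservation: one failure point, checked as early as possible. */
static int ArenaInit(Arena *arena, size_t reserve, int can_grow)
{
  arena-&gt;current    = ArenaBlockAlloc(0, ARENA_ROUND(reserve));
  arena-&gt;block_size = ARENA_ROUND(reserve);
  arena-&gt;can_grow   = can_grow;
  return arena-&gt;current != 0;
}

static void *ArenaPush(Arena *arena, size_t size)
{
  assert(arena-&gt;current != 0); /* ArenaInit must have succeeded first */
  ArenaBlock *block = arena-&gt;current;
  size = ARENA_ROUND(size);
  if(size &gt; block-&gt;cap - block-&gt;pos)
  {
    /* Slow path: the only place a push can fail after initialization. */
    if(!arena-&gt;can_grow)
    {
      return 0;
    }
    size_t cap = size &gt; arena-&gt;block_size ? size : arena-&gt;block_size;
    ArenaBlock *next = ArenaBlockAlloc(block, cap);
    if(next == 0)
    {
      return 0;
    }
    arena-&gt;current = block = next;
  }
  /* Fast path: bump the position within the current block. */
  uint8_t *base = (uint8_t *)block + ARENA_ROUND(sizeof(ArenaBlock));
  void *result = base + block-&gt;pos;
  block-&gt;pos += size;
  return result;
}

static void ArenaRelease(Arena *arena)
{
  for(ArenaBlock *b = arena-&gt;current, *prev = 0; b != 0; b = prev)
  {
    prev = b-&gt;prev;
    free(b);
  }
  arena-&gt;current = 0;
}</code></pre><p>Once <code>ArenaInit</code> has succeeded, which is checked once and as shallow in the call stack as possible, a common-case push cannot fail. The <code>can_grow</code> flag then decides whether exhausting the reservation takes the fallible growth path or simply returns <code>0</code>.</p><p>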
So during development, or in shipping builds where dynamism is preferred over static requirements (or is outright required), arenas can implement <em>common case allocations</em> with lightning-fast allocations off a pre-allocated block of memory, <em>but fall back </em>to dynamic growth paths. But when tightening up code such that it&#8217;s nearly bulletproof, the same arena can be adjusted to work completely off of a fixed up-front reservation size in all expected cases&#8212;and the decision about the static requirements of this fixed up-front reservation can be made later, when there&#8217;s enough information to responsibly make it.</p><p>In any case, after the guarantee has been made that this sort of buffer allocation has succeeded, accessing the buffer (assuming the access is in-bounds) is <em>guaranteed</em> to be valid. All code following the guarantee may take advantage of this fact: there are no pointers to check, no <code>assert</code>s to write, no paranoia; the pointer is valid, guaranteed. No codepath which accesses the buffer needs to be bifurcated for the possibility of failure, because each one is by definition already within the success path, so no further <a href="https://www.rfleury.com/p/the-codepath-combinatoric-explosion">combinatoric explosion of codepaths</a> occurs.</p><div><hr></div><h2>Prefer Fewer Types&#8212;Or, Prefer AND over OR</h2><p>Another facet of modern programming techniques and tooling is an addiction to generating new types. Many programmers have come under the spell of deeply desiring the <em>structure of the program itself</em> to be encoded, somehow, within a language&#8217;s type system, with some vague expectation that the code&#8212;after all the types are established&#8212;will practically write itself. 
This addiction is also facilitated by a programmer&#8217;s fixation on perfectly compressing all instances of all types&#8212;all data must be encoded in types in which <em>all parts</em> of that type are likely used, and wasted space is not acceptable.</p><p>I think this perspective misses a crucial detail of the relationship between <em>types</em> and <em>code</em>. The fact of the matter is, the larger the number of types, the larger the number of required codepaths. It&#8217;s not obvious how exactly these numbers are related, but the general correlation is clear&#8212;types define the <em>texture</em> which two codepaths must agree on in order to &#8220;click together&#8221;. Introducing more types <em>necessarily requires</em> more codepaths in order to &#8220;click into&#8221; those textures.</p><p>One way in which the addiction to generating types surfaces in the context of &#8220;error handling&#8221; is the presumption that the resultant type of some codepath is <em>either</em> a successful value produced by the codepath, <strong>or</strong> it&#8217;s an error. Ultimately, the instinct in this style of programming is to build a <em>sum type</em> which may encode <em>valid results</em> or <em>invalid results</em>. And as I&#8217;ve <a href="https://www.rfleury.com/i/112467756/questions-vs-answers">previously covered</a>, a source of combinatoric codepath generation is in the overuse of sum types&#8212;in short, because they introduce <em>questions</em> about an instance of a type&#8217;s data format, rather than <em>answers</em>.</p><p>A helpful lesson for me was in reframing error information returned by a codepath as error information <em>in addition to</em> whatever the &#8220;non-error result&#8221; is. 
This small change eliminates needless bifurcation of the code receiving the result&#8212;it can simply be one codepath which processes both valid results (or gracefully no-ops, if the valid results are zero-initialized), <em>and</em> any error information.</p><p>The <a href="https://git.rfleury.com/community/root_basic/src/branch/master/code/mdesk/mdesk.h">Metadesk lexer API from the Digital Grove codebase</a> is a simple example:</p><pre><code>MD_TokenizeResult tokenize = MD_TokenizeFromText(arena, text);
for(MD_Msg *msg = tokenize.msgs.first; msg != 0; msg = msg-&gt;next)
{
  // print out `msg` info
}
for(U64 token_idx = 0; token_idx &lt; tokenize.tokens.count; token_idx += 1)
{
  // use `tokenize.tokens.v[token_idx]`
}</code></pre><p>The parser API follows the same pattern:</p><pre><code>struct MD_ParseResult
{
 MD_Node *root;
 MD_MsgList msgs;
};

MD_ParseResult MD_ParseFromTextTokens(Arena *arena, String8 filename, String8 text, MD_TokenArray tokens);</code></pre><p>This makes especially good sense because it allows the returned <code>MD_MsgList</code> to <em>refer to</em> <code>MD_Node</code>s and <code>MD_Token</code>s&#8212;both types, together, are more powerful than either in isolation.</p><p>In my experience, this API style allows usage code to, more-or-less, ignore the distinct idea of &#8220;errors&#8221; altogether, until it actually cares to use error information (e.g. to display error information to the user). Most usage code can simply be a single effective codepath&#8212;or in other words, it can have a completely flat control flow structure, which simply performs the required data transformation steps, in order.</p><p>Writing this, I can&#8217;t help but feel that it&#8217;s a fairly trivial lesson&#8212;but for those feeling similarly, it&#8217;s good to remind oneself just how overcomplicated programmers like to make their problems, and just how much they like to present the fa&#231;ade of productivity by playing around with type systems. I don&#8217;t care to digress into modern language feature design around errors&#8212;it&#8217;s trivial to find online&#8212;but in my view, the fact that <em>there is</em> such a widescale (and often passionate) conversation about the &#8220;need&#8221; for &#8220;error handling language features&#8221; is indicative enough of the embarrassing state of software development.</p><div><hr></div><h2>Error Information Side-Channels</h2><p><code>errno</code> is a globally-available, thread-local integer in the C standard runtime library, which is mutated by APIs implicitly when an error condition occurs.</p><p>Aside from a number of silly <em>implementation details</em> of <code>errno</code>, there are reasonable aspects of its design. 
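</p><p>The sketch below, in standard C plus POSIX, illustrates both sides of this: two operations fail, yet each still produces an ordinary value and neither forces the code to branch. It assumes a POSIX system (where a failed <code>fopen</code> sets <code>errno</code>, typically to <code>ENOENT</code>); <code>ErrnoDemo</code> and its result struct are hypothetical names. The function copies <code>errno</code> after each step purely to expose the single-slot limitation discussed below: the second failure overwrites the first.</p><pre><code>#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

// Two results, plus a copy of errno taken after each operation.
typedef struct ErrnoDemoResult ErrnoDemoResult;
struct ErrnoDemoResult
{
  int file_opened;       // 0 if fopen failed
  long number;           // LONG_MAX if strtol overflowed
  int errno_after_open;  // typically ENOENT on POSIX systems
  int errno_after_parse; // ERANGE: ISO C requires it for strtol overflow
};

static ErrnoDemoResult ErrnoDemo(void)
{
  ErrnoDemoResult result = {0};
  errno = 0;

  // A failed open still yields a plain value (a null FILE *); on POSIX
  // systems it also stores an error code in errno.
  FILE *file = fopen("no-such-dir/no-such-file.txt", "rb");
  result.file_opened = (file != NULL);
  result.errno_after_open = errno;
  if(file != NULL)
  {
    fclose(file);
  }

  // An out-of-range parse still yields a value (LONG_MAX) and stores ERANGE
  // in errno, overwriting whatever the failed open left there.
  result.number = strtol("999999999999999999999999999999", NULL, 10);
  result.errno_after_parse = errno;

  // Neither step branched on failure. Code that only inspects errno now
  // sees ERANGE; the open failure is lost unless it was copied out earlier.
  return result;
}</code></pre><p>Code that waited until the end to inspect <code>errno</code> would see only <code>ERANGE</code>; nothing would indicate that the open had failed, or where.</p><p>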
Notably, it follows other principles outlined in this post. It doesn&#8217;t needlessly harp on the validity of results, which would bifurcate usage code into needless extra codepaths; instead, it implicitly collects error information, allowing usage code to perform whatever operations it needs and to check for errors only when it cares to. In other words, it follows the principle that error information ought to be available <em>in addition to</em>, rather than <em>instead of</em>, &#8220;valid data&#8221;.</p><p>One serious problem with <code>errno</code> is that it&#8217;s a <em>single</em> thread-local slot for error information, so it can only inform usage code of <em>the most recent error</em>. Another is that there&#8217;s no way of knowing <em>which code</em> set <code>errno</code> last, and <em>when</em>.</p><p>These are fairly serious limitations. The lack of capacity and the lack of source location information prevent <code>errno</code> from being meaningfully used in <em>shallower frames of a call stack</em> (that is, in callers far removed from where errors occur), which would otherwise be an extremely powerful usage pattern.</p><p>Consider, for example, a game project in which I wanted to process <em>all</em> implicitly-gathered error information at the end of each simulation loop, and perhaps render it in a debug user interface (as a timeline, or a flame graph). Every point in this timeline would represent a place at which error information was gathered&#8212;it could include the call stack with function names, the source code location, and so on.
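</p><p>A minimal sketch of that scenario in C11 follows. It is a stand-in under stated assumptions, not the design this post goes on to describe: <code>ErrorEventPush</code>, <code>ErrorEventsFlush</code>, and <code>SampleHeight</code> are hypothetical names, the log is a fixed-capacity thread-local array rather than an arena, and call stacks are omitted because capturing them requires platform-specific APIs. Each event records where it was pushed using the standard <code>__FILE__</code> and <code>__LINE__</code> macros and the <code>__func__</code> identifier.</p><pre><code>#include &lt;stdio.h&gt;

// Hypothetical fixed-capacity, thread-local event buffer (C11). It stands in
// for a growable arena-backed log, so events past capacity are only counted.
#define ERROR_EVENT_CAP 64

typedef struct ErrorEvent ErrorEvent;
struct ErrorEvent
{
  const char *file;     // source file of the call site
  int line;             // source line of the call site
  const char *function; // function containing the call site
  const char *message;  // human-readable description
};

static _Thread_local ErrorEvent error_events[ERROR_EVENT_CAP];
static _Thread_local int error_event_count;
static _Thread_local int error_event_dropped;

static void ErrorEventPush_(const char *file, int line, const char *function, const char *message)
{
  if(error_event_count &lt; ERROR_EVENT_CAP)
  {
    error_events[error_event_count] = (ErrorEvent){file, line, function, message};
    error_event_count += 1;
  }
  else
  {
    error_event_dropped += 1;
  }
}

// Expands at the call site, so each event records where it was pushed.
#define ErrorEventPush(msg) ErrorEventPush_(__FILE__, __LINE__, __func__, (msg))

// A simulation step that records a problem and carries on with a zero value,
// rather than branching into a separate error path.
static float SampleHeight(const float *heights, int count, int idx)
{
  float result = 0;
  if(0 &lt;= idx &amp;&amp; idx &lt; count)
  {
    result = heights[idx];
  }
  else
  {
    ErrorEventPush("height sample index out of range");
  }
  return result;
}

// End of frame: report every event gathered on this thread, then reset.
static int ErrorEventsFlush(FILE *out)
{
  int count = error_event_count;
  for(int i = 0; i &lt; count; i += 1)
  {
    ErrorEvent event = error_events[i];
    fprintf(out, "%s:%d (%s): %s\n", event.file, event.line, event.function, event.message);
  }
  if(error_event_dropped &gt; 0)
  {
    fprintf(out, "(%d more events dropped)\n", error_event_dropped);
  }
  error_event_count = 0;
  error_event_dropped = 0;
  return count;
}</code></pre><p>Calling <code>ErrorEventsFlush</code> once at the end of each simulation loop visits every event pushed on that thread during the frame, in order; plotting those events over successive frames would produce the timeline.</p><p>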
Such a timeline could also be integrated with a debugger, so that a breakpoint could easily be set to trigger when one particular error (in a particular call stack) occurs.</p><p>This is all easily doable with more powerful &#8220;side-channels&#8221; of error information, which don&#8217;t have the critical shortcomings of <code>errno</code>.</p><p>And contrary to <em>yet another</em> popular belief, programming is not delimited by decisions made while designing the C runtime library. All languages&#8212;including C&#8212;can take on entirely new forms when someone is willing to go to work and design an alternative &#8220;standard&#8221; library and environment (which is part of my goal through the <a href="https://git.rfleury.com/">Digital Grove codebase</a>). Memory management, string processing, error handling, run-time type information, compile-time execution, metaprogramming, and <em>a great many other things</em> can become dramatically simpler than they&#8217;re often portrayed to be by those who spend more time arguing about languages on Hacker News or Reddit than they do programming.</p><p>Instead of each thread being equipped with a single integer, it can be equipped with an entire arena and message log:</p><pre><code>per_thread Arena *log_arena;
per_thread MsgList log_msgs;</code></pre><p>Each message stored in the <code>MsgList</code> can be equipped <em>not only</em> with a single integer, but with a call stack, a string, a color, and whatever other information is deemed useful.</p><p>This kind of design avoids the extreme lossiness of something like <code>errno</code>, while still keeping code humming along smoothly. Within this context, &#8220;error handling&#8221; is simply the ability for code to gracefully continue operating in the presence of errors, with the error information accumulated in a log to be inspected later.</p><p>Logs of this kind not only allow powerful introspection of a trace of codepath execution; using them as the primary mechanism for gathering error data also keeps each codepath honest about operating soundly and gracefully in the presence of errors. Like many of the lessons in this post, this keeps the number of codepaths small, and thus the value of programmer work high.</p><div><hr></div><h2>Closing Thoughts</h2><p>The tendency for programmers to build towers of abstractions over bad fundamentals has several serious consequences. Better designs&#8212;which do not compromise on the fundamentals&#8212;are obscured. Programmers detach from fundamental reality. They teach others to detach from fundamental reality, such that future improvements are much more difficult to design and conceptualize. Programmers settle for subpar compromises on the performance and simplicity of an abstraction (which, in a pragmatic context, may be completely reasonable&#8212;but poor abstractions substituted for good ones, used for pragmatic reasons, are obviously not the endpoint of abstraction design in any area). <a href="https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator">Arena</a>s are an example of one such better design&#8212;they do not compromise on the low-level details, <em>and</em> they simplify higher-level work.
They&#8217;re a clearly superior alternative to gigantic, complex memory management systems, and one that could only be identified by considering the problem near the fundamentals.</p><p>&#8220;Error handling&#8221; is much the same. By dispensing with abstract notions of &#8220;errors&#8221;, and treating the computer as a computer, data as data, and code as code, the fundamental reality of errors can be grappled with, and grappling with that reality <em>bottom-up</em> can lead to much simpler problems than theorizing about the same problem <em>top-down</em>. Errors, by and large, disappear.</p><p>I&#8217;ve found that, with this mindset, my software has become much more robust, my code has become much simpler, and I rely on much simpler tools. I didn&#8217;t need another language feature; I didn&#8217;t need another library; I just needed to take control of my problems.</p><p>I hope this post has helped communicate these same lessons, from which I&#8217;ve benefited so much.</p><div><hr></div><p>If you enjoyed this post, please consider subscribing. Thanks for reading.</p><p>-Ryan</p>]]></content:encoded></item></channel></rss>