<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>HPC on Robert Carson</title><link>https://robertcarson.org/tags/hpc/</link><description>Recent content in HPC on Robert Carson</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Robert Carson</copyright><lastBuildDate>Tue, 15 Apr 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://robertcarson.org/tags/hpc/index.xml" rel="self" type="application/rss+xml"/><item><title>Building a Rust Scientific Computing Stack</title><link>https://robertcarson.org/projects/rust_projects/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><guid>https://robertcarson.org/projects/rust_projects/</guid><description>&lt;h2 class="relative group"&gt;Why Rust
 &lt;div id="why-rust" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-rust" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Before any specific library, some context on the language. The motivation was not ideological. It was practical and rooted in a specific kind of pain: Fortran code that compiles and runs without complaint, but quietly passes array shapes that do not match what the calling code believes they are. No runtime check, no compiler error, just wrong numbers. That class of silent bug had bitten me enough times that when a new language showed up that made memory safety a first-class guarantee enforced at compile time, it was worth learning seriously. Rust became the language I reached for when exploring new ideas or building tools where correctness under the surface mattered as much as correctness of output. Over time it also became a practical choice for building fast Python libraries through PyO3 bindings, since you get native execution speed with an interface that scientific users can drive from notebooks and scripts without knowing anything about Rust.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://robertcarson.org/projects/rust_projects/feature.jpg"/></item><item><title>ExaCMech: GPU-Native Crystal Plasticity Constitutive Library</title><link>https://robertcarson.org/projects/exacmech/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><guid>https://robertcarson.org/projects/exacmech/</guid><description>&lt;h2 class="relative group"&gt;The Problem ExaCMech Solves
 &lt;div id="the-problem-exacmech-solves" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-problem-exacmech-solves" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Crystal plasticity finite element codes spend a large fraction of their time doing one thing: the constitutive update. Given a material&amp;rsquo;s current state (its crystal orientation, internal hardening variables, elastic strain) and a prescribed deformation over a time step, compute the resulting stress and update the material state. This has to be done at every quadrature point in the mesh, which in a production micromechanics simulation means iteratively solving stiff, physically detailed nonlinear systems at tens of millions of points at once. For the problem to be tractable at scale, those evaluations need to run on the GPU, and the models need to be structured in a way that maps naturally to how GPUs actually execute work.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://robertcarson.org/projects/exacmech/feature.png"/></item><item><title>ExaConstit: High-Performance Micromechanics Finite Element Code</title><link>https://robertcarson.org/projects/exaconstit/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><guid>https://robertcarson.org/projects/exaconstit/</guid><description>&lt;h2 class="relative group"&gt;Where ExaConstit Came From
 &lt;div id="where-exaconstit-came-from" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#where-exaconstit-came-from" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When I joined the ExaAM project at LLNL, it needed a crystal plasticity finite element code that could run on GPUs at scale. ExaAM is a DOE Exascale Computing Project effort to model metal additive manufacturing from the melt pool all the way up to the part scale, and the stage that connects microstructure to part-scale mechanical response requires simulating thousands of individual grains with their own crystal orientations, phases, and slip systems. At the time, no open-source code existed that could do this on GPUs in a serious way. Most comparable codes either had no GPU support or treated it as an experimental add-on that barely worked. So we built ExaConstit from scratch with GPU execution as a first-class target from day one.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://robertcarson.org/projects/exaconstit/feature.png"/></item><item><title>High Performance Computing &amp; GPU-Accelerated Scientific Software</title><link>https://robertcarson.org/papers/hpc-gpu-computing/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><guid>https://robertcarson.org/papers/hpc-gpu-computing/</guid><description>&lt;h2 class="relative group"&gt;Overview
 &lt;div id="overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;For roughly forty years, scientific codes got faster by waiting for the next hardware generation. That era ended. Single-thread performance stopped scaling the way it used to, and the compute density now available in leadership-class machines comes almost entirely from GPU accelerators: tens of thousands of parallel cores per node, operating under a fundamentally different programming model than the CPU clusters that most production scientific software was written for.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://robertcarson.org/papers/hpc-gpu-computing/featured.svg"/></item><item><title>SNLS: A Small Nonlinear Solver Library That Punches Above Its Weight</title><link>https://robertcarson.org/projects/snls/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><guid>https://robertcarson.org/projects/snls/</guid><description>&lt;h2 class="relative group"&gt;Where It Started
 &lt;div id="where-it-started" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#where-it-started" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SNLS predates both ExaCMech and ExaConstit. It was developed at LLNL specifically to solve the problem of running material model constitutive updates on the GPU, at a time when no existing nonlinear solver library was designed to do that. General-purpose solvers like MINPACK or PETSc are built for problems that live on the host. They carry significant overhead per call and are not structured for batched execution across millions of independent points simultaneously. In crystal plasticity specifically, every quadrature point in the mesh needs its own nonlinear solve to update stress and internal state at each time step, and those systems are dense and stiff, and must converge reliably at every point while running as fast as possible. That was the gap SNLS was built to fill.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://robertcarson.org/projects/snls/feature.png"/></item></channel></rss>