System Performance Architect · Google
I make computing faster — and more efficient.
From open source to hyperscale silicon — and the teams that ship them.
01 — About
Performance, memory, and scalable computing at hyperscale.
Jonathan Beard turns architecture, memory, interconnect, power, and real workload behavior into data-driven platform decisions — and builds the teams that make those calls stick.
A hardware-architecture and systems-performance leader with 18+ years spanning the U.S. Army, academia, and hyperscale silicon. System Performance Architect at Google, U.S. Army veteran, and the primary author of RaftLib.
Now — Google · 2022–present
RaftLib · Open source · 2013–present
A C++ stream-processing DSL and parallel runtime I authored and still maintain — wire compute kernels into a graph and RaftLib handles the queues, scheduling, and parallelism. It grew out of my PhD on online modeling and auto-tuning of parallel stream systems.
Recognition: Wikipedia: CSP (named with Erlang & Go) · Wikipedia: RaftLib · C++ Reactive Programming (Packt, 2018) · Awesome C++ · Awesome Parallel Computing
Also in the wild: Raft language · ipc (SHM library) · LeigNet (sim framework) · constexpr HighwayHash
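To make the "kernels wired into a graph" idea above concrete, here is a minimal sketch in standard C++17 only. It is not RaftLib's API: `BoundedQueue` and `run_pipeline` are hypothetical names invented for illustration. Three kernels (source, square, sum) each run on their own thread and talk only through bounded FIFOs, which is the kind of plumbing a stream runtime takes off your hands.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded FIFO linking two kernels (illustration only, not a RaftLib type).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full, so a slow consumer throttles its producer.
    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Returns std::nullopt once the queue has been closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return value;
    }

    // Signals end-of-stream to the consumer.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    bool closed_ = false;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Runs source -> square -> sum, one thread per kernel, and returns the sum of squares of 1..count.
long long run_pipeline(int count, std::size_t queue_capacity) {
    BoundedQueue<int> numbers(queue_capacity);
    BoundedQueue<long long> squares(queue_capacity);
    long long total = 0;

    std::thread source([&] {
        for (int i = 1; i <= count; ++i) numbers.push(i);
        numbers.close();
    });
    std::thread square([&] {
        while (auto v = numbers.pop()) squares.push(static_cast<long long>(*v) * *v);
        squares.close();
    });
    std::thread sum([&] {
        while (auto v = squares.pop()) total += *v;  // only this thread writes total
    });

    source.join();
    square.join();
    sum.join();
    return total;
}
```

Calling `run_pipeline(100, 4)` returns 338350, the sum of squares from 1 to 100. The queue capacity is fixed by hand here; choosing it automatically is the part an auto-tuning runtime is meant to own.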
Research & PhD thesis
Online Modeling and Tuning of Parallel Stream Processing Systems (2015) — queueing theory, machine learning, and control theory to auto-tune dataflow systems.
Work spans IJHPCA, IPDPS, ICPP, PACT, ICS, Euro-Par, MEMSYS, and HPEC.
At Arm: mentored and funded academic teams at Barcelona Supercomputing Center, UT Austin, and Georgia Tech — 12+ students, 5 professors — converting research into patents, papers, and architecture proposals.
Arm · 2015–2022
Senior Research Engineer → Principal System Architect. CHI memory-copy enhancements, the LS64 architecture + gather-hint instruction (early AArch64 8.7/9.2 accelerator extensions), Project-38 (DoD/DOE), and Arm rep to Sandia's DOE data-movement project.
Concurrently: technical advisor to FastData.io — GPU-accelerated streaming data processing (2016–2020).
IP Contributions
29 granted U.S. patents with more in flight. Where they cluster:
Off the clock
Austin, Texas. Nine years a soldier before the doctorate — a bioinformatics master's (Johns Hopkins) earned along the way, biology + international studies before that.
Builds restomod robots — currently resurrecting an Omnibot 2000, the greatest vintage bot of all time, with a SLAM stack that's headed for open source. Rebuilds cars too: a 1979 Porsche 911SC, stripped to the shell and going back together from the ground up around a custom ADAS of his own design. Photography, too. Blog posts on all of the above are coming.
Washington University
PhD research, Stream-Based Supercomputing Lab. The thesis: teach parallel stream systems to tune themselves while running — online queueing models spot where backpressure will bite, then the runtime resizes buffers and re-places kernels mid-flight. No profiling runs, no restarts. It shipped as RaftLib's autotuner.
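As a rough illustration of how a queueing model can flag backpressure and suggest a buffer size, the sketch below uses the textbook M/M/1/K queue. This is an assumption made for the example (Poisson arrivals, exponential service, rates already measured), not the model from the thesis or RaftLib's autotuner, and the function names are hypothetical. With utilization rho = arrival rate / service rate, the chance that an arriving item finds all K slots full is (1 - rho) * rho^K / (1 - rho^(K+1)), or 1 / (K + 1) when rho = 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>

// Probability that an arriving item finds an M/M/1/K queue full.
// capacity is K, the total number of slots including the item in service.
double blocking_probability(double arrival_rate, double service_rate, std::size_t capacity) {
    assert(arrival_rate > 0.0 && service_rate > 0.0 && capacity > 0);
    const double rho = arrival_rate / service_rate;
    const double k = static_cast<double>(capacity);
    if (std::abs(rho - 1.0) < 1e-12) return 1.0 / (k + 1.0);
    if (rho < 1.0) {
        // Direct form; the powers shrink toward zero, so this is numerically safe.
        return (1.0 - rho) * std::pow(rho, k) / (1.0 - std::pow(rho, k + 1.0));
    }
    // Same formula divided through by rho^K, which avoids overflow when rho > 1.
    return (rho - 1.0) / (rho - std::pow(rho, -k));
}

// Smallest capacity whose blocking probability is at or below target,
// or std::nullopt if no capacity up to max_capacity achieves it.
std::optional<std::size_t> smallest_capacity(double arrival_rate, double service_rate,
                                             double target, std::size_t max_capacity) {
    for (std::size_t k = 1; k <= max_capacity; ++k) {
        if (blocking_probability(arrival_rate, service_rate, k) <= target) return k;
    }
    return std::nullopt;
}
```

For a kernel that receives 8 items per unit time and drains 10, `smallest_capacity(8.0, 10.0, 0.01, 1024)` yields 14 slots. When the producer outpaces the consumer (say 20 against 10), no buffer size reaches a 1% target and the function returns `std::nullopt`, which is the case where resizing alone cannot help and the work itself has to be re-placed or parallelized.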
Community & speaking
Tools
What I work on
Foundations
02 — Impact · patents
Forward citations from Google Patents: 120+ later filings cite the 29 granted patents — at Apple, Intel, NVIDIA, Microsoft, Samsung, IBM, AMD, Google, and a wave of AI-silicon startups. Each chip links to the cited patent behind the cluster. Highlights here — the full impact map lives with the publications.
Accelerator integration & offload
Memory fabrics & disaggregation
Coherence at scale
Hardware queues & message passing
Context switching & migration
Virtual memory & translation
Near-memory & sparse data movement
Memory reliability & prediction
Method: Google Patents “Cited by” and family-citation data, mined June 2026. A citation marks later work that builds on or relates to the patent — prior art acknowledged by the applicant or examiner, not endorsement.
02 — Impact · papers
RaftLib alone: 42 citations. The body of work is cited across OSDI, EuroSys, SC, USENIX ATC, HPCA, MICRO, CGO, and HPDC. See the full record on Google Scholar.
Stream processing & RaftLib
Near-memory & sparse acceleration
Hardware queues & messaging
Performance modeling of streaming
Method: Semantic Scholar forward citations across 26 publications, mined June 2026; curated to recognizable venues, self-citations excluded. Canonical counts live on Google Scholar.
03 — Writing
Stream processing, memory systems, parallelism — and the occasional war story.
Latest — June 6, 2026
Managing Parallel, Part 4: The Machine Underneath — memory ordering (x86-TSO vs Arm), NUMA topology, and cache-line atomicity
June 6, 2026
One pool of storage that Linux, macOS, and Windows can all share — NFS for the compute nodes, SMB for the laptops, and the UID-alignment trick that stops NFS permissions from ruining your week. Built for home ML clusters and home labs.
June 6, 2026
Managing Parallel, Part 3: When Parallel Goes Wrong — the short list of broken contracts behind most concurrency bugs