System Performance Architect · Google
I make computing faster and more efficient.
From open source to hyperscale silicon, and the teams that ship them.
01 · About
Performance, memory, and scalable computing at hyperscale.
Jonathan Beard turns architecture, memory, interconnect, power, and real workload behavior into data-driven platform decisions, and builds the teams that make those calls stick.
A hardware-architecture and systems-performance leader with 18+ years spanning the U.S. Army, academia, and hyperscale silicon. System Performance Architect at Google, U.S. Army veteran, and the primary author of RaftLib.
Now: Google · 2022–present
RaftLib · Open source · 2013–present
A C++ stream-processing DSL and parallel runtime I authored and still maintain. Wire compute kernels into a graph and RaftLib handles the queues, scheduling, and parallelism. It grew out of my PhD on online modeling and auto-tuning of parallel stream systems.
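To make the kernel-graph idea concrete, here is a minimal sketch in plain standard C++. It deliberately does not use RaftLib's API: `BoundedQueue` and `run_pipeline` are hypothetical names invented for this illustration, and the fixed queue capacity of 4 is an arbitrary assumption. Three kernels (source, square, sum) each run on their own thread, connected by bounded queues that block a fast producer when a slow consumer falls behind. This is the plumbing a streaming runtime is meant to take off your hands.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded FIFO standing in for the stream between two kernels.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full: this is the backpressure on the producer.
    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Marks end of stream; consumers drain what is left, then see nullopt.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Blocks while empty and still open; returns nullopt once drained and closed.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    bool closed_ = false;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Three kernels wired source -> square -> sum, one thread per kernel.
// Returns the sum of squares of 1..n.
long long run_pipeline(int n) {
    BoundedQueue<int> raw(4);            // capacity 4 is an arbitrary choice
    BoundedQueue<long long> squared(4);
    long long total = 0;

    std::thread source([&] {
        for (int i = 1; i <= n; ++i) raw.push(i);
        raw.close();
    });
    std::thread square([&] {
        while (auto v = raw.pop()) squared.push(static_cast<long long>(*v) * *v);
        squared.close();
    });
    std::thread sum([&] {
        while (auto v = squared.pop()) total += *v;
    });

    source.join();
    square.join();
    sum.join();  // join orders the write to total before the read below
    return total;
}
```

Calling `run_pipeline(4)` pushes 1, 2, 3, 4 through the graph and returns 30. Every line of queue, thread, and shutdown code above is overhead the kernel author would otherwise write by hand.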
Recognition: Wikipedia: CSP, named with Erlang & Go; Wikipedia: RaftLib; C++ Reactive Programming · Packt 2018; Awesome C++; Awesome Parallel Computing
Also in the wild: Raft language; ipc: SHM library; LeigNet sim framework; constexpr HighwayHash
Research & PhD thesis
Online Modeling and Tuning of Parallel Stream Processing Systems (2015): queueing theory, machine learning, and control theory to auto-tune dataflow systems.
Work spans IJHPCA, IPDPS, ICPP, PACT, ICS, Euro-Par, MEMSYS, and HPEC.
At Arm: mentored and funded academic teams at Barcelona Supercomputing Center, UT Austin, and Georgia Tech (12+ students, 5 professors), converting research into patents, papers, and architecture proposals.
Arm · 2015–2022
Senior Research Engineer → Principal System Architect. CHI memory-copy enhancements, the LS64 architecture + gather-hint instruction (early AArch64 8.7/9.2 accelerator extensions), Project-38 (DoD/DoE), and Arm rep to Sandia's DOE data-movement project.
Concurrently: technical advisor to FastData.io, GPU-accelerated streaming data processing (2016–2020).
IP Contributions
29 granted U.S. patents with more in flight. Where they cluster:
Off the clock
Austin, Texas. Nine years a soldier before the doctorate. A bioinformatics master's (Johns Hopkins) earned along the way, biology + international studies before that.
Builds restomod robots: currently resurrecting an Omnibot 2000, the greatest vintage bot of all time, with a SLAM stack that's headed for open source. Rebuilds cars too: a 1979 Porsche 911SC, stripped to the shell and going back together from the ground up around a custom ADAS of his own design. Photography, too. Blog posts on all of the above are coming.
Washington University
PhD research, Stream-Based Supercomputing Lab. The thesis: teach parallel stream systems to tune themselves while running. Online queueing models spot where backpressure will bite, then the runtime resizes buffers and re-places kernels mid-flight. No profiling runs, no restarts. It shipped as RaftLib's autotuner.
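As a rough illustration of how a queueing model can flag backpressure before it happens, the sketch below uses the textbook M/M/1/K queue. This is an assumption chosen for simplicity (Poisson arrivals, exponential service times, one server, room for K items), not a description of the models in the thesis or in RaftLib's autotuner. With utilization ρ = arrival rate / service rate, the probability that an arriving item finds the buffer full is (1 - ρ)ρ^K / (1 - ρ^(K+1)), or 1/(K+1) when ρ = 1. The function names `blocking_probability` and `smallest_capacity` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that an arrival finds an M/M/1/K queue full.
// rho = arrival rate / service rate; capacity = K, counting the item in service.
double blocking_probability(double rho, int capacity) {
    assert(rho > 0.0 && capacity >= 1);
    if (std::abs(rho - 1.0) < 1e-12) {
        return 1.0 / (capacity + 1);  // limit of the general formula as rho -> 1
    }
    return (1.0 - rho) * std::pow(rho, capacity) /
           (1.0 - std::pow(rho, capacity + 1));
}

// Smallest capacity whose blocking probability is at or below target,
// or -1 if no capacity up to max_capacity is enough.
int smallest_capacity(double rho, double target, int max_capacity) {
    for (int k = 1; k <= max_capacity; ++k) {
        if (blocking_probability(rho, k) <= target) return k;
    }
    return -1;
}
```

For a stage running at ρ = 0.5 with a 5% blocking target, `smallest_capacity(0.5, 0.05, 64)` returns 4. At ρ = 2 the blocking probability never falls below one half however large the buffer, so the function returns -1: a bigger buffer cannot fix that stage, and it needs more service capacity instead.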
Community & speaking
Tools
What I work on
Foundations
02 · Impact · patents
Forward citations from Google Patents: 120+ later filings cite the 29 granted patents, at Apple, Intel, NVIDIA, Microsoft, Samsung, IBM, AMD, Google, and a wave of AI-silicon startups. Each chip links to the cited patent behind the cluster. Highlights here. The full impact map lives with the publications.
Accelerator integration & offload
Memory fabrics & disaggregation
Coherence at scale
Hardware queues & message passing
Context switching & migration
Virtual memory & translation
Near-memory & sparse data movement
Memory reliability & prediction
Method: Google Patents “Cited by” and family-citation data, mined June 2026. A citation marks later work that builds on or relates to the patent: prior art acknowledged by the applicant or examiner, not endorsement.
02 · Impact · papers
RaftLib alone: 42 citations. The body of work is cited across OSDI, EuroSys, SC, USENIX ATC, HPCA, MICRO, CGO, and HPDC. See the full record on Google Scholar.
Stream processing & RaftLib
Near-memory & sparse acceleration
Hardware queues & messaging
Performance modeling of streaming
Method: Semantic Scholar forward citations across 26 publications, mined June 2026; curated to recognizable venues, self-citations excluded. Canonical counts live on Google Scholar.
03 · Writing
Stream processing, memory systems, parallelism, and the occasional war story.
Latest: June 19, 2026
Adding cores quietly stopped working, and Amdahl told us why forty years early: scaling stalls on the one dependency you can't parallelize away, not the resource you keep buying.
June 13, 2026
When can you abstract something away for free? The condition has two names, context-free and zero mutual information, and almost nothing in a real system meets it.
June 12, 2026
Every abstraction is borrowed complexity: a quiet debt that comes due at 3am, in the layer you chose never to learn. On staying ahead of the bill.