Software

Featured · Open source · 2013–present

RaftLib

A C++ stream-processing and dataflow DSL with a parallel runtime: you write compute kernels, wire them into a graph, and RaftLib owns the queues, scheduling, and parallelism. Under the hood: asynchronous lock-free FIFOs, dynamic instrumentation, and an auto-tuner that applies queueing theory, machine learning, and flow-network theory to resize buffers and re-place kernels at runtime. It began as the systems half of my PhD and is still maintained today.
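To make the kernel-and-queue model concrete, here is a minimal sketch in plain standard C++. It is not RaftLib's API or its FIFO implementation: `SpscQueue` and `run_pipeline` are hypothetical names invented for this illustration, and the queue is a fixed-size single-producer/single-consumer ring buffer with none of the runtime resizing or instrumentation described above. It only shows the shape of the idea: two kernels on separate threads that communicate through a lock-free FIFO.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>

// Hypothetical illustration (not RaftLib code): a fixed-capacity,
// single-producer/single-consumer lock-free ring buffer.
// One slot is kept empty to distinguish "full" from "empty".
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Called only by the producer thread. Returns false if the queue is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        buf_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    // Called only by the consumer thread. Returns nullopt if the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T value = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};

// A two-kernel "graph": a source kernel emits 1..count, a sink kernel sums them.
std::uint64_t run_pipeline(std::uint64_t count) {
    SpscQueue<std::uint64_t, 64> queue;
    std::uint64_t sum = 0;

    std::thread source([&] {
        for (std::uint64_t i = 1; i <= count; ++i) {
            while (!queue.try_push(i)) {
                std::this_thread::yield();  // back off while the queue is full
            }
        }
    });

    std::thread sink([&] {
        for (std::uint64_t received = 0; received < count;) {
            if (auto value = queue.try_pop()) {
                sum += *value;
                ++received;
            } else {
                std::this_thread::yield();  // back off while the queue is empty
            }
        }
    });

    source.join();
    sink.join();
    return sum;  // safe to read: the sink thread has been joined
}
```

Calling `run_pipeline(1000)` from `main` wires the two kernels together and returns the sum of the streamed values. In this sketch the buffer size (64) is a hard-coded guess; choosing and adjusting that number at runtime is exactly the part the auto-tuner is meant to take over.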

Nearly 1,000 GitHub stars, ~46 daily clones, front page of Hacker News twice, Apache-2.0. Featured in Packt's C++ Reactive Programming (2018), listed in Awesome C++ and Awesome Parallel Computing, and cited on Wikipedia's CSP article alongside Erlang, Go, and Clojure's core.async, with a Wikipedia page of its own.

Talks: RaftLib Tutorial (C++Now 2017) · Good FIFOs Make Good Neighbors · C++Now 2016
Papers: the RaftLib IJHPCA paper and the auto-tuning thesis are on the publications page

raftlib.io → GitHub

Perf / TCO · Google · 2022–present

gSoC platform performance

System-wide Perf/TCO ownership for Google's internal SoC program across silicon generations and product verticals.
Multivariate CapEx/OpEx, perf-per-dollar, and perf-per-watt models, from cores to racks, feeding fleet adoption decisions.
ML-guided tuning of core registers, memory-controller settings, and mesh QoS, validated against internal customer workloads including AI/ML inference.
Technical lead for Google-specific accelerator IP.

IP Contributions · Arm · 2015–2022

Scalable systems IP

The LS64 accelerator architecture and gather-hint instruction: early Armv8.7-A/Armv9.2-A accelerator extensions (e.g. the ST64B family of 64-byte stores).
CHI memory-copy enhancements from bottleneck analysis across CPU, memory subsystem, coherent state, and interconnect.
Gen-Z/CXL subcommittee; Project 38 (DoD/DOE) lead; Sandia DOE data-movement rep.
29 granted U.S. patents across memory, coherence, and data movement.

Research tooling · PhD

Auto-tuning framework

The queueing-theory + machine-learning + control-theory framework from the thesis: online modeling of parallel stream systems, with live buffer sizing and placement decisions. The ideas shipped in RaftLib's runtime.
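As a rough illustration of how a queueing model can drive a buffer-sizing decision, the sketch below uses the textbook M/M/1/K queue. This is an assumption made for the example, not a claim about which models the thesis or RaftLib actually use. With utilization ρ = λ/μ and room for K items, the probability that an arriving item finds the queue full is (1 − ρ)ρ^K / (1 − ρ^(K+1)), or 1/(K + 1) when ρ = 1. The function names `blocking_probability` and `smallest_capacity` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Blocking probability of an M/M/1/K queue (textbook formula).
// rho = arrival_rate / service_rate, k = total capacity in items.
double blocking_probability(double rho, std::size_t k) {
    const double kd = static_cast<double>(k);
    if (std::abs(rho - 1.0) < 1e-12) {
        return 1.0 / (kd + 1.0);  // limiting case rho == 1
    }
    return (1.0 - rho) * std::pow(rho, kd) / (1.0 - std::pow(rho, kd + 1.0));
}

// Smallest capacity whose blocking probability is at or below `target`.
// Returns max_k if no capacity up to max_k meets the target, including
// the saturated case rho >= 1, where a larger buffer alone cannot help much.
std::size_t smallest_capacity(double arrival_rate, double service_rate,
                              double target, std::size_t max_k) {
    const double rho = arrival_rate / service_rate;
    if (rho >= 1.0) {
        return max_k;  // producer outpaces consumer: sizing is not the fix
    }
    for (std::size_t k = 1; k <= max_k; ++k) {
        if (blocking_probability(rho, k) <= target) {
            return k;
        }
    }
    return max_k;
}
```

For example, with a producer at half the consumer's rate (ρ = 0.5), `smallest_capacity(1.0, 2.0, 0.05, 1024)` returns 4, since the blocking probability first drops below 5% at K = 4 (about 3.2%). An online tuner would feed measured arrival and service rates into a model of this kind instead of fixed constants.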

Advisory · 2016–2020

FastData.io

Advisor and technical consultant to an early-stage data-systems startup working on GPU-accelerated streaming data processing.

Open source · long tail

Other projects

Raft language: a language built on the dataflow ideas behind RaftLib.
ipc: shared-memory inter-process communication library.
LeigNet: prototype simulation-composition framework from Arm Research.
constexpr HighwayHash: compile-time port of Google's hash.
Everything else on GitHub →

Simulation & analysis

What-if before hardware exists

Architectural simulation at scale with gem5 and SST: multi-fidelity what-if analysis for systems that don't exist yet, plus targeted microbenchmarks for cloud-native and HPC workloads (memcached, Redis, P4 packet processing, GROMACS, NAMD).

Highlights live on the home page; the paper trail is on the publications & patents page.