Featured · Open source · 2013–present

RaftLib

A C++ stream-processing and dataflow DSL with a parallel runtime: you write compute kernels, wire them into a graph, and RaftLib owns the queues, scheduling, and parallelism. Under the hood: asynchronous lock-free FIFOs, dynamic instrumentation, and an auto-tuner that applies queueing theory, machine learning, and flow-network theory to resize buffers and re-place kernels at runtime. It began as the systems half of my PhD and is still maintained today.
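To make the kernel-and-graph model concrete, here is a minimal sketch in standard C++17 only. It does not use RaftLib's API: `BoundedFifo` and `run_pipeline` are hypothetical names invented for illustration, the queue is mutex-based rather than lock-free, and the three-stage graph is wired by hand where a runtime would do the wiring, sizing, and scheduling.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded FIFO linking two kernels. Mutex-based for brevity;
// this is an assumption of the sketch, not how RaftLib's FIFOs are built.
// The capacity must be at least 1, otherwise push() would block forever.
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full (back-pressure on the producer).
    void push(T value) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Signals that the producer will send no more items.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Blocks while empty; returns std::nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T value = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    bool closed_ = false;
};

// Three kernels on three threads: source -> doubler -> summing sink.
// Returns the sum of 2*i for i in [1, n].
long run_pipeline(int n, std::size_t capacity = 4) {
    BoundedFifo<int> a(capacity), b(capacity);
    long total = 0;
    std::thread source([&] {
        for (int i = 1; i <= n; ++i) a.push(i);
        a.close();
    });
    std::thread doubler([&] {
        while (auto v = a.pop()) b.push(*v * 2);
        b.close();
    });
    std::thread sink([&] {
        while (auto v = b.pop()) total += *v;
    });
    source.join();
    doubler.join();
    sink.join();
    return total;  // safe to read: the sink thread has been joined
}
```

Calling `run_pipeline(5)` pushes 1 through 5 through the graph and sums the doubled values. The `capacity` argument is the knob a runtime auto-tuner would adjust; here it is fixed by the caller.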

Nearly 1,000 GitHub stars, ~46 daily clones, front page of Hacker News twice, Apache-2.0. Featured in Packt's C++ Reactive Programming (2018), listed in Awesome C++ and Awesome Parallel Computing, and cited on Wikipedia's CSP article alongside Erlang, Go, and Clojure's core.async, with a Wikipedia page of its own.

Perf / TCO · Google · 2022–present

gSoC platform performance

  • System-wide Perf/TCO ownership for Google's internal SoC program across silicon generations and product verticals.
  • Multivariate CapEx/OpEx, perf-per-dollar, and perf-per-watt models, from cores to racks, feeding fleet adoption decisions.
  • ML-guided tuning of core registers, memory-controller settings, and mesh QoS, validated against internal customer workloads including AI/ML inference.
  • Technical lead for Google-specific accelerator IP.

IP Contributions · Arm · 2015–2022

Scalable systems IP

  • The LS64 accelerator architecture and gather-hint instruction: early Armv8.7-A/Armv9.2-A (AArch64) accelerator extensions (e.g. st64).
  • CHI memory-copy enhancements from bottleneck analysis across CPU, memory subsystem, coherent state, and interconnect.
  • GenZ-CXL subcommittee; Project-38 (DoD/DOE) lead; Sandia DOE data-movement representative.
  • 29 granted U.S. patents across memory, coherence, and data movement.

Research tooling · PhD

Auto-tuning framework

The queueing-theory + machine-learning + control-theory framework from the thesis: online modeling of parallel stream systems, with live buffer sizing and placement decisions. The ideas shipped in RaftLib's runtime.
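As a small illustration of how queueing theory can drive buffer sizing, the sketch below uses the textbook M/M/1/K model. That model, the function names, and the blocking target are assumptions chosen for illustration, not the specific models used in the thesis. With utilization $\rho = \lambda/\mu$ and room for $K$ items, the probability that an arriving item finds the queue full is $P_K = (1-\rho)\rho^K / (1-\rho^{K+1})$ for $\rho \neq 1$, and $1/(K+1)$ for $\rho = 1$.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>

// Blocking probability of an M/M/1/K queue with utilization rho
// (arrival rate / service rate) and total capacity k >= 1.
double blocking_probability(double rho, std::size_t k) {
    const double kd = static_cast<double>(k);
    if (std::abs(rho - 1.0) < 1e-12) return 1.0 / (kd + 1.0);
    if (rho < 1.0) {
        return (1.0 - rho) * std::pow(rho, kd) / (1.0 - std::pow(rho, kd + 1.0));
    }
    // Algebraically identical form that avoids overflow when rho > 1.
    return (rho - 1.0) / (rho - std::pow(rho, -kd));
}

// Smallest capacity whose blocking probability is at or below `target`.
// Returns std::nullopt if no capacity up to max_k meets the target, which
// is the case for small targets whenever the consumer is slower than the
// producer (rho >= 1): extra buffering cannot fix an overloaded stage.
std::optional<std::size_t> smallest_buffer(double arrival_rate,
                                           double service_rate,
                                           double target,
                                           std::size_t max_k = 4096) {
    const double rho = arrival_rate / service_rate;
    for (std::size_t k = 1; k <= max_k; ++k) {
        if (blocking_probability(rho, k) <= target) return k;
    }
    return std::nullopt;
}
```

For example, with the producer running at half the consumer's service rate and a 1% blocking target, `smallest_buffer(1.0, 2.0, 0.01)` yields a capacity of 6. An online tuner would re-estimate the two rates from instrumentation and repeat the calculation as the workload shifts.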

Advisory · 2016–2020

FastData.io

Advisor and technical consultant to an early-stage data-systems startup working on GPU-accelerated streaming data processing.

Simulation & analysis

What-if before hardware exists

Architectural simulation at scale with gem5 and SST: multi-fidelity what-if analysis for systems that don't exist yet, plus targeted microbenchmarks for cloud-native services and HPC codes (memcached, Redis, P4 packet processing, GROMACS, NAMD).

Highlights live on the home page; the paper trail is on the publications & patents page.