Publications & Patents


2025

Journal
SpeedMalloc: Improving Multi-threaded Applications via a Lightweight Core for Memory Allocation
Ruihao Li, Qinzhe Wu, Krishna Kavi, Gayatri Mehta, Jonathan C Beard, Neeraja J Yadwadkar, Lizy K John · arXiv preprint arXiv:2508.20253 · 2025
Conference
ViReC: The Virtual Register Context Architecture for Efficient Near-Memory Multithreading
Matthew Barondeau, Sophia Jiang, Jonathan Beard, Andreas Gerstlauer · Proceedings of the 54th International Conference on Parallel Processing · 2025

2024

Conference
BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing
Qinzhe Wu, Ruihao Li, Jonathan Beard, Lizy John · Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction · 2024

2022

Conference
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems
Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan Beard, Lizy John · Proceedings of the 51st International Conference on Parallel Processing (ICPP '22) · 2022

2021

Conference
PLANAR: a programmable accelerator for near-memory data rearrangement
Adrián Barredo, Adrià Armejach, Jonathan Beard, Miquel Moreto · Proceedings of the ACM International Conference on Supercomputing · 2021
Conference
Online model swapping for architectural simulation
Patrick Lavin, Jeffrey Young, Richard Vuduc, Jonathan Beard · Proceedings of the 18th ACM International Conference on Computing Frontiers · 2021
Conference
Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication
Qinzhe Wu, Jonathan C. Beard, Ashen Ekanayake, Lizy John · 2021 IEEE International Parallel & Distributed Processing Symposium · 2021

2020

Conference
The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload
Mochamad Asri, Curtis Dunham, Roxana Rusitoru, Andreas Gerstlauer, Jonathan Beard · 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) · 2020

2019

Conference
SPiDRE: Accelerating Sparse Memory Access Patterns
Adrián Barredo, Jonathan C. Beard, Miquel Moretó · Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques · 2019
Conference
Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data
Anthony M. Cabrera, Roger D. Chamberlain, Jonathan C. Beard · The IEEE High Performance Extreme Computing Conference 2019 · 2019

2018

Workshop
This Architecture Tastes Like Microarchitecture
Curtis Dunham, Jonathan C Beard · Online Proceedings of the 2nd Workshop on Pioneering Processor Paradigms · 2018

2017

Journal
Deadlock-free buffer configuration for stream computing
Peng Li, Jonathan C Beard, Jeremy D Buhler · The International Journal of High Performance Computing Applications · 2017
Journal
RaftLib: a C++ template library for high performance stream parallel processing
Jonathan C Beard, Peng Li, Roger D Chamberlain · The International Journal of High Performance Computing Applications · 2017
Workshop
Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore
Jonathan C Beard, Joshua Randall · Proceedings of the High Performance Computing Post-Moore (HCPM'17) Workshop · 2017
Conference
The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time
Jonathan C Beard · Proceedings of the Second International Symposium on Memory Systems · 2017

2015

Thesis
Online Modeling and Tuning of Parallel Stream Processing Systems
Jonathan C. Beard · Department of Computer Science and Engineering, Washington University in St. Louis · 2015
Conference
Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines
Jonathan C. Beard, Cooper Epstein, Roger D. Chamberlain · Proceedings of the 6th ACM/SPEC international conference on Performance engineering · 2015
Workshop
RaftLib: A C++ template library for high performance stream parallel processing
Jonathan C. Beard, Peng Li, Roger D. Chamberlain · Proceedings of Programming Models and Applications on Multicores and Manycores · 2015
Workshop
Deadlock-free Buffer Configuration for Stream Computing
Peng Li, Jonathan C. Beard, Jeremy Buhler · Proceedings of Programming Models and Applications on Multicores and Manycores · 2015
Journal
Run Time Approximation of Non-blocking Service Rates for Streaming Systems
Jonathan C. Beard, Roger D. Chamberlain · arXiv preprint arXiv:1504.00591v2 · 2015
Conference
Run Time Approximation of Non-blocking Service Rates for Streaming Systems
Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications · 2015
Conference
Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines
Jonathan C. Beard, Cooper Epstein, Roger D. Chamberlain · Proceedings of Euro-Par 2015 Parallel Processing · 2015

2014

Workshop
Use of a Levy Distribution for Modeling Best Case Execution Time Variation
Jonathan C. Beard, Roger D. Chamberlain · Computer Performance Engineering · 2014

2013

Conference
Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures
Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the International Symposium on Performance Analysis of Systems and Software · 2013
Conference
Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications
Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems · 2013

2011

Conference
Crossing Boundaries in TimeTrial: Monitoring Communications Across Architecturally Diverse Computing Platforms
Joseph M. Lancaster, Joseph G. Wingbermuehle, Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the 9th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing · 2011

Patents

29 granted U.S. patents (co-inventor) across memory systems, cache/coherence, heterogeneous compute, data movement, queueing, and virtualization, plus applications in flight. Each number links to Google Patents.

Translation hints U.S. 12,007,905 · granted 11 June 2024
Multi-channel Q-monitor U.S. 11,960,945 · granted 16 April 2024
Checkpoint Saving U.S. 11,934,272 · granted 19 March 2024
Virtual stashing U.S. 11,841,800 · granted 12 December 2023
Insert operation - Low contention hardware-accelerated multi-producer single consumer queue U.S. 11,614,985 · granted 28 March 2023
Translation for blocks with tightly-coupled interfaces U.S. 11,550,585 · granted 10 January 2023
Write Clean Conditional U.S. 11,445,020 · granted 13 September 2022
Cache stash relay U.S. 11,314,645 · granted 26 April 2022
Snoop Logging U.S. 11,176,042 · granted 16 November 2021
Fine granular protections U.S. 10,909,045 · granted 2 February 2021
Mapping a Distributed Heap onto a Hierarchical Memory System to Enable Efficient Inter-Process Communication U.S. 10,901,691 · granted 26 January 2021
Fault tolerant memory system U.S. 10,884,850 · granted 5 January 2021
Virtual Context Format for Fast Heterogeneous State Migration U.S. 10,671,426 · granted 2 June 2020
Dynamic SVE Vectorization of Scalar Operations using Dataflow Vectorization U.S. 10,620,954 · granted 14 April 2020
Fast Address Translation for Virtual Machines U.S. 10,613,989 · granted 7 April 2020
An efficient method for Scalable Range Based Coherence Modification U.S. 10,592,424 · granted 17 March 2020
Method and Apparatus for Two-Layer Copy-on-Write U.S. 10,565,126 · granted 18 February 2020
Efficient Lazy Migration of Virtual Compute Contexts U.S. 10,552,212 · granted 4 February 2020
Method and Apparatus for Scheduling in a Non-Uniform Compute Device U.S. 10,552,152 · granted 4 February 2020
The Memory Storm Fabric for Hardware Accelerated, Scalable Virtual Shared Memory U.S. 10,534,719 · granted 14 January 2020
Memory Address Translation U.S. 10,489,304 · granted 26 November 2019
Cache-Based Communication Between Execution Threads of a Data Processing System U.S. 10,474,575 · granted 12 November 2019
Memory Node Controller U.S. 10,467,159 · granted 5 November 2019
Method and Apparatus for Reordering in a Non-Uniform Compute Device U.S. 10,445,094 · granted 15 October 2019
Apparatus and method for predicting a redundancy period U.S. 10,423,510 · granted 24 September 2019
Virtual Context Table for Fast Heterogeneous Context Migration U.S. 10,423,446 · granted 24 September 2019
Method and Apparatus for Fast Context Cloning in a Data Processing System U.S. 10,353,826 · granted 16 July 2019
Smart Sparse Data Movement Engine for Increasing Utilization of Bandwidth and Cache Lines U.S. 10,353,601 · granted 16 July 2019
Memory Synchronization Filter U.S. 10,067,708 · granted 4 September 2018

In flight

Message channels U.S. 18/446,570 · filed 9 September 2023

Impact · patents

Where the ideas were cited

Forward citations from Google Patents: 120+ later filings cite the 29 granted patents, at Apple, Intel, NVIDIA, Microsoft, Samsung, IBM, AMD, Google, and a wave of AI-silicon startups. Each chip links to the cited patent behind the cluster.

Accelerator integration & offload

Intel: Configuring/reconfiguring chains of accelerators
Samsung: Host/accelerator work-sharing via shared memory
NVIDIA: Unified virtual memory in heterogeneous systems
NVIDIA: Decoupled lookup-table accelerator in SoC
NVIDIA: Offloading tasks to decoupled accelerators in SoC

US11550585

Memory fabrics & disaggregation

Apple: Scalable system-on-a-chip (M-series fabric)
Apple: Address hashing across multiple memory controllers
Apple: Soft memory folding / compacted pipe addressing
Intel: Hardware-assisted virtual switch
Intel: Flexible NIC host interface

US10534719 US10467159 US10901691

Coherence at scale

Microsoft: Snoop filter w/ disaggregated vector table
Microsoft: Adaptive coherency tracking (×4 patents)
IBM: Coordination namespace / global virtual address space
IBM: Cache snooping coherence protection (×3)
IBM: Pipeline-parallel computing w/ extended memory

US10592424 US11176042 US11445020

Hardware queues & message passing

Google: Optimizing hardware FIFO instructions
Xilinx (AMD): Producer→consumer active cache transfers
Samsung: SoC data sync between processors
Microsoft: Parallel flush recovery of sliced reorder buffers
Huawei: Multi-layer reorder buffer

US11960945 US11614985 US10474575 US10445094

Context switching & migration

Apple: Thread-channel deactivation; memory-backed register preemption; multi-stage thread scheduling (×3)
Intel: NVM cloning w/ hardware copy-on-write
VMware: Cross-privilege-domain communication in CPU cores
Untether AI: Computational memory (×3)
Rebellions: NPU translation-lookaside-buffer updating

US11934272 US10671426 US10423446 US10353826 US10552212

Virtual memory & translation

Intel: Pointer-extent-informed predictors
Intel: NVM cloning w/ HW copy-on-write
Apple: Memory Objects
IBM: Context tracking in virtually-tagged caches
Samsung: Systems & methods for address translation

US12007905 US10613989 US10565126 US10489304

Near-memory & sparse data movement

AMD: Near-memory data-dependent gather & packing
AMD: Reducing side-effects of compute offload to memory
Intel: Smart memory store/load; disaggregated-memory filtering
Samsung: NDP data-centric server; HBM ISA extension
SK hynix: Memory-controller command scheduling (×2)

US10353601 US10067708 US10552152

Memory reliability & prediction

Samsung: SSD-based RAID
Apple: Dynamic address-based data reliability
Raytheon: Optimal bit apportionment vs soft errors (×3)
Ampere: Address-range memory mirroring
Micron: Selective power-on scrub; accelerated read translation

US10884850 US10423510 US10909045

Method: Google Patents “Cited by” and family-citation data, mined June 2026. A citation marks later work that builds on or relates to the patent: prior art acknowledged by the applicant or examiner, not endorsement.

Impact · papers

Cited in the literature

RaftLib alone: 42 citations. The body of work is cited across OSDI, EuroSys, SC, USENIX ATC, HPCA, MICRO, CGO, and HPDC. See the full record on Google Scholar.

Stream processing & RaftLib

EuroSys 2020: PaSh: light-touch data-parallel shell processing (Vasilakis)
OSDI 2022: Practically Correct, Just-in-Time Shell Script Parallelization (Kallas)
SC 2022: TD-NUCA: Runtime-Driven Management of NUCA Caches (Caheny)
USENIX ATC 2018: Scaling HW-Accelerated Network Monitoring (*Flow) (Sonchack)
SoCC 2025: VLCs: Managing Parallelism with Virtualized Libraries (Yan)

Source: RaftLib (IJHPCA) · 42 citations

Near-memory & sparse acceleration

HPCA 2021: FAFNIR: Accelerating Sparse Gathering by Near-Memory Reduction (Asgari)
IEEE Access 2021: DAMOV: Benchmark Suite for Data-Movement Bottlenecks (Oliveira)
MICRO 2023: A Tensor Marshaling Unit for Sparse Tensor Algebra (Siracusa)
PACT 2024: Mozart: Taming Taxes and Composing Accelerators (Suresh)
CGO 2026: Ember: Compiler for Decoupled Access-Execute (Siracusa)

Source: SPiDRE · Dark Bandwidth · NUCD

Hardware queues & messaging

HPCA 2025: Push Multicast: Speculative Coherent Interconnect (Huang)
CC 2024: BLQ: Locality-Aware Blocking-Less Queuing (Wu)
HotOS 2023: NextGen-Malloc: Giving the Allocator Its Own Room (Li)
CF 2024: HASIIL: HW-Assisted Scheduling for IPC Latency in Linux (Twardzik)

Source: Virtual-Link · SPAMeR

Performance modeling of streaming

USENIX ATC 2019: EdgeWise: A Better Stream Processing Engine for the Edge (Fu)
HPDC 2023: Streaming Task Graph Scheduling for Dataflow Architectures (De Matteis)
Parallel Computing 2021: Reducing queuing impact in irregular dataflow (Timcheck)
IPDPSW 2024: Network-Calculus Models for Streaming Applications (Faber)

Source: Analytic streaming models (MASCOTS) · 20 citations

Method: Semantic Scholar forward citations across 26 publications, mined June 2026; curated to recognizable venues, self-citations excluded. Canonical counts live on Google Scholar.
