2025

  • Journal
    SpeedMalloc: Improving Multi-threaded Applications via a Lightweight Core for Memory Allocation
    Ruihao Li, Qinzhe Wu, Krishna Kavi, Gayatri Mehta, Jonathan C Beard, Neeraja J Yadwadkar, Lizy K John · arXiv preprint arXiv:2508.20253 · 2025
  • Conference
    ViReC: The Virtual Register Context Architecture for Efficient Near-Memory Multithreading
    Matthew Barondeau, Sophia Jiang, Jonathan Beard, Andreas Gerstlauer · Proceedings of the 54th International Conference on Parallel Processing · 2025

2024

  • Conference
    BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing
    Qinzhe Wu, Ruihao Li, Jonathan Beard, Lizy John · Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction · 2024

2022

  • Conference
    SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems
    Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan Beard, Lizy John · Proceedings of the 51st International Conference on Parallel Processing (ICPP '22) · 2022

2021

  • Conference
    PLANAR: a programmable accelerator for near-memory data rearrangement
    Adrián Barredo, Adrià Armejach, Jonathan Beard, Miquel Moretó · Proceedings of the ACM International Conference on Supercomputing · 2021
  • Conference
    Online model swapping for architectural simulation
    Patrick Lavin, Jeffrey Young, Richard Vuduc, Jonathan Beard · Proceedings of the 18th ACM International Conference on Computing Frontiers · 2021
  • Conference
    Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication
    Qinzhe Wu, Jonathan C. Beard, Ashen Ekanayake, Lizy John · 2021 IEEE International Parallel & Distributed Processing Symposium · 2021

2020

  • Conference
    The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload
    Mochamad Asri, Curtis Dunham, Roxana Rusitoru, Andreas Gerstlauer, Jonathan Beard · 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) · 2020

2019

  • Conference
    SPiDRE: Accelerating Sparse Memory Access Patterns
    Adrián Barredo, Jonathan C. Beard, Miquel Moretó · Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques · 2019
  • Conference
    Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data
    Anthony M. Cabrera, Roger D. Chamberlain, Jonathan C. Beard · The IEEE High Performance Extreme Computing Conference 2019 · 2019

2018

  • Workshop
    This Architecture Tastes Like Microarchitecture
    Curtis Dunham, Jonathan C Beard · Online Proceedings of the 2nd Workshop on Pioneering Processor Paradigms · 2018

2017

  • Journal
    Deadlock-free buffer configuration for stream computing
    Peng Li, Jonathan C Beard, Jeremy D Buhler · The International Journal of High Performance Computing Applications · 2017
  • Journal
    RaftLib: a C++ template library for high performance stream parallel processing
    Jonathan C Beard, Peng Li, Roger D Chamberlain · The International Journal of High Performance Computing Applications · 2017
  • Workshop
    Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore
    Jonathan C Beard, Joshua Randall · Proceedings of the High Performance Computing Post-Moore (HCPM'17) Workshop · 2017
  • Conference
    The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time
    Jonathan C Beard · Proceedings of the Second International Symposium on Memory Systems · 2017

2015

  • Thesis
    Online Modeling and Tuning of Parallel Stream Processing Systems
    Jonathan C. Beard · Department of Computer Science and Engineering, Washington University in St. Louis · 2015
  • Conference
    Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines
    Jonathan C. Beard, Cooper Epstein, Roger D. Chamberlain · Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering · 2015
  • Workshop
    RaftLib: A C++ template library for high performance stream parallel processing
    Jonathan C. Beard, Peng Li, Roger D. Chamberlain · Proceedings of Programming Models and Applications on Multicores and Manycores · 2015
  • Workshop
    Deadlock-free Buffer Configuration for Stream Computing
    Peng Li, Jonathan C. Beard, Jeremy Buhler · Proceedings of Programming Models and Applications on Multicores and Manycores · 2015
  • Journal
    Run Time Approximation of Non-blocking Service Rates for Streaming Systems
    Jonathan C. Beard, Roger D. Chamberlain · arXiv preprint arXiv:1504.00591v2 · 2015
  • Conference
    Run Time Approximation of Non-blocking Service Rates for Streaming Systems
    Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications · 2015
  • Conference
    Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines
    Jonathan C. Beard, Cooper Epstein, Roger D. Chamberlain · Proceedings of Euro-Par 2015 Parallel Processing · 2015

2014

  • Workshop
    Use of a Levy Distribution for Modeling Best Case Execution Time Variation
    Jonathan C. Beard, Roger D. Chamberlain · Computer Performance Engineering · 2014

2013

  • Conference
    Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures
    Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the International Symposium on Performance Analysis of Systems and Software · 2013
  • Conference
    Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications
    Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems · 2013

2011

  • Conference
    Crossing Boundaries in TimeTrial: Monitoring Communications Across Architecturally Diverse Computing Platforms
    Joseph M. Lancaster, Joseph G. Wingbermuehle, Jonathan C. Beard, Roger D. Chamberlain · Proceedings of the 9th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing · 2011

Patents

29 granted U.S. patents (co-inventor) across memory systems, cache/coherence, heterogeneous compute, data movement, queueing, and virtualization — plus applications in flight. Each number links to Google Patents.

  • Translation hints U.S. 12,007,905  · granted 11 June 2024
  • Multi-channel Q-monitor U.S. 11,960,945  · granted 16 April 2024
  • Checkpoint Saving U.S. 11,934,272  · granted 19 March 2024
  • Virtual stashing U.S. 11,841,800  · granted 12 December 2023
  • Insert operation - Low contention hardware-accelerated multi-producer single consumer queue U.S. 11,614,985  · granted 28 March 2023
  • Translation for blocks with tightly-coupled interfaces U.S. 11,550,585  · granted 10 January 2023
  • Write Clean Conditional U.S. 11,445,020  · granted 13 September 2022
  • Cache stash relay U.S. 11,314,645  · granted 26 April 2022
  • Snoop Logging U.S. 11,176,042  · granted 16 November 2021
  • Fine granular protections U.S. 10,909,045  · granted 2 February 2021
  • Mapping a Distributed Heap onto a Hierarchical Memory System to Enable Efficient Inter-Process Communication U.S. 10,901,691  · granted 26 January 2021
  • Fault tolerant memory system U.S. 10,884,850  · granted 5 January 2021
  • Virtual Context Format for Fast Heterogeneous State Migration U.S. 10,671,426  · granted 2 June 2020
  • Dynamic SVE Vectorization of Scalar Operations using Dataflow Vectorization U.S. 10,620,954  · granted 14 April 2020
  • Fast Address Translation for Virtual Machines U.S. 10,613,989  · granted 7 April 2020
  • An efficient method for Scalable Range Based Coherence Modification U.S. 10,592,424  · granted 17 March 2020
  • Method and Apparatus for Two-Layer Copy-on-Write U.S. 10,565,126  · granted 18 February 2020
  • Efficient Lazy Migration of Virtual Compute Contexts U.S. 10,552,212  · granted 4 February 2020
  • Method and Apparatus for Scheduling in a Non-Uniform Compute Device U.S. 10,552,152  · granted 4 February 2020
  • The Memory Storm Fabric for Hardware Accelerated, Scalable Virtual Shared Memory U.S. 10,534,719  · granted 14 January 2020
  • Memory Address Translation U.S. 10,489,304  · granted 26 November 2019
  • Cache-Based Communication Between Execution Threads of a Data Processing System U.S. 10,474,575  · granted 12 November 2019
  • Memory Node Controller U.S. 10,467,159  · granted 5 November 2019
  • Method and Apparatus for Reordering in a Non-Uniform Compute Device U.S. 10,445,094  · granted 15 October 2019
  • Apparatus and method for predicting a redundancy period U.S. 10,423,510  · granted 24 September 2019
  • Virtual Context Table for Fast Heterogeneous Context Migration U.S. 10,423,446  · granted 24 September 2019
  • Method and Apparatus for Fast Context Cloning in a Data Processing System U.S. 10,353,826  · granted 16 July 2019
  • Smart Sparse Data Movement Engine for Increasing Utilization of Bandwidth and Cache Lines U.S. 10,353,601  · granted 16 July 2019
  • Memory Synchronization Filter U.S. 10,067,708  · granted 4 September 2018

In flight

  • Message channels U.S. 18/446,570 · filed 9 September 2023

Impact · patents

Where the ideas were cited

Forward citations from Google Patents: 120+ later filings cite the 29 granted patents — at Apple, Intel, NVIDIA, Microsoft, Samsung, IBM, AMD, Google, and a wave of AI-silicon startups. Each chip links to the cited patent behind the cluster.

Accelerator integration & offload

  • Intel — Configuring/reconfiguring chains of accelerators
  • Samsung — Host/accelerator work-sharing via shared memory
  • NVIDIA — Unified virtual memory in heterogeneous systems
  • NVIDIA — Decoupled lookup-table accelerator in SoC
  • NVIDIA — Offloading tasks to decoupled accelerators in SoC

Memory fabrics & disaggregation

  • Apple — Scalable system-on-a-chip (M-series fabric)
  • Apple — Address hashing across multiple memory controllers
  • Apple — Soft memory folding / compacted pipe addressing
  • Intel — Hardware-assisted virtual switch
  • Intel — Flexible NIC host interface

Coherence at scale

  • Microsoft — Snoop filter w/ disaggregated vector table
  • Microsoft — Adaptive coherency tracking (×4 patents)
  • IBM — Coordination namespace / global virtual address space
  • IBM — Cache snooping coherence protection (×3)
  • IBM — Pipeline-parallel computing w/ extended memory

Hardware queues & message passing

  • Google — Optimizing hardware FIFO instructions
  • Xilinx (AMD) — Producer→consumer active cache transfers
  • Samsung — SoC data sync between processors
  • Microsoft — Parallel flush recovery of sliced reorder buffers
  • Huawei — Multi-layer reorder buffer

Context switching & migration

  • Apple — Thread-channel deactivation; memory-backed register preemption; multi-stage thread scheduling (×3)
  • Intel — NVM cloning w/ hardware copy-on-write
  • VMware — Cross-privilege-domain communication in CPU cores
  • Untether AI — Computational memory (×3)
  • Rebellions — NPU translation-lookaside-buffer updating

Virtual memory & translation

  • Intel — Pointer-extent-informed predictors
  • Intel — NVM cloning w/ HW copy-on-write
  • Apple — Memory Objects
  • IBM — Context tracking in virtually-tagged caches
  • Samsung — Systems & methods for address translation

Near-memory & sparse data movement

  • AMD — Near-memory data-dependent gather & packing
  • AMD — Reducing side-effects of compute offload to memory
  • Intel — Smart memory store/load; disaggregated-memory filtering
  • Samsung — NDP data-centric server; HBM ISA extension
  • SK hynix — Memory-controller command scheduling (×2)

Memory reliability & prediction

  • Samsung — SSD-based RAID
  • Apple — Dynamic address-based data reliability
  • Raytheon — Optimal bit apportionment vs soft errors (×3)
  • Ampere — Address-range memory mirroring
  • Micron — Selective power-on scrub; accelerated read translation

Method: Google Patents “Cited by” and family-citation data, mined June 2026. A citation marks later work that builds on or relates to the patent — prior art acknowledged by the applicant or examiner, not endorsement.

Impact · papers

Cited in the literature

RaftLib alone: 42 citations. The body of work is cited across OSDI, EuroSys, SC, USENIX ATC, HPCA, MICRO, CGO, and HPDC. See the full record on Google Scholar.

Stream processing & RaftLib

  • EuroSys 2020 — PaSh: light-touch data-parallel shell processing (Vasilakis)
  • OSDI 2022 — Practically Correct, Just-in-Time Shell Script Parallelization (Kallas)
  • SC 2022 — TD-NUCA: Runtime-Driven Management of NUCA Caches (Caheny)
  • USENIX ATC 2018 — Scaling HW-Accelerated Network Monitoring (*Flow) (Sonchack)
  • SoCC 2025 — VLCs: Managing Parallelism with Virtualized Libraries (Yan)
Source: RaftLib (IJHPCA) — 42 citations

Near-memory & sparse acceleration

  • HPCA 2021 — FAFNIR: Accelerating Sparse Gathering by Near-Memory Reduction (Asgari)
  • IEEE Access 2021 — DAMOV: Benchmark Suite for Data-Movement Bottlenecks (Oliveira)
  • MICRO 2023 — A Tensor Marshaling Unit for Sparse Tensor Algebra (Siracusa)
  • PACT 2024 — Mozart: Taming Taxes and Composing Accelerators (Suresh)
  • CGO 2026 — Ember: Compiler for Decoupled Access-Execute (Siracusa)
Source: SPiDRE · Dark Bandwidth · NUCD

Hardware queues & messaging

  • HPCA 2025 — Push Multicast: Speculative Coherent Interconnect (Huang)
  • CC 2024 — BLQ: Locality-Aware Blocking-Less Queuing (Wu)
  • HotOS 2023 — NextGen-Malloc: Giving the Allocator Its Own Room (Li)
  • CF 2024 — HASIIL: HW-Assisted Scheduling for IPC Latency in Linux (Twardzik)
Source: Virtual-Link · SPAMeR

Performance modeling of streaming

  • USENIX ATC 2019 — EdgeWise: A Better Stream Processing Engine for the Edge (Fu)
  • HPDC 2023 — Streaming Task Graph Scheduling for Dataflow Architectures (De Matteis)
  • Parallel Computing 2021 — Reducing queuing impact in irregular dataflow (Timcheck)
  • IPDPSW 2024 — Network-Calculus Models for Streaming Applications (Faber)
Source: Analytic streaming models (MASCOTS) — 20 citations

Method: Semantic Scholar forward citations across 26 publications, mined June 2026; curated to recognizable venues, self-citations excluded. Canonical counts live on Google Scholar.