Media

Full list of papers and patents available at Google Scholar.

Open Source Software (things I lead)

RaftLib - runtime for heterogeneous data-flow/streaming processing using a C++ DSL; it was also the basis of my thesis. It’s Apache 2.0 licensed, so it is free to use for pretty much anything.

Interviews

CppCast Episode 50 - My thoughts on parallel and distributed computing, and how they shaped RaftLib

Talks (just talks, not papers)

MEMSYS 2019 - Panel: New and Cool Memory Technologies (slides)
Panel on challenges for memory centric computing at MCHPC-18 (i.c.w. SC'18 - Dallas) (slides)
Reducing Dark Bandwidth Through Data Reduction Near Memory - Dark bandwidth and some ways to get rid of it through near memory gather/scatter (slides)
A Vision For Destruction Of Post-Moore Disruption - prospective talk with my views on architecture from a biological evolution perspective. (slides)
MEMSYS 2017 - New and Cool Memory Technologies - A panel discussion on how to use new memory technologies, challenges, and in general what is new and cool in the field. (slides)
CppNow2017 - RaftLib - RaftLib Tutorial 2, slides and video.
CppNow2017 - FIFO Optimization Tips, details and video.
Future of Memory Technology for Exascale and Beyond IV - Panel discussion on the future of memory technology for exascale. My take: instead of focusing only on new technologies, let's focus on systems. (session page)
CppNow2016 - RaftLib tutorial, slides and video (official one available via YouTube channel).

Organized Workshops and Conferences

In the News

Shedding Light on Dark Bandwidth, published 14 September 2017 in The Next Platform (nextplatform.com).
Momentum is Building for ARM in HPC, published 30 June 2017 in The Next Platform (nextplatform.com).

Research Publications

The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload

Introduces the NUCD architecture, a lightweight accelerator offload mechanism tightly coupled to a host core for fine-grain tasks; demonstrates performance gains over conventional driver-based offload.

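Asri, M., Dunham, C., Rusitoru, R., Gerstlauer, A., & Beard, J. (2020, March). The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload. 28th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2020).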
- PDF
```
@inproceedings{adrgb20,
  title = {The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload},
  author = {Asri, Mochamad and Dunham, Curtis and Rusitoru, Roxana and Gerstlauer, Andreas and Beard, Jonathan},
  booktitle = {28th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing},
  series = {PDP2020},
  year = {2020},
  month = mar
}
```
Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data

Presents multi-spectral reuse distance to infer spatial locality from temporal traces across multiple granularities; uses Earth Mover’s Distance to quantify shifts and guide page-sizing decisions.
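To make the idea concrete, here is a minimal Python sketch, not the paper's implementation: it computes LRU-stack reuse distances for one byte-address trace at two block granularities and compares the resulting histograms with a 1-D Earth Mover's Distance. The function names, the 8 B and 64 B granularities, and the histogram binning are illustrative assumptions; the naive stack is O(N·M) and only suitable for toy traces.

```python
from collections import Counter

def reuse_distances(trace, block_bytes):
    """LRU-stack reuse distance of each access after mapping byte addresses
    to blocks of block_bytes; first-time (cold) accesses get None."""
    stack = []  # most recently used block is at the end
    out = []
    for addr in trace:
        block = addr // block_bytes
        if block in stack:
            idx = stack.index(block)
            # number of distinct blocks touched since the previous use of block
            out.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            out.append(None)
        stack.append(block)
    return out

def histogram(dists, max_bin):
    """Normalized histogram over bins 0..max_bin; cold accesses and distances
    above max_bin are lumped into the last bin."""
    counts = Counter(min(d, max_bin) if d is not None else max_bin for d in dists)
    return [counts.get(i, 0) / len(dists) for i in range(max_bin + 1)]

def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms on the same
    unit-spaced bins: the sum of absolute differences of their CDFs."""
    cp = cq = total = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        total += abs(cp - cq)
    return total

# Hypothetical toy trace: two sequential passes over 512 bytes in 8-byte steps.
trace = list(range(0, 512, 8)) * 2
fine = histogram(reuse_distances(trace, 8), max_bin=8)
coarse = histogram(reuse_distances(trace, 64), max_bin=8)
shift = emd_1d(coarse, fine)
```

At 8-byte granularity every reuse falls in the overflow bin, while at 64-byte granularity most reuses have distance 0; the EMD between the two views (7.0625 bins for this trace) quantifies how much spatial locality the coarser granularity exposes.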

Cabrera, A. M., Chamberlain, R. D., & Beard, J. C. (2019, September). Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data. The IEEE High Performance Extreme Computing Conference (HPEC 2019).
- PDF
- Slides
```
@inproceedings{ccb19,
  title = {Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data},
  author = {Cabrera, Anthony M. and Chamberlain, Roger D. and Beard, Jonathan C.},
  booktitle = {The IEEE High Performance Extreme Computing Conference 2019},
  series = {HPEC2019},
  year = {2019},
  month = sep,
  slides = {../slides/HPEC-2019-anthony.pdf}
}
```
SPiDRE: Accelerating Sparse Memory Access Patterns

Explores SPiDRE hardware to accelerate sparse and irregular memory access via near-memory data reorganization, improving bandwidth utilization and performance.

Barredo, A., Beard, J. C., & Moretó, M. (2019, September). SPiDRE: Accelerating Sparse Memory Access Patterns. 28th International Conference on Parallel Architectures and Compilation Techniques (PACT 2019).
- PDF
```
@inproceedings{bbm19,
  title = {SPiDRE: Accelerating Sparse Memory Access Patterns},
  author = {Barredo, Adri\'an and Beard, Jonathan C. and Moret\'o, Miquel},
  booktitle = {28th International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  series = {PACT2019},
  year = {2019},
  month = sep
}
```
This Architecture Tastes Like Microarchitecture

Revisits early RISC ideas and argues for ISAs that specify less microarchitectural detail, contrasting multiple ISA design philosophies and their implications for hardware/software co-design.

Dunham, C., & Beard, J. C. (2018). This Architecture Tastes Like Microarchitecture. The 2nd Workshop on Pioneering Processor Paradigms.
- PDF
```
@online{db18a,
  title = {This Architecture Tastes Like Microarchitecture},
  author = {Dunham, Curtis and Beard, Jonathan C},
  publisher = {The 2nd Workshop on Pioneering Processor Paradigms},
  series = {WP3},
  year = {2018}
}
```
The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time

Introduces SPiDRE, a near-memory programmable data-reduction/rearrangement engine for sparse workloads, including a programmer interface and evaluation on representative applications.

Beard, J. C. (2017, October). The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time. Proceedings of the Second International Symposium on Memory Systems.
- PDF
- Slides
```
@inproceedings{b17a,
  title = {The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time},
  author = {Beard, Jonathan C},
  booktitle = {Proceedings of the Second International Symposium on Memory Systems},
  year = {2017},
  month = oct,
  organization = {ACM},
  slides = {../slides/memsys2017_SPiDRE_Beard.pdf}
}
```
Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore

Defines ‘dark bandwidth’ as wasted data movement driven by memory interfaces and virtual-memory abstractions, and argues for system-level approaches that reduce data movement in post-Moore systems.
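A back-of-the-envelope illustration of wasted data movement, under loudly stated assumptions: 64-byte cache lines, 8-byte line-aligned elements, no prefetching, and no reuse of fetched lines. The hypothetical `gather_traffic` helper compares the bytes a strided gather actually uses with the bytes moved when memory transfers whole lines, and with an idealized near-memory gather that ships only the packed elements.

```python
def gather_traffic(indices, elem_bytes=8, line_bytes=64):
    """Return (bytes used, bytes moved as whole lines, bytes moved if packed)
    for gathering array[indices] from an array of elem_bytes elements that
    starts on a line boundary (assumes elem_bytes divides line_bytes)."""
    unique = set(indices)
    used = len(unique) * elem_bytes
    lines_touched = {(i * elem_bytes) // line_bytes for i in unique}
    moved = len(lines_touched) * line_bytes
    # Idealized near-memory gather: only the used elements cross the
    # interface, packed densely and rounded up to whole lines.
    packed = -(-used // line_bytes) * line_bytes
    return used, moved, packed

# Every 8th 8-byte element: exactly one useful element per 64-byte line.
used, moved, packed = gather_traffic(range(0, 512, 8))
```

Here only 512 of the 4096 bytes moved (12.5%) are useful; the remaining 3584 bytes are the kind of traffic the paper calls dark bandwidth, and the packed case moves just 512 bytes.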

Beard, J. C., & Randall, J. (2017). Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore. Proc. High Performance Computing Post-Moore (HCPM’17).
- PDF
- Slides
```
@inproceedings{br17a,
  title = {Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore},
  author = {Beard, Jonathan C and Randall, Joshua},
  booktitle = {Proc. High Performance Computing Post-Moore (HCPM'17)},
  series = {Lecture Notes in Computer Science},
  year = {2017},
  month = jun,
  slides = {../slides/beard_hcpm2017.pdf}
}
```
RaftLib: A C++ template library for high performance stream parallel processing

Journal version of the RaftLib paper, detailing a C++ template library that enables streaming graph optimizations, dynamic queue tuning, automatic parallelization, and low-overhead monitoring for legacy C/C++ code.

Beard, J. C., Li, P., & Chamberlain, R. D. (2016). RaftLib: A C++ template library for high performance stream parallel processing. International Journal of High Performance Computing Applications. https://doi.org/10.1177/1094342016672542
- PDF
```
@article{blc16,
  author = {Beard, Jonathan C and Li, Peng and Chamberlain, Roger D},
  title = {RaftLib: A C++ template library for high performance stream parallel processing},
  year = {2016},
  doi = {10.1177/1094342016672542},
  journal = {International Journal of High Performance Computing Applications}
}
```
Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines

Extends SVM-based model selection to online settings, determining when M/M/1 and M/D/1 models apply to stream-processing queues; validated across multiple hardware and software platforms.
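For reference, these are the standard textbook steady-state results for the two models the classifier chooses between, with arrival rate λ, service rate μ, and utilization ρ = λ/μ < 1 (they are general queueing results, not taken from the paper). At ρ = 0.8 the two predictions differ by a factor of two in mean queue length, which illustrates why applying the wrong model can mis-size a buffer.

```latex
L_q^{M/M/1} = \frac{\rho^2}{1-\rho}, \qquad
L_q^{M/D/1} = \frac{\rho^2}{2(1-\rho)}, \qquad
L = L_q + \rho \quad \text{(both models)}

\rho = 0.8:\qquad
L_q^{M/M/1} = \frac{0.64}{0.2} = 3.2, \qquad
L_q^{M/D/1} = \frac{0.64}{0.4} = 1.6
```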

Beard, J. C., Epstein, C., & Chamberlain, R. D. (2015). Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines. Proceedings of Euro-Par 2015 Parallel Processing, 82-93.
- PDF
```
@inproceedings{bec15b,
  title = {Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines},
  author = {Beard, Jonathan C. and Epstein, Cooper and Chamberlain, Roger D.},
  booktitle = {Proceedings of Euro-Par 2015 Parallel Processing},
  year = {2015},
  month = aug,
  pages = {82-93},
  publisher = {Springer}
}
```
Run Time Approximation of Non-blocking Service Rates for Streaming Systems

Introduces runtime estimation of kernel service rates for streaming systems to enable continuous retuning and avoid reliance on steady-state assumptions.
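One simple way to picture a non-blocking service-rate estimate, sketched in Python under stated assumptions: the runtime is assumed to record, per sampling interval, how many items a kernel consumed and whether it ever stalled on an empty input or full output queue, and only unblocked intervals are pooled. The `Sample` type and `estimate_service_rate` function are hypothetical, and the sketch omits whatever filtering and convergence logic the paper uses.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Sample:
    items: int      # items the kernel consumed during the interval
    seconds: float  # length of the sampling interval
    blocked: bool   # True if the kernel stalled on an empty input or full output

def estimate_service_rate(samples: Iterable[Sample]) -> Optional[float]:
    """Estimate a kernel's non-blocking service rate (items/second) by pooling
    only the intervals in which it never blocked on its queues."""
    items = 0
    seconds = 0.0
    for s in samples:
        if not s.blocked:
            items += s.items
            seconds += s.seconds
    # No unblocked observations yet: no estimate rather than a misleading one.
    return items / seconds if seconds > 0 else None

samples = [Sample(100, 0.01, False), Sample(20, 0.01, True), Sample(110, 0.01, False)]
rate = estimate_service_rate(samples)  # pools 210 items over 0.02 s
```

The blocked interval is discarded because its low count reflects an empty or full queue rather than the kernel's own speed; pooling the other two gives about 10,500 items/s. Recomputing this over a sliding window would let the estimate track changing workloads.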

Beard, J. C., & Chamberlain, R. D. (2015). Run Time Approximation of Non-blocking Service Rates for Streaming Systems. Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 792-797. https://doi.org/10.1109/HPCC-CSS-ICESS.2015.64
- PDF
- Slides
```
@inproceedings{bc15f,
  author = {Beard, Jonathan C. and Chamberlain, Roger D.},
  booktitle = {Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications},
  title = {Run Time Approximation of Non-blocking Service Rates for Streaming Systems},
  year = {2015},
  pages = {792-797},
  month = aug,
  publisher = {IEEE},
  doi = {10.1109/HPCC-CSS-ICESS.2015.64},
  slides = {../slides/hpcc2015_public.pdf}
}
```
Online Modeling and Tuning of Parallel Stream Processing Systems

Thesis on online modeling and tuning for parallel stream processing: presents RaftLib and techniques for runtime modeling/optimization to reduce tuning cost and improve portability across heterogeneous systems.

Beard, J. C. (2015). Online Modeling and Tuning of Parallel Stream Processing Systems [PhD thesis]. Department of Computer Science and Engineering, Washington University in St. Louis.
- Publisher
```
@phdthesis{beardthesis,
  author = {Beard, Jonathan C.},
  title = {Online Modeling and Tuning of Parallel Stream Processing Systems},
  school = {Department of Computer Science and Engineering, Washington University
  in St. Louis},
  month = aug,
  year = {2015},
  link = {https://www.jonathanbeard.io/pdf/beard-thesis.pdf}
}
```
Run Time Approximation of Non-blocking Service Rates for Streaming Systems

Presents an online method to estimate non-blocking service rates of stream kernels, enabling dynamic queueing/network-flow optimization under changing workloads; implemented in RaftLib and validated on benchmarks.

Beard, J. C., & Chamberlain, R. D. (2015). Run Time Approximation of Non-blocking Service Rates for Streaming Systems. arXiv preprint arXiv:1504.00591v2.
- Publisher
```
@article{bc15b,
  title = {Run Time Approximation of Non-blocking Service Rates for Streaming Systems},
  author = {Beard, Jonathan C. and Chamberlain, Roger D.},
  journal = {arXiv preprint arXiv:1504.00591v2},
  year = {2015},
  month = apr,
  link = {https://arxiv.org/pdf/1504.00591v2}
}
```
Deadlock-free Buffer Configuration for Stream Computing

Shows that output-buffer sizing in stream graphs can induce deadlock; provides necessary and sufficient conditions for deadlock-free configurations plus algorithms to detect and fix unsafe buffer settings.

Li, P., Beard, J. C., & Buhler, J. (2015). Deadlock-free Buffer Configuration for Stream Computing. Proceedings of Programming Models and Applications on Multicores and Manycores, 164-169.
- PDF
```
@inproceedings{lbb15,
  author = {Li, Peng and Beard, Jonathan C. and Buhler, Jeremy},
  title = {Deadlock-free Buffer Configuration for Stream Computing},
  publisher = {ACM},
  address = {New York, NY, USA},
  year = {2015},
  month = feb,
  series = {PMAM 2015},
  booktitle = {Proceedings of Programming Models and Applications on Multicores and Manycores},
  pages = {164-169}
}
```
RaftLib: A C++ template library for high performance stream parallel processing

Introduces RaftLib, a C++ template library for stream/data-flow processing that brings streaming optimizations to legacy C/C++ code, including dynamic queue optimization, automatic parallelization, and low-overhead monitoring.

Beard, J. C., Li, P., & Chamberlain, R. D. (2015). RaftLib: A C++ template library for high performance stream parallel processing. Proceedings of Programming Models and Applications on Multicores and Manycores, 96-105.
- PDF
```
@inproceedings{blc15,
  author = {Beard, Jonathan C. and Li, Peng and Chamberlain, Roger D.},
  title = {RaftLib: A {C++} template library for high performance stream parallel processing},
  publisher = {ACM},
  address = {New York, NY, USA},
  year = {2015},
  month = feb,
  series = {PMAM 2015},
  booktitle = {Proceedings of Programming Models and Applications on Multicores and Manycores},
  pages = {96-105}
}
```
Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines

Uses SVMs to automate reliability classification of queueing models for streaming systems, reducing reliance on expert tuning; demonstrated on microbenchmarks spanning diverse queueing conditions.

Beard, J. C., Epstein, C., & Chamberlain, R. D. (2015). Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 325-328.
- PDF
```
@inproceedings{bec15,
  author = {Beard, Jonathan C. and Epstein, Cooper and Chamberlain, Roger D.},
  title = {Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines},
  month = jan,
  year = {2015},
  booktitle = {Proceedings of the 6th ACM/SPEC international conference on Performance engineering},
  series = {ICPE 2015},
  publisher = {ACM},
  address = {New York, NY, USA},
  pages = {325-328}
}
```
Use of a Levy Distribution for Modeling Best Case Execution Time Variation

Proposes a truncated Levy distribution to characterize best-case execution-time variation in multicore systems and a parameterization method based on measurable system parameters; evaluated under Linux CFS.
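The truncated Levy distribution itself is easy to sample by inverse transform. This standard-library Python sketch uses the Levy CDF F(x) = erfc(sqrt(c / (2(x - μ)))) and draws from it restricted to (μ, upper]. The function names and parameter values are placeholders, and the paper's method for deriving the parameters from measurable system parameters is not reproduced here.

```python
import math
import random
from statistics import NormalDist

def levy_cdf(x, mu, c):
    """CDF of the Levy distribution with location mu and scale c > 0."""
    if x <= mu:
        return 0.0
    return math.erfc(math.sqrt(c / (2.0 * (x - mu))))

def sample_truncated_levy(mu, c, upper, rng=random):
    """Draw one value from Levy(mu, c) truncated to (mu, upper], using
    inverse-transform sampling with u restricted to (0, F(upper)]."""
    f_upper = levy_cdf(upper, mu, c)
    if f_upper <= 0.0:
        raise ValueError("upper must lie far enough above mu to carry mass")
    u = (1.0 - rng.random()) * f_upper  # uniform on (0, F(upper)]
    # F(x) = 2 * (1 - Phi(sqrt(c / (x - mu)))), so solving F(x) = u gives
    # sqrt(c / (x - mu)) = Phi^{-1}(1 - u/2) = -Phi^{-1}(u/2).
    s = -NormalDist().inv_cdf(u / 2.0)
    return mu + c / (s * s)

# Hypothetical parameters: location 1.0, scale 0.5, hard upper bound 10.0.
rng = random.Random(42)
draws = [sample_truncated_levy(1.0, 0.5, 10.0, rng) for _ in range(1000)]
```

An untruncated Levy distribution has an infinite mean, so some upper bound is needed before it can describe finite execution times; sampling only the (0, F(upper)] slice of the CDF enforces that bound without rejection.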

Beard, J. C., & Chamberlain, R. D. (2014). Use of a Levy Distribution for Modeling Best Case Execution Time Variation. In A. Horvath & K. Wolter (Eds.), Computer Performance Engineering (Vol. 8721, pp. 74-88). Springer International Publishing.
- PDF
- Slides
```
@incollection{bc14a,
  year = {2014},
  month = sep,
  isbn = {978-3-319-10884-1},
  booktitle = {Computer Performance Engineering},
  volume = {8721},
  series = {Lecture Notes in Computer Science},
  editor = {Horvath, A. and Wolter, K.},
  title = {Use of a {Levy} Distribution for Modeling Best Case Execution Time Variation},
  publisher = {Springer International Publishing},
  author = {Beard, Jonathan C. and Chamberlain, Roger D.},
  pages = {74-88},
  slides = {../slides/EPEW2014.pdf}
}
```
Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications

Describes a simple, practical approach to modeling throughput and buffering for streaming applications deployed on heterogeneous processors, emphasizing usability amid hidden architectural complexity.

Beard, J. C., & Chamberlain, R. D. (2013). Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications. Proc. of IEEE Int’l Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 345-349.
- PDF
- Share
- LinkedIn
- Facebook
- HN
Show BibTeX
```
@inproceedings{bc13b,
  author = {Beard, Jonathan C. and Chamberlain, Roger D.},
  title = {Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications},
  booktitle = {Proc. of IEEE Int’l Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems},
  month = aug,
  year = {2013},
  pages = {345-349}
}
```
Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures

Presents a lightweight analytic model (hybrid max-flow and decomposed queueing) to estimate throughput and buffering needs for streaming applications on heterogeneous hardware, validated with real and synthetic benchmarks.
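A heavily simplified Python sketch of the two ingredients named above, assuming a linear pipeline (for a chain, max flow reduces to the minimum stage rate), backpressure that throttles the source to the bottleneck rate, and every queue treated independently as M/M/1. The `pipeline_estimate` function and the rates are hypothetical; the paper's hybrid model covers general graphs and is not reproduced here.

```python
def pipeline_estimate(arrival_rate, service_rates):
    """Return (throughput, mean items per queue) for a linear pipeline.

    Throughput is capped by the slowest stage, including the source. Each
    stage's input queue is then modeled on its own as M/M/1 with arrival
    rate equal to that throughput."""
    throughput = min([arrival_rate] + list(service_rates))
    occupancy = []
    for mu in service_rates:
        rho = throughput / mu
        # rho == 1 marks the saturated bottleneck: the M/M/1 mean diverges.
        occupancy.append(rho / (1.0 - rho) if rho < 1.0 else float("inf"))
    return throughput, occupancy

throughput, occupancy = pipeline_estimate(100.0, [400.0, 200.0, 125.0])
```

Here the source rate of 100 items/s is the bottleneck, and the slowest kernel's queue (ρ = 0.8) is the one that needs the most buffering, about 4 items on average.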

Beard, J. C., & Chamberlain, R. D. (2013). Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures. Proc. of Int’l Symp. on Performance Analysis of Systems and Software, 138-139.
- PDF
```
@inproceedings{bc13a,
  author = {Beard, Jonathan C. and Chamberlain, Roger D.},
  title = {Use of Simple Analytic Performance Models of Streaming Data
  Applications Deployed on Diverse Architectures},
  booktitle = {Proc. of Int’l Symp. on Performance Analysis of Systems
  and Software},
  month = apr,
  year = {2013},
  pages = {138-139}
}
```
Crossing Boundaries in TimeTrial: Monitoring Communications Across Architecturally Diverse Computing Platforms

Introduces TimeTrial, a low-impact performance monitor for streaming applications spanning heterogeneous platforms (multicore CPUs and FPGAs). It measures cross-platform communication queues using a mix of direct measurement and modeling, validated on microbenchmarks and a Monte Carlo Laplace application.

Lancaster, J. M., Wingbermuehle, J. G., Beard, J. C., & Chamberlain, R. D. (2011). Crossing Boundaries in TimeTrial: Monitoring Communications Across Architecturally Diverse Computing Platforms. Proc. of Ninth IEEE/IFIP Int’l Conf. on Embedded and Ubiquitous Computing, 280-287.
- PDF
```
@inproceedings{lancaster11b,
  author = {Lancaster, Joseph M. and Wingbermuehle, Joseph G. and Beard, Jonathan C. and Chamberlain, Roger D.},
  title = {Crossing Boundaries in {TimeTrial}: Monitoring Communications Across
  Architecturally Diverse Computing Platforms},
  booktitle = {Proc. of Ninth IEEE/IFIP Int’l Conf. on Embedded and Ubiquitous
  Computing},
  month = oct,
  year = {2011},
  pages = {280-287}
}
```
