Media
Panels, Articles, Publications, Interviews, Talks, and Thesis for Jonathan Beard
Full list of papers and patents available at Google Scholar.
Open Source Software (things I lead)
- RaftLib - runtime for heterogeneous data-flow/stream processing using a C++ DSL; it was also my thesis. It’s Apache 2.0, so free to use for pretty much everything.
Interviews
- CppCast Episode 50 - My thoughts on parallel, distributed computing, and how that shaped RaftLib
Talks (just talks, not papers)
- MEMSYS 2019 - Panel: New and Cool Memory Technologies (slides)
- Panel on challenges for memory-centric computing at MCHPC-18 (in conjunction with SC'18, Dallas) (slides)
- Reducing Dark Bandwidth Through Data Reduction Near Memory - Dark bandwidth and some ways to get rid of it through near memory gather/scatter (slides)
- A Vision For Destruction Of Post-Moore Disruption - prospective talk with my views on architecture from a biological evolution perspective. (slides)
- MEMSYS 2017 - New and Cool Memory Technologies - A panel discussion on how to use new memory technologies, challenges, and in general what is new and cool in the field. (slides)
- CppNow2017 - RaftLib - RaftLib Tutorial 2, slides and video.
- CppNow2017 - FIFO Optimization Tips, details and video.
- Future of Memory Technology for Exascale and Beyond IV - Panel discussion on the future of memory technology for exascale. My take: instead of focusing just on new technologies, let’s focus on systems. (session page)
- CppNow2016 - RaftLib tutorial, slides and video (official one available via YouTube channel).
Organized Workshops and Conferences
In the News
- Shedding Light on Dark Bandwidth, published 14 September 2017 in The Next Platform (nextplatform.com).
- Momentum is Building for ARM in HPC, published 30 June 2017 in The Next Platform (nextplatform.com).
Research Publications
-
The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload
Introduces the NUCD architecture, a lightweight accelerator offload mechanism tightly coupled to a host core for fine-grain tasks; demonstrates performance gains over conventional driver-based offload.
Asri, M., Dunham, C., Rusitoru, R., Gerstlauer, A., & Beard, J. (2020). The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload.
@article{adrgb20, title = {The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload}, author = {Asri, Mochamad and Dunham, Curtis and Rusitoru, Roxana and Gerstlauer, Andreas and Beard, Jonathan}, publisher = {28th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing}, series = {PDP2020}, year = {2020}, month = mar }
-
Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data
Presents multi-spectral reuse distance to infer spatial locality from temporal traces across multiple granularities; uses Earth Mover’s Distance to quantify shifts and guide page-sizing decisions.
Cabrera, A. M., Chamberlain, R. D., & Beard, J. C. (2019, September). Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data.
@inproceedings{ccb19, title = {Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data}, author = {Cabrera, Anthony M. and Chamberlain, Roger D. and Beard, Jonathan C.}, publisher = {The IEEE High Performance Extreme Computing Conference 2019}, series = {HPEC2019}, year = {2019}, month = sep, slides = {../slides/HPEC-2019-anthony.pdf} }
-
SPiDRE: Accelerating Sparse Memory Access Patterns
Explores SPiDRE hardware to accelerate sparse and irregular memory access via near-memory data reorganization, improving bandwidth utilization and performance.
Barredo, A., Beard, J. C., & Moretó, M. (2019, September). SPiDRE: Accelerating Sparse Memory Access Patterns.
@inproceedings{bbm19, title = {SPiDRE: Accelerating Sparse Memory Access Patterns}, author = {Barredo, Adri\'an and Beard, Jonathan C. and Moret\'o, Miquel}, publisher = {28th International Conference on Parallel Architectures and Compilation Techniques (PACT)}, series = {PACT2019}, year = {2019}, month = sep }
-
This Architecture Tastes Like Microarchitecture
Revisits early RISC ideas and argues for ISAs that specify less microarchitectural detail, contrasting multiple ISA design philosophies and their implications for hardware/software co-design.
Dunham, C., & Beard, J. C. (2018). This Architecture Tastes Like Microarchitecture. The 2nd Workshop on Pioneering Processor Paradigms.
@online{db18a, title = {This Architecture Tastes Like Microarchitecture}, author = {Dunham, Curtis and Beard, Jonathan C}, publisher = {The 2nd Workshop on Pioneering Processor Paradigms}, series = {WP3}, year = {2018} }
-
The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time
Introduces SPiDRE, a near-memory programmable data-reduction/rearrangement engine for sparse workloads, including a programmer interface and evaluation on representative applications.
Beard, J. C. (2017, October). The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time. Proceedings of the Second International Symposium on Memory Systems.
@inproceedings{b17a, title = {The Sparse Data Reduction Engine (SPiDRE): Chopping Sparse Data One Byte at a Time}, author = {Beard, Jonathan C}, booktitle = {Proceedings of the Second International Symposium on Memory Systems}, year = {2017}, month = oct, organization = {ACM}, slides = {../slides/memsys2017_SPiDRE_Beard.pdf} }
-
Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore
Defines ‘dark bandwidth’ as wasted data movement driven by memory interfaces and virtual-memory abstractions, and argues for system-level approaches that reduce data movement in post-Moore systems.
Beard, J. C., & Randall, J. (2017). Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore. Proc. High Performance Computing Post-Moore (HCPM’17).
@article{br17a, title = {Eliminating Dark Bandwidth: a data-centric view of scalable, efficient performance, post-Moore}, author = {Beard, Jonathan C and Randall, Joshua}, booktitle = {Proc. High Performance Computing Post-Moore (HCPM'17)}, series = {Lecture Notes in Computer Science}, year = {2017}, month = jun, slides = {../slides/beard_hcpm2017.pdf} }
-
RaftLib: A C++ template library for high performance stream parallel processing
Journal version of RaftLib detailing a C++ template library that enables streaming graph optimizations, dynamic queue tuning, automatic parallelization, and low-overhead monitoring for legacy C/C++ code.
Beard, J. C., Li, P., & Chamberlain, R. D. (2016). RaftLib: A C++ template library for high performance stream parallel processing. International Journal of High Performance Computing Applications. https://doi.org/10.1177/1094342016672542
@article{blc16, author = {Beard, Jonathan C and Li, Peng and Chamberlain, Roger D}, title = {RaftLib: A C++ template library for high performance stream parallel processing}, year = {2016}, doi = {https://doi.org/10.1177/1094342016672542}, journal = {International Journal of High Performance Computing Applications} }
-
Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines
Extends SVM-based model selection to online settings, determining when M/M/1 and M/D/1 models apply to stream-processing queues; validated across multiple hardware and software platforms.
Beard, J. C., Epstein, C., & Chamberlain, R. D. (2015). Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines. Proceedings of Euro-Par 2015 Parallel Processing, 82-93.
@inproceedings{bec15b, title = {Online Automated Reliability Classification of Queueing Models for Streaming Processing using Support Vector Machines}, author = {Beard, Jonathan C. and Epstein, Cooper and Chamberlain, Roger D.}, booktitle = {Proceedings of Euro-Par 2015 Parallel Processing}, year = {2015}, month = aug, pages = {82-93}, publisher = {Springer} }
-
Run Time Approximation of Non-blocking Service Rates for Streaming Systems
Introduces runtime estimation of kernel service rates for streaming systems to enable continuous retuning and avoid reliance on steady-state assumptions.
Beard, J. C., & Chamberlain, R. D. (2015). Run Time Approximation of Non-blocking Service Rates for Streaming Systems. Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 792-797. https://doi.org/10.1109/HPCC-CSS-ICESS.2015.64
@inproceedings{bc15f, author = {Beard, Jonathan C. and Chamberlain, Roger D.}, booktitle = {Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications}, title = {Run Time Approximation of Non-blocking Service Rates for Streaming Systems}, year = {2015}, pages = {792-797}, month = aug, publisher = {IEEE}, doi = {https://doi.org/10.1109/HPCC-CSS-ICESS.2015.64}, slides = {../slides/hpcc2015_public.pdf} }
-
Online Modeling and Tuning of Parallel Stream Processing Systems
Thesis on online modeling and tuning for parallel stream processing: presents RaftLib and techniques for runtime modeling/optimization to reduce tuning cost and improve portability across heterogeneous systems.
Beard, J. C. (2015). Online Modeling and Tuning of Parallel Stream Processing Systems [PhD thesis]. Department of Computer Science and Engineering, Washington University in St. Louis.
@phdthesis{beardthesis, author = {Beard, Jonathan C.}, title = {Online Modeling and Tuning of Parallel Stream Processing Systems}, school = {Department of Computer Science and Engineering, Washington University in St. Louis}, month = aug, year = {2015}, link = {https://www.jonathanbeard.io/pdf/beard-thesis.pdf} }
-
Run Time Approximation of Non-blocking Service Rates for Streaming Systems
Presents an online method to estimate non-blocking service rates of stream kernels, enabling dynamic queueing/network-flow optimization under changing workloads; implemented in RaftLib and validated on benchmarks.
Beard, J. C., & Chamberlain, R. D. (2015). Run Time Approximation of Non-blocking Service Rates for Streaming Systems. ArXiv Preprint ArXiv:1504.00591v2.
@article{bc15b, title = {Run Time Approximation of Non-blocking Service Rates for Streaming Systems}, author = {Beard, Jonathan C. and Chamberlain, Roger D.}, journal = {arXiv preprint arXiv:1504.00591v2}, year = {2015}, month = apr, link = {https://arxiv.org/pdf/1504.00591v2} }
-
Deadlock-free Buffer Configuration for Stream Computing
Shows that output-buffer sizing in stream graphs can induce deadlock; provides necessary and sufficient conditions for deadlock-free configurations plus algorithms to detect and fix unsafe buffer settings.
Li, P., Beard, J. C., & Buhler, J. (2015). Deadlock-free Buffer Configuration for Stream Computing. Proceedings of Programming Models and Applications on Multicores and Manycores, 164-169.
@inproceedings{lbb15, author = {Li, Peng and Beard, Jonathan C. and Buhler, Jeremy}, title = {Deadlock-free Buffer Configuration for Stream Computing}, publisher = {ACM}, address = {New York, NY, USA}, year = {2015}, month = feb, series = {PMAM 2015}, booktitle = {Proceedings of Programming Models and Applications on Multicores and Manycores}, pages = {164-169} }
-
RaftLib: A C++ template library for high performance stream parallel processing
Introduces RaftLib, a C++ template library for stream/data-flow processing that brings streaming optimizations to legacy C/C++ code, including dynamic queue optimization, automatic parallelization, and low-overhead monitoring.
Beard, J. C., Li, P., & Chamberlain, R. D. (2015). RaftLib: A C++ template library for high performance stream parallel processing. Proceedings of Programming Models and Applications on Multicores and Manycores, 96-105.
@inproceedings{blc15, author = {Beard, Jonathan C. and Li, Peng and Chamberlain, Roger D.}, title = {RaftLib: A {C++} template library for high performance stream parallel processing}, publisher = {ACM}, address = {New York, NY, USA}, year = {2015}, month = feb, series = {PMAM 2015}, booktitle = {Proceedings of Programming Models and Applications on Multicores and Manycores}, pages = {96-105} }
-
Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines
Uses SVMs to automate reliability classification of queueing models for streaming systems, reducing reliance on expert tuning; demonstrated on microbenchmarks spanning diverse queueing conditions.
Beard, J. C., Epstein, C., & Chamberlain, R. D. (2015). Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 325-328.
@inproceedings{bec15, author = {Beard, Jonathan C. and Epstein, Cooper and Chamberlain, Roger D.}, title = {Automated Reliability Classification of Queueing Models for Streaming Computation using Support Vector Machines}, month = jan, year = {2015}, booktitle = {Proceedings of the 6th ACM/SPEC international conference on Performance engineering}, series = {ICPE 2015}, publisher = {ACM}, address = {New York, NY, USA}, pages = {325-328} }
-
Use of a Levy Distribution for Modeling Best Case Execution Time Variation
Proposes a truncated Levy distribution to characterize best-case execution-time variation in multicore systems and a parameterization method based on measurable system parameters; evaluated under Linux CFS.
Beard, J. C., & Chamberlain, R. D. (2014). Use of a Levy Distribution for Modeling Best Case Execution Time Variation. In A. Horvath & K. Wolter (Eds.), Computer Performance Engineering (Vol. 8721, pp. 74-88). Springer International Publishing.
@incollection{bc14a, year = {2014}, month = sep, isbn = {978-3-319-10884-1}, booktitle = {Computer Performance Engineering}, volume = {8721}, series = {Lecture Notes in Computer Science}, editor = {Horvath, A. and Wolter, K.}, title = {Use of a {Levy} Distribution for Modeling Best Case Execution Time Variation}, publisher = {Springer International Publishing}, author = {Beard, Jonathan C. and Chamberlain, Roger D.}, pages = {74-88}, slides = {../slides/EPEW2014.pdf} }
-
Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications
Describes a simple, practical approach to modeling throughput and buffering for streaming applications deployed on heterogeneous processors, emphasizing usability amid hidden architectural complexity.
Beard, J. C., & Chamberlain, R. D. (2013). Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications. Proc. of IEEE Int’l Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 345-349.
@inproceedings{bc13b, author = {Beard, Jonathan C. and Chamberlain, Roger D.}, title = {Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications}, booktitle = {Proc. of IEEE Int’l Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems}, month = aug, year = {2013}, pages = {345-349} }
-
Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures
Presents a lightweight analytic model (hybrid max-flow and decomposed queueing) to estimate throughput and buffering needs for streaming applications on heterogeneous hardware, validated with real and synthetic benchmarks.
Beard, J. C., & Chamberlain, R. D. (2013). Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures. Proc. of Int’l Symp. on Performance Analysis of Systems and Software, 138-139.
@inproceedings{bc13a, author = {Beard, Jonathan C. and Chamberlain, Roger D.}, title = {Use of Simple Analytic Performance Models of Streaming Data Applications Deployed on Diverse Architectures}, booktitle = {Proc. of Int’l Symp. on Performance Analysis of Systems and Software}, month = apr, year = {2013}, pages = {138-139} }
-
Crossing Boundaries in TimeTrial: Monitoring Communications Across Architecturally Diverse Computing Platforms
Introduces TimeTrial, a low-impact performance monitor for streaming applications spanning heterogeneous platforms (multicore CPUs and FPGAs). It measures cross-platform communication queues using a mix of direct measurement and modeling, validated on microbenchmarks and a Monte Carlo Laplace application.
Lancaster, J. M., Wingbermuehle, J. G., Beard, J. C., & Chamberlain, R. D. (2011). Crossing Boundaries in TimeTrial: Monitoring Communications Across Architecturally Diverse Computing Platforms. Proc. of Ninth IEEE/IFIP Int’l Conf. on Embedded and Ubiquitous Computing, 280-287.
@inproceedings{lancaster11b, author = {Lancaster, Joseph M. and Wingbermuehle, Joseph G. and Beard, Jonathan C. and Chamberlain, Roger D.}, title = {Crossing Boundaries in {TimeTrial}: Monitoring Communications Across Architecturally Diverse Computing Platforms}, booktitle = {Proc. of Ninth IEEE/IFIP Int’l Conf. on Embedded and Ubiquitous Computing}, month = oct, year = {2011}, pages = {280-287} }