ARM Research Summit 2017 Workshop
The ARM Research Summit is an academic summit to discuss future trends and disruptive technologies across all sectors of computing. On the first day of the Summit, ARM Research will host a gem5 workshop that gives a brief overview of gem5 for computer engineers who are new to it and then dives deeper into some of gem5's more advanced capabilities. Attendees will learn what gem5 can and cannot do, how to use and extend gem5, and how to contribute back to gem5.
The ARM Research Summit will take place in Cambridge (UK) from 11 to 13 September 2017. The gem5 workshop will be a full-day event on 11 September.
The primary audience is researchers who are using, or planning to use, gem5 for architecture research.
Prerequisites: Attendees are expected to have a working knowledge of C++, Python, and computer systems.
See the main ARM Research Summit website for details about registration.
The workshop will take place on Monday 11 September 2017 at Robinson College in Cambridge (UK).
|08.30-10.00||Introduction to gem5|
|10.20-11.05||Trace-driven simulation of multithreaded applications in gem5|
|13.30-14.15||Modeling Cache Coherence with gem5|
|14.15-15.00||A Detailed On-Chip Network Model inside a Full-System Simulator|
|16.25-17.10||Power modelling using gem5|
Introduction to gem5
Trace-driven simulation of multithreaded applications in gem5
The gem5 modular simulator provides a rich set of CPU models that allow simulation speed to be balanced against accuracy. The growing interest in using gem5 for design-space exploration, however, requires higher simulation speeds to enable scalability analysis of systems comprising tens to hundreds of cores. One relevant approach for achieving significant speedups is trace-driven simulation, in which CPU cores are abstracted away so that simulation effort can be refocused on the memory and interconnect subsystems, which play a key role in performance. This talk describes some of the work on trace-driven simulation carried out in the Mont-Blanc European projects and discusses the related challenges for multicore architectures, in which trace injection must account for the API synchronization of the underlying running application. The ElasticSimMATE tool is presented as an initiative towards combining Elastic Traces and SimMATE to enable fast and accurate simulation of multithreaded applications on ARM multicore systems.
Dr Gilles Sassatelli is a CNRS senior scientist at LIRMM, a CNRS-University of Montpellier academic research unit with a staff of over 400. He is vice-head of the microelectronics department and leads a group of 20 researchers working in the area of smart embedded digital systems. He has authored over 200 peer-reviewed papers and has held key roles in a number of international conferences. Most of his research is conducted within international EU-funded projects such as the DreamCloud and Mont-Blanc projects.
Alejandro Nocua received the Ph.D. degree in Microelectronics from the University of Montpellier, France, in 2016. He is currently a postdoctoral researcher at the French National Center for Scientific Research (CNRS). His research interests include the analysis of high-performance and energy-efficient design methodologies. He received his Master's degree in Science from the National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico, in 2013, and his BS degree in Electronics Engineering from the Industrial University of Santander (UIS), Colombia, in 2011.
Florent Bruguier received the M.S. and Ph.D. degrees in microelectronics from the University of Montpellier, France, in 2009 and 2012, respectively. From 2012 to 2015, he was a Scientific Assistant with the Montpellier Laboratory of Informatics, Robotics, and Microelectronics, University of Montpellier. Since 2015, he has been a permanent Associate Professor. He has co-authored over 30 publications. His research interests are focused on self-adaptive and secure approaches for embedded systems.
Modeling Cache Coherence with gem5
Correctly implementing cache coherence protocols is hard, and the implementation details can affect the system's performance. It is therefore important to model the detailed cache coherence implementation robustly. The popular computer architecture simulator gem5 uses Ruby as its cache coherence model, which provides higher-fidelity cache coherence modeling than many other simulators.
In this talk, I will give a brief overview of Ruby, including SLICC, the domain-specific language Ruby uses to specify cache protocols. I will show the extreme flexibility of this model and walk through the details of a simple cache coherence protocol. After this talk, you will be able to dive in and begin writing your own coherence protocols!
Jason Lowe-Power is an Assistant Professor in the Computer Science department at the University of California, Davis. Jason's research focuses on increasing the energy efficiency and performance of end-to-end applications like analytic database operations used by Amazon, Google, Target, etc. One important aspect of this research is adding hardware mechanisms to systems that enable all programmers to use emerging hardware accelerators like GPUs. Additionally, Jason is a leader of the open-source architectural simulator gem5, which has been used in over 1500 academic papers. Jason received his PhD from the University of Wisconsin-Madison in Summer 2017. He was awarded the Wisconsin Distinguished Graduate Fellowship Cisco Computer Sciences Award in 2014 and 2015.
A Detailed On-Chip Network Model inside a Full-System Simulator
Compute systems are ubiquitous, with form factors ranging from smartphones at the edge to datacenters in the cloud. Chips in all these systems today comprise tens to hundreds of homogeneous or heterogeneous cores or processing elements. The growing emphasis on parallelism, distributed computing, heterogeneity, and energy efficiency across all these systems makes the design of the Network-on-Chip (NoC) fabric connecting the cores critical to both high performance and low power consumption.
It is imperative to model the details of the NoC when architecting and exploring the design space of a complex many-core system. If these details are ignored, the resulting inaccurate NoC model could lead to over-design or under-design due to incorrect trade-off choices, causing performance losses at runtime. To this end, we have designed and integrated a detailed on-chip network model called Garnet inside the gem5 (www.gem5.org) full-system architectural simulator, which is being used extensively by both industry and academia. Together with Garnet, gem5 provides plug-and-play models of cores, caches, cache coherence protocols, NoC, memory controller, and DRAM, with varying levels of detail, enabling computer architects and designers to trade off simulation speed and accuracy.
In this talk, we will first introduce the basic building blocks of NoCs and present the state of the art used in chips today. We will then present Garnet and demonstrate, via case studies and code snippets, how it faithfully models the state of the art while also offering immense flexibility in modifying various parts of the microarchitecture to serve the needs of both homogeneous many-cores and heterogeneous accelerator-based systems of the future. Finally, we will demonstrate how Garnet works within the entire gem5 ecosystem.
Tushar Krishna is an Assistant Professor in the Schools of ECE and CS at Georgia Tech. He received a Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2014. Prior to that, he received an M.S.E. from Princeton University in 2009 and a B.Tech from the Indian Institute of Technology (IIT) Delhi in 2007, both in Electrical Engineering.
Before joining Georgia Tech in 2015, Dr. Krishna was a post-doctoral researcher in the VSSAD Group at Intel, Massachusetts, and then at the Singapore-MIT Alliance for Research and Technology at MIT.
Dr. Krishna's research interests are in computer architecture, interconnection networks, networks-on-chip, deep learning accelerators, and FPGAs.