Keynotes

Sankaran Menon

SoC and System-level Debug - Challenges, Innovations and Solutions!

Speaker: Sankaran Menon, Ph.D.
Principal Engineer & DFX Architect
Intel


System/platform-level debug is extremely important, not only for manufacturing debug but also for mission-mode debug. After Systems-on-Chip (SoCs) are manufactured and assembled on system boards, and the OS (Operating System), SW (Software) and FW (Firmware) are installed, any system-level error or BSOD (Blue Screen of Death) seen in laptops/systems is hard to debug. Debugging such issues requires opening up the system, which can be very time-consuming. Closed-chassis debug techniques avoid the need to open up the system for debug, saving a tremendous amount of debug time and money and helping improve TTM (Time to Market). The USB Type-C® receptacle has become the most popular choice among OEMs/ODMs as a system debug interface for sending out debug information. This keynote will start with an overview of SoC debug and proceed to describe the importance of closed-chassis debug at the system/platform level using the ubiquitous USB Type-C® receptacle. The talk will cover the debug architecture framework, challenges, innovations, and solutions for capturing Hardware, Software and Firmware traces from SoCs/platforms/systems, as well as use of the interface for In-Field Debug (IFD) and for Silicon Lifecycle Management (SLM) purposes.

Sriram Sankar

Silent Data Corruptions at Hyperscale

Speaker: Sriram Sankar
Director of Engineering
Meta


Silent Data Corruptions (SDCs) are extremely hard to diagnose in a production fleet and cause significant impact at scale. This talk will cover Meta's experience tackling this emerging challenge at our scale. We find that Silent Data Corruptions are not a 1-in-a-million occurrence, as previously thought by the industry; they happen far more frequently (1 in thousands). This is a difference of orders of magnitude, and one that the industry should take immediate action on. Addressing it will need cross-functional work across many areas, including testing, verification, design, manufacturing, fleet detection, and software approaches for resiliency. SDC work is foundational for computational accuracy, and the talk will also be a call to action for industry and researchers to address this critical challenge together.

Carlos Tokunaga

Circuits and Technology Advancements for Resiliency and Reliability

Speaker: Carlos Tokunaga
Intel Corporation


Next-generation SoCs for the Zetta-Scale computing era will be developed with increased integration of our compute, memory and communication systems in optimized and complex packaging solutions. We will explore the challenges and opportunities in circuits and technology to enable resilient and reliable circuits and systems.