Dependable Electronics systems in space, ESA's view
The domain of space avionic systems is changing extremely rapidly compared to other technical domains in the space-faring industry, under the pressure of intense competition, the continuous emergence of new markets and players, the need for cost reduction, and an increased obsolescence rate of components and processes. This rapidly changing landscape is also opening up many opportunities for space avionic systems: new high-performance processor architectures and silicon processes make it possible to integrate functions that until now were implemented on several boards into a single system-on-chip (SoC), an application-specific standard product (ASSP), or a new large FPGA, allowing multi-fold gains in performance and miniaturization for electronic systems. As an example, future missions, such as active debris removal for cleaning up the low Earth orbit environment, will rely on novel high-performance avionics to support advanced image processing algorithms with substantial workloads. However, when designing new avionics architectures, constraints relating to the use of electronics in space present great challenges, further exacerbated by the need for significantly faster processing compared to conventional space-grade central processing units. With the long-term goal of designing high-performance embedded computers for space, a study and tradeoff analysis of a diverse set of computing platforms and architectures (i.e., central processing units, multicore digital signal processors, graphics processing units, field-programmable gate arrays, and AI accelerators) will be presented.
Keynote Talks @ DFT 2019
Prof. Onur Mutlu is a Professor of Computer Science at ETH Zurich. He is also a faculty member at Carnegie Mellon University, where he previously held the Strecker Early Career Professorship. His current broader research interests are in computer architecture, systems, hardware security, and bioinformatics. A variety of techniques he, along with his group and collaborators, has invented over the years have influenced industry and have been employed in commercial microprocessors and memory/storage systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. He started the Computer Architecture Group at Microsoft Research (2006-2009), and held various product and research positions at Intel Corporation, Advanced Micro Devices, VMware, and Google. He received the inaugural IEEE Computer Society Young Computer Architect Award, the inaugural Intel Early Career Faculty Award, US National Science Foundation CAREER Award, Carnegie Mellon University Ladd Research Award, faculty partnership awards from various companies, and a healthy number of best paper or "Top Pick" paper recognitions at various computer systems, architecture, and hardware security venues. He is an ACM Fellow "for contributions to computer architecture research, especially in memory systems", IEEE Fellow for "contributions to computer architecture research and practice", and an elected member of the Academy of Europe (Academia Europaea). For more information, please see his webpage at https://people.inf.ethz.ch/omutlu/
RowHammer and Beyond
We will discuss the RowHammer problem in DRAM, which is a prime (and likely the first) example of how a circuit-level failure mechanism in Dynamic Random Access Memory (DRAM) can cause a practical and widespread system security vulnerability. RowHammer is the phenomenon that repeatedly accessing a row in a modern DRAM chip predictably causes errors in physically-adjacent rows. It is caused by a hardware failure mechanism called read disturb errors, which is a manifestation of circuit-level cell-to-cell interference in a scaled memory technology. Building on our initial fundamental work that appeared at ISCA 2014, Google Project Zero demonstrated that this hardware phenomenon can be exploited by user-level programs to gain kernel privileges. Many other recent works demonstrated other attacks exploiting RowHammer, including remote takeover of a server vulnerable to RowHammer and takeover of a mobile device by a malicious user-level application that requires no permissions. We will analyze the root causes of the problem and examine solution directions. We will also discuss what other problems may be lurking in DRAM and other types of memory, e.g., NAND flash and Phase Change Memory, which can potentially threaten the foundations of reliable and secure systems, as the memory technologies scale to higher densities. We conclude by describing and advocating a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.
Dr. Riccardo Mariani (Intel Fellow and Chief Functional Safety Technologist) is widely recognized as an expert in functional safety and integrated circuit reliability. In his current role as chief functional safety technologist at Intel Corporation, he oversees strategies and technologies for IoT applications that require functional safety, high reliability and performance, such as autonomous driving, transportation and industrial systems. He was recently nominated as 2019 VP of IEEE Computer Society Standardisation Activities. Mariani spent the bulk of his career as CTO of Yogitech, an industry leader in functional safety technologies. Before co-founding the Italian company in 2000, he was technical director at Aurelia Microelettronica, where his responsibilities included leading high-reliability topics in projects with CERN in Geneva. A prolific author and respected inventor in the functional safety field, Mariani has contributed to multiple industry standards efforts throughout his career, including leading the ISO 26262-11 part specific to semiconductors. He has also won the SGS-Thomson Award and the Enrico Denoth Award for his engineering achievements. He holds a bachelor’s degree in electronic engineering and a Ph.D. in microelectronics from the University of Pisa in Italy.
Challenges in AI/ML for safety critical systems
AI/ML/DL is used in various critical roles in dependable smart machines. As a consequence, functional safety as well as safety of the intended functionality need to be carefully considered. The talk will provide examples of the related challenges at the HW, SW and algorithm levels, and from the specification, design, verification and validation points of view. It will highlight what is currently covered by existing (ISO 26262, IEC 61508) or upcoming (ISO 21448) standards and which gaps are still to be fully understood. The talk will also give an overview of ongoing and future initiatives on safety-critical AI.
Prof. Dr. Muhammad Shafique (Technische Universität Wien, TU Wien) has been a full professor of Computer Architecture and Robust Energy-Efficient Technologies (CARE-Tech.) at the Institute of Computer Engineering, Faculty of Informatics, Vienna University of Technology (TU Wien) since Oct. 2016. He received his Ph.D. in Computer Science from Karlsruhe Institute of Technology (KIT), Germany in Jan. 2011. Before that, he was with Streaming Networks Pvt. Ltd. (Islamabad office), where he was involved in research and development of video coding systems for several years. Dr. Shafique has demonstrated success in leading team projects, meeting deadlines for demonstrations, motivating team members to peak performance levels, and completing challenging tasks independently. His experience is corroborated by strong technical knowledge and an educational record (Gold Medalist throughout). He also possesses an in-depth understanding of various video coding standards. His research interests are in computer architecture, power-/energy-efficient systems, robust computing, hardware security, brain-inspired computing trends like Neuromorphic and Approximate Computing, hardware and system-level design for Machine Learning and AI, emerging technologies & nanosystems, FPGAs, MPSoCs, and embedded systems. His research has a special focus on cross-layer modeling, design, and optimization of computing and memory systems, as well as their deployment in use cases from the Internet-of-Things (IoT), Cyber-Physical Systems (CPS), and ICT for Development (ICT4D) domains. Dr. Shafique has delivered several Keynotes, Invited Talks, and Tutorials. He has also organized many special sessions at premier venues (like DAC, ICCAD, DATE, IOLTS, and ESWeek) and served as the Guest Editor for IEEE Design and Test Magazine and IEEE Transactions on Sustainable Computing. He has served as PC Chair, Track Chair, and PC member of several prestigious IEEE/ACM conferences. Dr. Shafique received the prestigious 2015 ACM/SIGDA Outstanding New Faculty Award, six gold medals in his educational career, and several best paper awards and nominations at prestigious conferences (like DATE, CODES+ISSS, DAC and ICCAD), Best Master Thesis Award, DAC'14 Designer Track Best Poster Award, IEEE Transactions on Computers "Feature Paper of the Month" Awards, and Best Lecturer Award. His work on aging management for GPUs was featured in the Research Highlights of Nature Electronics, February 2018 issue. Dr. Shafique holds one US patent and has (co-)authored 6 Books, 10+ Book Chapters, and over 200 papers in premier journals and conferences. He is a senior member of the IEEE and IEEE Signal Processing Society (SPS), and a member of the ACM, SIGARCH, SIGDA, SIGBED, and HIPEAC.
From Cross-Layer Resilience for On-Chip Systems to Robust Machine Learning
In today’s smart era, billions of heterogeneous devices, ranging from embedded to high-end computing machines, are getting increasingly integrated to realize complex abundant-data systems that need to process and classify a massive amount of data reliably under tight performance and energy constraints. Computing devices fabricated with nano-scale transistors are susceptible to a wide range of robustness threats like soft errors, thermal stresses, process variations, and diverse aging effects (like Negative/Positive Bias Temperature Instability, Hot Carrier Injection, and Time-Dependent Dielectric Breakdown). These threats jeopardize the correct execution of applications, leading to functional and timing errors, which can pose catastrophic risks (like malfunctions of healthcare equipment and automotive crashes) and enormous economic losses in financial and banking systems. Therefore, robustness is an extremely important design criterion for computing systems deployed in smart Cyber-Physical Systems (CPS) and the Internet-of-Things (IoT), which are crucial for the infrastructures of individuals, organizations, industries, and nations, bearing significant safety-related, social and economic impacts. A tremendous amount of research effort has been invested at individual system layers. However, considering the declining reliability cost trends, designing a highly robust system would require engaging multiple system layers across the hardware and software stacks in order to achieve cost-effective resilience. This talk will provide an overview of important robustness issues, prominent state-of-the-art techniques, and various hardware-software modeling and optimization techniques developed by my team. A key focus will be on bridging the gap between hardware and software to achieve accurate reliability models for the higher system layers at different levels of granularity. This provides a foundation to develop and employ diverse robustness optimizations at different system layers.
Afterwards, this talk will discuss the dark silicon problem and how it can be leveraged to explore new challenges and opportunities for the design and management of thermally-constrained computing systems, improving quality metrics (reliability, performance, etc.) within peak power and thermal constraints. Towards the end, this talk will shed light on the new robustness challenges and opportunities for emerging machine learning systems.