Dr. Michael McCool, Intel, U.S.A.
Embedded and Mobile Software Development for Intel SoCs
Software is a critical component of any system, and software development for SoCs can be challenging, especially if there are high reliability or performance requirements. In this talk I will survey the wide range of functionality available for software development using Intel SoCs in both the embedded and mobile contexts. Intel® System Studio, for instance, provides the Intel® C++ Compiler, including the Intel® Cilk™ Plus parallel model for scaling performance using multiple cores and vector units; Intel® VTune™ Amplifier XE for performance and power analysis; the Intel® JTAG Debugger, providing low overhead event tracing and logging for source level debug of UEFI firmware, bootloaders, OS kernels, and drivers; support in the GDB* debugger for fast application level defect analysis for increased system stability, application level instruction trace, and data race detection; the Intel® Inspector for Systems, a dynamic and static analyzer to identify memory and threading errors; and libraries such as Intel® Integrated Performance Primitives and the Intel® Math Kernel Library for accelerating application programs. Intel also provides outstanding support for Android development. For cross-platform mobile application development, the Intel® XDK supports portable HTML5 application development (including access to sensors and other mobile-specific functionalities across multiple operating systems), and the Beacon Mountain package includes the Intel® HAXM Android emulator, among other useful tools and libraries, for native Android development. The Intel® Perceptual Computing SDK supports additional capabilities related to interpretation of sensor data, including camera input, and the Intel® Media SDK provides access to fixed-function hardware units such as motion estimation.
Finally, Intel processors support a huge variety of open-source software, including Linux (such as the Yocto distribution for embedded devices), libraries such as OpenCV for vision applications, scripting languages such as Python and JavaScript, web technologies, and a range of compilers. For example, Cilk™ Plus implementations are also available or under development in both GCC and Clang/LLVM, supporting the same parallel and vector computing capabilities as the Intel compiler.
Biography: Michael McCool is an Intel Principal Engineer. He has degrees in Computer Engineering (University of Waterloo, BASc) and Computer Science (University of Toronto, MSc and PhD) with specializations in mathematics (BASc) and biomedical engineering (MSc) as well as computer graphics and parallel computing (MSc, PhD). He has research and application experience in the areas of data mining, computer graphics (specifically sampling, rasterization, path rendering, texture hardware, antialiasing, shading, illumination, function approximation, compression, and visualization), medical imaging, signal and image processing, financial analysis, and parallel languages and programming platforms. To commercialize research on many-core computing platforms that he carried out as an Associate Professor at the University of Waterloo, he co-founded RapidMind in 2004; Intel acquired the company in 2009. He is currently a software architect with Intel working on parallel programming languages, applications, and mobile computing. In addition to his university teaching, he has presented numerous tutorials at Eurographics, SIGGRAPH, and SC on graphics and/or parallel computing, and has co-authored three books. The most recent book, Structured Parallel Programming, was co-authored with James Reinders and Arch Robison. It presents a pattern-based approach to parallel programming using a large number of examples in Intel Cilk Plus and Intel Threading Building Blocks.
Dr. Jose Flich, Universidad Politécnica de Valencia, Spain
Many-core System Designs through Effective Routing Support and Reconfigurability
Current technology continues to push toward a higher number of nodes in future chips. Chip multiprocessor systems (CMPs) and multiprocessor system-on-chip systems (MPSoCs) are targeted by the many-core approach, in which tens to hundreds of cores are expected to be supported. In such configurations, the network inside the chip plays a central role, shifting the system from a computation-centric approach to a communication-centric approach. In parallel, new technology and architectural challenges threaten CMP/MPSoC designs. Process variation, manufacturing defects, and power dissipation problems set limiting barriers to scaling efficiently to hundreds of cores. In this talk, the problems with NoC design for CMPs and MPSoCs/MCSoCs will be identified and possible solutions will be presented. The solutions center on the routing algorithm, its implementation, and its reconfiguration capability, which are key to coping with the coming challenges. Topology alternatives and key latency improvement strategies linked with coherence protocols will be covered as well.
Biography: Jose Flich received his PhD in Computer Engineering in 2001. He is an Associate Professor at UPV, where he leads the research activities related to NoCs. He has published over 100 conference and journal papers, and has served on different conference program committees (ISCA, NOCS, ICPP, IPDPS, HiPC, CAC, CASS, ICPADS, ISCC), as program chair (INA-OCMC, CAC), and as track co-chair (EUROPAR). He has collaborated with different institutions (Ferrara, Catania, Jonkoping, USC) and companies (AMD, Intel, Sun). His current research activities focus on routing, coherency protocols, and congestion management within NoCs. He has co-invented different routing strategies and reconfiguration and congestion control mechanisms, some of which have received high recognition (RECN and LBDR for on-chip networks). He is a member of the HiPEAC-2 NoE. He is coeditor of the book “Designing Network-on-Chip Architectures in the Nanoscale Era” and is the coordinator of the FP7 NaNoC project.
Dr. Hideyuki Kawashima, University of Tsukuba, Japan
Taming Big Data Streams
The volume of data streams produced by sensing devices and network monitoring systems is increasing. To process these streams with low latency, stream processing engines (SPEs) have been studied. This talk first gives an overview of stream data processing and then presents the speaker's recent work, including transactional stream processing, an outlier detection technique over packet streams, a secure data processing framework with encryption, and an FPGA-based acceleration system.
Biography: Hideyuki Kawashima received his Ph.D. from the Graduate School of Science for Open and Environmental Systems, Keio University, Japan. He was a research associate at the Department of Science and Engineering, Keio University, from 2005 to 2007. From 2007 to 2011, he was an assistant professor at both the Graduate School of Systems and Information Engineering and the Center for Computational Sciences, University of Tsukuba, Japan. Since 2011, he has been an assistant professor at the Faculty of Information, Systems and Engineering, University of Tsukuba.
Dr. Jiang Xu, Hong Kong University of Science and Technology, Hong Kong SAR
Network-on-Chip Benchmarks Based on Real MPSoC Applications
By integrating multiple processing units on a single chip, a multiprocessor system-on-chip (MPSoC) can provide higher performance per energy and lower cost per function to applications with burgeoning complexity. The performance of an MPSoC/MCSoC is determined not only by the performance of its processing units, but also by how efficiently they collaborate with one another. It is the MPSoC's communication architecture that determines the collaboration efficiency. The on-chip communication architectures of MPSoCs are moving from traditional buses and ad-hoc interconnects to more sophisticated networks-on-chip (NoCs), which have become an active research area in both the industrial and academic communities. Like benchmark programs for microprocessor architectures, NoC traffic patterns are essential tools for NoC performance assessment and architecture exploration. The fidelity of NoC traffic patterns has a profound influence on NoC studies. Ideally, realistic NoC traffic patterns should capture communication behaviors as well as their temporal and spatial dependencies in real applications. In addition to communications, they should offer insights into computation tasks and memory usage for comprehensive NoC-based MPSoC research and development. This talk will introduce a joint industry-academia effort to systematically develop realistic NoC benchmarks through multidisciplinary collaborations on real MPSoC applications.
Biography: Dr. Xu received his Ph.D. degree from Princeton University. From 2001 to 2002, he worked at Bell Labs, NJ, as a Research Associate. He was a Research Associate at NEC Laboratories America, NJ, from 2003 to 2005. From 2005 to 2007, he worked at a startup company, Sandbridge Technologies, NY, where he developed and implemented two generations of NoC-based ultra-low power multiprocessor systems-on-chip for mobile platforms. In 2007, Dr. Xu joined Hong Kong University of Science and Technology and established the Mobile Computing System Lab and the Xilinx-HKUST Joint Lab. He currently serves as an Associate Editor of ACM Transactions on Embedded Computing Systems and IEEE Transactions on Very Large Scale Integration (VLSI) Systems. He is an ACM Distinguished Speaker and a Distinguished Visitor of the IEEE Computer Society. He has served on the organizing committees and technical program committees of many international conferences, including ICCAD, CASES, ICCD, ISVLSI, VLSI, EMSOFT, CODES+ISSS, NOCS, and ASP-DAC. Dr. Xu has authored or coauthored more than 60 book chapters and papers in peer-reviewed journals and international conferences. He and his students received the Best Paper Award from the IEEE Computer Society Annual Symposium on VLSI in 2009 and the Best Poster Award from the AMD Technical Forum and Exhibition in 2010. He coauthored a book titled Algorithms, Architecture and System-on-Chip Design for Wireless Applications (Cambridge University Press). His research areas include network-on-chip, multiprocessor system-on-chip, embedded systems, computer architecture, low-power VLSI design, and HW/SW codesign.
Dr. Ran Ginosar, EE & CS, Technion, Israel
The Plural Architecture: Shared Memory Many-cores with Hardware Scheduling
The Plural many-core architecture combines hundreds of small cores, many shared memory banks, a hardware scheduler, and two custom active networks-on-chip: cores-to-memories and cores-to-scheduler. A theoretical model (almost) justifies increasing the number of cores while making them smaller and slower, maximizing the performance-to-power ratio. Several benchmark simulations are demonstrated, showing close to linear speedup and a high performance-to-power ratio. A de-synchronized, PRAM-like, task-based programming model for shared memory, which is neither CSP-based nor lock-based, enables fine-grain parallelism.
Biography: Prof. Ran Ginosar received his BSc from the Technion and his PhD from Princeton University. He has conducted research at Bell Laboratories, at the University of Utah, and at Intel Research Laboratories in Oregon, USA. He is a member of the faculty of the EE and CS departments at the Technion and heads the VLSI Systems Research Center. He has also co-founded several start-up companies in the area of VLSI and parallel processing. His research interests focus on VLSI, asynchronous logic, and parallel processing architectures.
Dr. Peter A. Beerel, University of Southern California, U.S.A.
Practical Advances and Applications of Asynchronous Design
As we continue to push for lower power and lower supply voltages, there is a growing need for resilient circuits that can accommodate increasing variability in the characteristics of both transistors and wires. Asynchronous circuits have long been an intriguing potential solution to this issue because of their natural ability to adapt to variations. However, the overhead of asynchronous circuits and the lack of CAD tools have been stumbling blocks to their widespread adoption. This talk reviews some of the styles of asynchronous design and discusses their potential advantages and challenges for both network-on-chip and core logic applications. We then review one promising asynchronous design flow called Proteus, which was commercialized out of USC research via TimeLess Design Automation and used on Intel's latest 10G Ethernet Switch Chip. This flow enables design from high-level specifications using a combination of standard simulation, synthesis, and physical design tools with a small set of specific algorithms for performance and power optimization of asynchronous circuits.
Biography: Peter Beerel received his B.S.E. degree in Electrical Engineering from Princeton University, and his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1991 and 1994, respectively. He joined the Department of Electrical Engineering-Systems at the University of Southern California's Viterbi School of Engineering in 1994, where he is currently an Associate Professor and Faculty Director of Innovation and Entrepreneurship in Engineering. In May of 2008, he co-founded TimeLess Design Automation with one of his Ph.D. students, Dr. Georgios Dimou, to commercialize an asynchronous ASIC flow called Proteus. They sold the company in July of 2010 to Fulcrum Microsystems. Fulcrum Microsystems was acquired by Intel in 2011 and became part of its Networking Division, where he also works as Chief Scientist of Technology Development. Dr. Beerel has been a member of the technical program committee for the International Symposium on Advanced Research in Asynchronous Circuits and Systems since 1997, was program co-chair for ASYNC'98, general co-chair for ASYNC'07, and general chair for ASYNC'13. He received a National Science Foundation CAREER Award, was co-winner of the Charles E. Molnar award at ASYNC'97, and was a co-recipient of the best paper award at ASYNC'99. He was also the 2008 recipient of the IEEE Region 6 Outstanding Engineer Award for significantly advancing the application of asynchronous circuits to modern VLSI chips.