Together, Intel and the U.S. National Science Foundation (NSF) are funding new research at the Massachusetts Institute of Technology (MIT) to automate key aspects of engineering code for scalable, heterogeneous systems using machine programming (MP). The research aims to advance the foundations for machine programming to unlock software development productivity.
The research will center on MIT’s novel system for Scalable Machine Programming (ScaMP), which offers a new approach to building modern applications with strong performance and scalability requirements. The research is part of NSF’s Principles and Practice of Scalable Systems (PPoSS) program, which supports researchers working across multiple disciplines on modern applications, systems and toolchains built on heterogeneous architectures. PPoSS aims to foster the design and implementation of large-scale systems and applications that improve performance, correctness and accuracy.
“Software developers today are not well-versed in how to map applications onto hardware systems,” said Tim Mattson, senior principal engineer at Intel Labs. “With hardware diversity growing rapidly, however, it is difficult for developers to exploit the full benefits of modern systems. This is a critical problem for the computer industry. The solution is to automate as much of the software development process as possible. We need automated systems that map a programmer’s intention to the features of a specific heterogeneous system. I am excited to work with researchers at MIT to turn this ‘dream of automated software generation’ into reality.”
“When foundational research is pursued in collaboration with industry, the potential for impact is amplified. We are pleased to see Intel partner with the ScaMP team to derive new approaches for reliable and scalable software for modern systems,” said Dilma Da Silva, Division Director for NSF’s Division of Computing and Communications Foundations.
“This research explores the future of computing where computers do your bidding, do it correctly, and do it fast,” said Saman Amarasinghe, Professor in the Department of Electrical Engineering and Computer Science at MIT. “We believe how we program in the future will be drastically different from how it’s done today and are excited to bring together six programming and natural language principal investigators from MIT and researchers from Intel to address this important problem.”
Simplifying Programmability
Programming is a cognitively demanding task that requires extensive knowledge of an application domain, programming experience and creativity. It is also notoriously difficult to automate, due in part to the challenge of mapping algorithms onto the details of a hardware system. Unfortunately, as system complexity grows, so do the demands placed on programmers, and adequately trained programmers are hard to find even as demand for them is at an all-time high. According to the U.S. Bureau of Labor Statistics, employment of software developers in the U.S. is projected to grow 25 percent from 2021 to 2031, much faster than the average for all occupations.
Machine programming holds the promise of reshaping the way software is developed by replacing expert-crafted algorithms with machine-learned components. One of the core challenges for research on programming tools is reconciling the classic appeal of rigorous correctness reasoning with the versatility of machine learning. MIT’s proposed work addresses this challenge head-on through ScaMP, a novel system for scalable machine programming that offers a new approach to building modern applications with strong performance and scalability requirements.
The Three Pillars of Machine Programming
Machine programming is still a nascent field. To create a foundational definition of machine programming, Intel Labs and MIT worked together to draft The Three Pillars of Machine Programming. The work details three technical pillars:
- Intention, which captures what the programmers want the software to do and utilizes advancements in the human-to-computer and computer-to-machine-learning interfaces.
- Invention, which focuses on the creation or refinement of algorithms or core hardware and software building blocks through ML.
- Adaptation, which emphasizes advances in the use of ML-based constructs to autonomously evolve software.
These elements are part of any programming process, but they are usually tangled together in complex ways, making them hard to solve. In their machine programming work, the group adopted a separation-of-concerns mindset and addressed each pillar separately. This “divide and conquer” approach takes a problem once thought too complex to solve and breaks it into three distinct subproblems that can be solved individually and then combined into an end-to-end programming system.
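The separation of concerns can be made concrete with a toy sketch. The following is purely illustrative and not from the ScaMP project: the example-based specification, the candidate space, and the feedback loop are all hypothetical, chosen only to show the three pillars as distinct, composable steps.

```python
# Intention: the user states what the software should do (here, via
# input/output examples) rather than writing code.
examples = [(2, 4), (3, 6), (10, 20)]  # intent: double the input

# Invention: a (tiny, hypothetical) space of candidate programs that a
# learned or enumerative synthesizer would search.
candidates = {
    "x + x": lambda x: x + x,
    "x * x": lambda x: x * x,
    "x + 1": lambda x: x + 1,
}

def invent(spec):
    """Return the first candidate consistent with the stated intention."""
    for name, fn in candidates.items():
        if all(fn(i) == o for i, o in spec):
            return name, fn
    return None

name, program = invent(examples)
print(name)  # → x + x

# Adaptation: monitor the deployed program and re-synthesize when the
# observed behavior no longer matches the (possibly changed) intention.
def adapt(program, new_spec):
    if all(program(i) == o for i, o in new_spec):
        return program
    return invent(new_spec)[1]
```

Each step consumes only the previous step's output, which is what lets the subproblems be attacked, and improved, independently.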
Scalable Machine Programming
To address each foundational pillar, ScaMP builds on the MIT researchers’ experience with program synthesis, natural language processing and formal verification. It also draws on their development of high-performance domain-specific languages to support an interactive and iterative development model that combines high-level specification with fine control over low-level implementation decisions, resulting in programs that deliver performance portability.
The ScaMP project can be broken down into four main layers:
- Incremental Multimodal Specification (IMS): This module addresses the first pillar of machine programming: intention. It helps the programmer refine a high-level description of the problem to be solved into a precise low-level specification of each of its constituent components. These are then expressed as code using a novel concept of safe stackable smart domain-specific languages.
- Safe Stackable Smart Domain-Specific Languages (S3DSL): These DSLs make up the second layer of the system. They are used to express program modules as architecture-independent distributed code through algebraic rewrite rules, proven correct in Coq, a formal proof management system. This produces high-performance code from high-level specifications in domains ranging from dense array processing (as in image processing and deep learning) to graph processing, sparse tensor algebra and cryptography.
- Correct by Construction Code Generator Generation: Modern high-performance compute engines are built out of heterogeneous components. Leveraging IMS and S3DSL, the team aims to produce compiler backends for multiple heterogeneous architectures that generate highly optimized assembly code, guaranteeing correctness using Coq-proved translation validation.
- Lifetime Monitoring, Learning, and Adaptation: This layer will support the collection of data across the lifetime of all projects developed within the ScaMP ecosystem. This data will feed back into the other layers of the system, helping them learn from the outcomes of different design and implementation decisions.
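To give a flavor of what an algebraic rewrite rule looks like, here is a minimal sketch, not the ScaMP implementation, of the classic map-fusion law, map(f, map(g, xs)) → map(f ∘ g, xs). Terms are plain tuples; the point is that the rule transforms a program while preserving its meaning, which is exactly the property a Coq proof would establish once and for all in the real system.

```python
def rewrite_map_fusion(term):
    """Apply map fusion wherever the pattern map(f, map(g, xs)) occurs."""
    if isinstance(term, tuple) and term[0] == "map":
        _, f, inner = term
        inner = rewrite_map_fusion(inner)  # rewrite nested terms first
        if isinstance(inner, tuple) and inner[0] == "map":
            _, g, xs = inner
            # map(f, map(g, xs)) -> map(compose(f, g), xs): one pass, no
            # intermediate collection.
            return ("map", ("compose", f, g), xs)
        return ("map", f, inner)
    return term

def evaluate(term, env):
    """Reference interpreter used to check that the rewrite preserves meaning."""
    if isinstance(term, str):
        return env[term]
    tag = term[0]
    if tag == "map":
        fn = evaluate(term[1], env)
        return [fn(x) for x in evaluate(term[2], env)]
    if tag == "compose":
        f, g = evaluate(term[1], env), evaluate(term[2], env)
        return lambda x: f(g(x))
    raise ValueError(f"unknown term: {tag}")

env = {"f": lambda x: x + 1, "g": lambda x: 2 * x, "xs": [1, 2, 3]}
before = ("map", "f", ("map", "g", "xs"))
after = rewrite_map_fusion(before)
assert evaluate(before, env) == evaluate(after, env)  # both give [3, 5, 7]
```

Where this sketch checks equivalence by testing one environment, a Coq-verified rule is proven equivalent for all inputs, which is what lets the S3DSL layer apply such rewrites aggressively without risking correctness.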
What's Next
The research teams hope to develop each of these areas into fully functioning end-to-end solutions within the next five years. In addition to the three pillars and the tasks listed above, the team will need to solve additional problems including compiler verification and natural-language processing for software design. Success will be measured by applications that are easier to build. Eventually, this project will move the world closer to a day when anyone can be a programmer. Computers will meet people on human terms and generate code flexibly, improving software-development productivity across society.