Photo by Mike Hindle on Unsplash
Artificial Intelligence and big data have the potential to revolutionize communities by unlocking insights and efficiencies that were previously hard to even imagine. This powerful pairing can, however, prove a nightmare from a security standpoint. The future of minimizing these risks starts with hardware and travels up to the cloud.
That's one of the main takeaways from a recent Microsoft Build keynote featuring Intel's Arun Gupta and Graham Bury, Principal PM Manager for Azure Confidential Computing at Microsoft. In the 45-minute talk, the pair dive into data diversity, use cases, and what's next.
Arun Gupta onstage at Build.
Securing the Stack
Developers often overlook important concerns when building applications. Their typical stack includes operating systems, languages, libraries, frameworks, and microservices. However, data is fundamental to all of this, and protecting it is crucial, Gupta says. There are privacy and security standards, regulations, and restrictions on data access, which can limit the potential of AI. To truly leverage data and build applications securely and with confidence, developers must consider these concerns.
In this "era of AI," Drury says, it's not just about the data anymore, but also the models and components that make up AI solutions. There are discussions about protecting intellectual property (IP) for these models and the trained weights used. These conversations become more complex when trying to determine who needs protection from data or models, such as cloud providers to meet regulations or the model developer because regulations forbid data from being exposed to them. These problems are unique to AI and did not exist when we were solely focused on protecting data.
Diversity of Data
AI doesn't just require mountains of data; it requires data from different regions and jurisdictions, Gupta says. This data diversity matters because it improves the accuracy of the AI model. Of course, data from different regions can also be more challenging to gather and incorporate into the model. As the field evolves, so too will the methods used to collect and utilize diverse data.
Many solutions are initially trained and developed using synthetic data, Bury says. This raises concerns about the performance of the model, especially in situations such as healthcare or autonomous cars, where people's lives are at stake. Most of us, he says, probably wouldn't ride in an autonomous vehicle trained only on synthetic data rather than real-world road conditions.
Confidential Computing
Confidential computing enhances data security and privacy by protecting data while it's in use, and it supports Intel's vision for AI: democratizing access to it.
"Our vast portfolio of hardware, including CPUs, GPUs, XPUs, and FPGAs, is available for cloud service providers, data centers, clients, and edge," Gupta says. "We provide libraries and software optimizations for popular frameworks including PyTorch*, TensorFlow*, scikit-learn*, and Onnx*.” The company also contributes to over 300 open source projects to keep customers optimized and benefiting from its silicon's best features. "To help our customers get started easily, we launch toolkits and blueprints that can be used with PyTorch and other toolkits to create reference architectures."
Intel currently offers around 35 AI toolkits to help you get started. The Intel® Geti™ Platform, launched in September 2022, lets you create computer vision models easily. "Data is critical for providing constant insights and making AI processes more effective," Gupta adds.
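As a rough illustration of what those software optimizations look like in practice, here's a minimal sketch, assuming the Intel® Extension for PyTorch* package is installed; the tiny model and input below are placeholders rather than anything from the talk.

import torch
import intel_extension_for_pytorch as ipex  # Intel's PyTorch optimization extension

# Placeholder model purely for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
model.eval()

# ipex.optimize applies CPU-friendly operator fusion and memory-layout
# optimizations without changing what the model computes.
optimized_model = ipex.optimize(model)

with torch.no_grad():
    sample = torch.randn(1, 128)
    print(optimized_model(sample).shape)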
Microsoft began offering confidential computing a few years ago, and the team quickly realized it required industry-wide collaboration, Bury says.
Intel and Microsoft joined forces to establish the Confidential Computing Consortium (CCC), which includes cloud providers, hardware vendors, and solution providers including ARM*, Google* and Huawei*. The group's goal is to establish industry standards for terminology and technology and educate organizations on protecting data in use, both in the cloud and at the edge. As defined by the group: "Confidential Computing is the protection of data in use by performing computation in a hardware-based, attested Trusted Execution Environment (TEE)."
Confidential computing and open source are inseparable, because transparency is crucial to understanding how the entire stack works, particularly in the cloud, where most developers begin their work, Gupta says. With all these organizations working together, the philosophy of openness truly complements the effort.
Encryption at rest and Transport Layer Security (TLS) have been available for years, but protecting data while it's being processed remained a gap until new hardware arrived, Bury says. Intel has partnered with Microsoft to deploy this hardware in Azure*, baking encryption keys into the CPU for added security. Customers can also bring their own customer-managed keys to unlock data during processing. This makes cloud computing far more secure for the kind of sensitive data usually kept on-premises. Azure sets a high bar for data protection, building on the standard set by the Confidential Computing Consortium.
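To make the customer-managed-key idea concrete, here's a conceptual sketch, not Azure's actual API: the data owner encrypts records with a key they control, and that key unlocks the data only inside the trusted environment where processing happens. The function and variable names are illustrative.

from cryptography.fernet import Fernet  # symmetric encryption, standing in for a managed key service

# The customer generates and holds this key; in practice it would live in a
# key-management service and be released only to an attested environment.
customer_managed_key = Fernet.generate_key()
cipher = Fernet(customer_managed_key)

# Data is encrypted before it leaves the customer's control.
sensitive_record = b"patient_id=123;diagnosis=redacted"
encrypted_blob = cipher.encrypt(sensitive_record)

def process_inside_tee(blob: bytes, key: bytes) -> int:
    # Stand-in for work done inside the enclave once attestation succeeds.
    plaintext = Fernet(key).decrypt(blob)
    return len(plaintext)  # placeholder computation on the decrypted data

# Outside the trusted environment, only the encrypted blob is ever visible.
print(process_inside_tee(encrypted_blob, customer_managed_key))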
By working in an open ecosystem, customers aren't limited to a specific cloud solution, Gupta points out. There's no lock-in, meaning they have the flexibility to operate using confidential computing protocols in Azure and extend that to multiple clouds. Customers can also bring their own hardware and set up the attestation mechanisms on their own, providing greater control and customization.
Use Case: Federated Learning
Federated learning is one way to use confidential computing. OpenFL is a Python* library for federated learning that enables collaboration on machine learning projects without sharing sensitive or private data. In federated learning, the model moves to meet the data rather than the data moving to meet the model. OpenFL follows a classic data science pipeline, adding two new workflow components: "Collaborator" and "Aggregator." Developed and hosted by Intel, OpenFL was recently accepted as an incubation project by the Technical Advisory Council of the Linux Foundation's LF AI & Data Foundation. For a real-world example, read this case study where Intel Labs collaborated with 71 international healthcare and research institutions to train AI models to identify brain tumors. (In just a few minutes with a few commands, you can also try it out with this demo.)
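The Aggregator/Collaborator pattern itself is easy to sketch: each collaborator trains on data that never leaves its silo, and only model updates travel back to the aggregator, which averages them into a new global model. The snippet below is a conceptual NumPy illustration of that flow, not the OpenFL API.

import numpy as np

rng = np.random.default_rng(0)

# Each "collaborator" holds private data that never leaves its silo.
local_datasets = [rng.normal(loc=mu, size=(100, 4)) for mu in (0.0, 1.0, 2.0)]

def local_update(global_weights, data):
    # Stand-in for a local training round: nudge weights toward local statistics.
    return global_weights + 0.1 * (data.mean(axis=0) - global_weights)

# The "aggregator" distributes weights and averages updates; it never sees raw data.
global_weights = np.zeros(4)
for _ in range(5):
    updates = [local_update(global_weights, data) for data in local_datasets]
    global_weights = np.mean(updates, axis=0)  # federated averaging step

print(global_weights)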
Use Case: NASA
Frontier Development Lab (FDL) researchers conducted a landmark astronaut health study with Intel AI Mentors to better understand the physiological effects of radiation exposure on astronauts. Using Intel AI tech, FDL created a first-of-its-kind algorithm to identify the biomarkers of cancer progression using a combination of mouse and human radiation exposure data. “We saw as we were doing this with the FDL that the model accuracy went up by 27% as opposed to just doing it on their own silo,” Gupta says.
CoCo and FL: How They Work Together
OpenFL utilizes the Confidential Consortium Framework (CCF), developed by Microsoft Research and built on confidential computing primitives. This framework provides governance with distributed trust for OpenFL: participants and applications can be managed, integrity is ensured, and the agreed plan and experiments are enforced. Additionally, OpenFL runs in the Intel® Software Guard Extensions (Intel® SGX) confidential computing cloud environment, providing granular confidentiality.
SGX is a security feature that helps protect data in use through application isolation technology. It creates secure enclaves that are fully encrypted, allowing the application to communicate directly with the CPU to encrypt and decrypt data while bypassing any intermediary layers. This protects the data from being examined or used by other code, including potentially malicious operating systems or hypervisors. SGX is available on Intel architecture and has been available in Azure for the past few years, with customers and partners using it for specialized workloads such as cryptocurrencies, secure key management, and the transfer of major digital assets in the financial space.
Bury adds that Microsoft adopts various frameworks, including the Confidential Consortium Framework (CCF), to ensure a tamper-proof record of transactions. The team has also built services on this framework, including Azure Confidential Ledger and SQL Ledger, which run in secure enclaves and leverage the same SGX-backed audit trail. This allows for tamper-proof audit logs and secure storage of information within their databases.
That's where the Gramine project comes in. It's a library OS, similar to a unikernel, that got its start in Intel Labs and is now part of the CCC. Compared to running a complete guest OS in a virtual machine (VM), Gramine is much lighter weight. You can take your application as is and wrap it in a Docker container, with the Gramine library included to provide all the expected SGX capabilities. Developers can submit a manifest or let Gramine produce one automatically, which configures the application's environment and isolation policies. Intel has already optimized a variety of well-known open source frameworks this way, and they're available in the Azure Marketplace.
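For a sense of what such a manifest configures, here's a heavily abridged sketch in the spirit of Gramine's manifest templates; the exact keys and required entries vary by Gramine version and application, so treat it as illustrative rather than a working configuration.

# Abridged, illustrative Gramine manifest entries (not a complete file):
libos.entrypoint = "/usr/bin/python3"    # the unmodified application to run inside the enclave
loader.log_level = "error"
sgx.debug = false                        # production enclaves disable debug mode
sgx.enclave_size = "2G"                  # memory reserved for the enclave
sgx.trusted_files = [
  "file:/usr/bin/python3",               # files measured into the enclave's identity
]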
What’s Next
Intel® Trust Domain Extensions (Intel® TDX) introduces a new architectural element, trust domains, that allows for hardware-isolated virtual machines in a lift-and-shift environment. TDX can isolate an entire virtual machine, protecting it from a broad range of software and hardware attacks, including from privileged operating systems. Both SGX and TDX offer options in terms of granularity and the ability to use hardware keys for memory encryption, and both rely on zero-trust constructs and the ability to verify the environment through attestation.
Attestation is a crucial aspect of Intel's zero-trust approach, backed by SGX and TDX technology. The trustworthiness of a Trusted Execution Environment (TEE) relies heavily on this attestation capability: attesting to the data's security enables a third party to confirm its origin, context, and current state, making it more verifiable. Intel's Project Amber, a trust-as-a-service offering, can be integrated with other services and is cloud agnostic. It's currently available in Azure in limited preview; try it out with this signup link.
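To see why that verification step matters, here's a conceptual sketch of attestation-gated secret release; real SGX and TDX quotes are hardware-signed and verified against vendor roots of trust (or a service like Project Amber), so the token format and names below are purely illustrative.

import hashlib
import hmac
import json

VERIFIER_KEY = b"demo-signing-secret"  # stand-in for a trust authority's signing key

def issue_attestation_token(measurement: str) -> str:
    # A verifier signs claims about the environment (here, just a measurement).
    claims = json.dumps({"tee": "sgx", "measurement": measurement})
    signature = hmac.new(VERIFIER_KEY, claims.encode(), hashlib.sha256).hexdigest()
    return claims + "." + signature

def release_secret_if_trusted(token: str, expected_measurement: str) -> bytes:
    # The relying party checks the signature and claims before releasing a key.
    claims, signature = token.rsplit(".", 1)
    expected_sig = hmac.new(VERIFIER_KEY, claims.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(signature, expected_sig) and json.loads(claims)["measurement"] == expected_measurement:
        return b"database-encryption-key"  # released only to a verified environment
    raise PermissionError("environment failed attestation")

token = issue_attestation_token("abc123")
print(release_secret_if_trusted(token, "abc123"))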
All of these Confidential Computing offerings aim to provide flexibility and choice for customers: granular application isolation with SGX, whole-VM isolation with Intel TDX, and the ability to isolate containers. Both speakers say they've noticed growing interest in containers from customers, which is why Intel and Microsoft are partnering within an open source ecosystem to contribute to a new project called Kata Confidential Containers. As TDX becomes more widely available in Azure, more container offerings will follow.
Catch the whole 45-minute session.