Empowering Enterprises: OPEA, AI, and the Future of Storage

In this episode of Open at Intel, host Katherine Druckman spoke with MinIO engineer Daniel Valdivia about his participation at KubeCon and his work on Kubernetes integrations and AI initiatives. They discuss the significance of object storage standardization via the Open Platform for Enterprise AI (OPEA), emphasizing the flexibility and scalability of MinIO's offerings. Daniel highlights MinIO's contributions to open source projects like PyTorch and Spark and shares insights on new hardware technologies like PCIe Gen 5. He also announces the launch of MinIO's new AI store, designed to empower enterprises to efficiently manage exascale infrastructure and AI pipelines. Enjoy this transcript of their conversation.
 

“I can also see how the rest of the OPEA strategy will pan out for everyone else in the industry, because it's going to be a cross-validated solution for multiple vendors. And, it's an open standard. It's in the name, right?”

— Daniel Valdivia, Engineer, MinIO 

 

Katherine Druckman: Hey Daniel, thank you for joining me at KubeCon. I know everyone's really busy, so I really appreciate it. 

Daniel Valdivia: No, my pleasure. 
 

Katherine Druckman: Awesome. 
 

Daniel Valdivia: I love coming to KubeCon. It's such a lively place to hang out. 

Meet Daniel Valdivia: Engineer at MinIO

Katherine Druckman: It is. It really is. So many people, and so many like-minded people. It's great. So, tell us who you are and what you do. What you're doing here at KubeCon. 

Daniel Valdivia: My name is Daniel Valdivia, and I'm an engineer here at MinIO. I'm responsible for applications, Kubernetes integrations, and AI initiatives. Because one of my responsibilities is Kubernetes integrations, it was important, at least for me, to come to KubeCon: I get to see what's new with Kubernetes, how other companies are embracing it, and what ideas they're building on top of it. And I also get to present our offering, which is deeply integrated with Kubernetes.

The Role of OPEA in Enterprise AI

Katherine Druckman: You're also involved in a project that Intel is heavily involved in, which is the Open Platform for Enterprise AI. I wonder if you could talk a little bit about that and why you participate and how. 

Daniel Valdivia: That's a great question, because we live in an object store world. This means everyone needs storage, and the industry is realizing very quickly that they need to standardize on an object store. And we're the most popular storage on the planet, not only because we are fast or built for large scale, but because we are the only one that can run anywhere, on any hardware. We're not like other solutions that say, "I'm going to sell you an appliance," or, "I'm going to sell you a hosted service." We only offer licenses to our software. Then our customers go and procure the hardware they need, get the drives that match their use case, and build very large storage systems.

So, that's what the enterprise likes. And being part of OPEA is very important to us, because in the modern use cases, people are either working with large scale big data, for cybersecurity, say, or training large scale models. OPEA is all about that: building a consistent stack that everyone can just grab and take, knowing it will work for their enterprise. No surprises.

Community Participation and Contributions

Katherine Druckman: Tell us a little bit about how you participate in the community, specifically with the OPEA project. 

Daniel Valdivia: In the OPEA project, we’re particularly focused on providing a rock-solid storage foundation that other solutions can build upon. 

It also extends to the standardization of AI hardware. It's very important that we are compatible in two ways. One is with the software stack. When people are building machine learning or AI algorithms, we need to make sure their stack works with object stores, because that's our API, and we want to make sure we're 100% compatible.

That's one way we participate in the community: ensuring compatibility is there, not only with our stack but with other libraries like PyTorch. You'll see us actually sending contributions to PyTorch.
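For readers who want to see what that compatibility looks like in practice, here is a minimal sketch of a PyTorch dataset that streams objects from an S3-compatible endpoint such as MinIO. The endpoint, bucket, keys, and credentials are illustrative placeholders, and this boto3-based pattern is just one common approach, not necessarily how MinIO's own PyTorch contributions work.

    # Minimal sketch: streaming training samples from an S3-compatible
    # object store (such as MinIO) into PyTorch. Endpoint, bucket, and
    # credentials are illustrative placeholders.
    import io

    import boto3
    import torch
    from torch.utils.data import Dataset

    class S3ObjectDataset(Dataset):
        """Loads tensors saved as individual objects in a bucket."""

        def __init__(self, endpoint_url, bucket, keys, access_key, secret_key):
            self.s3 = boto3.client(
                "s3",
                endpoint_url=endpoint_url,  # point at MinIO instead of AWS
                aws_access_key_id=access_key,
                aws_secret_access_key=secret_key,
            )
            self.bucket = bucket
            self.keys = keys  # list of object keys, one per sample

        def __len__(self):
            return len(self.keys)

        def __getitem__(self, idx):
            # Fetch the object's bytes and deserialize them as a tensor.
            body = self.s3.get_object(Bucket=self.bucket,
                                      Key=self.keys[idx])["Body"].read()
            return torch.load(io.BytesIO(body))

    # Hypothetical usage:
    # ds = S3ObjectDataset("http://minio.example.com:9000", "training-data",
    #                      ["sample-0.pt", "sample-1.pt"], "ACCESS", "SECRET")
    # loader = torch.utils.data.DataLoader(ds, batch_size=2)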

And then, it comes with the hardware as well. Right now, there's a champion, a leader in the industry, and that's NVIDIA. Everyone's trying to buy NVIDIA hardware, but there are other things coming out of Intel, for example, with their Gaudi strategy, and we are pretty interested in making that work. For example, we're adding capabilities around RDMA over RoCE, so we can push data at very high speed into these modern accelerators that are just arriving on the market. This is how we try to balance both software and hardware involvement.

MinIO's Unique Object Storage Solutions

Katherine Druckman: As an end user or developer, how does a platform like OPEA enhance my experience with your technology? And how does it help you engage with the developer community? 

Daniel Valdivia: I can talk from the perspective of storage, but I can also see how the rest of the OPEA strategy will pan out for everyone else in the industry, because it's going to be a cross-validated solution for multiple vendors. And it's an open standard. It's in the name, right? The enterprise wants to buy something and know they're not buying the wrong thing. So, by getting a solution that's OPEA validated, they can rest assured it's going to interoperate with the rest of the stack. And this is probably the right strategy for me to embrace AI. If I don't know what to do, I just go and embrace OPEA and I'll be safe, right? I won't end up buying, let's say, an appliance offering some storage over a protocol that no one supports, where I'm locked in and I just wasted a lot of money. But if I go with an object store solution like MinIO, it's guaranteed to work. And it's not a vendor locking you in, so you're pretty much set up for success.

Katherine Druckman: Very interesting. It seems to me that so many people now have been tasked with either building from scratch, from the ground up, a generative AI application or adding generative AI capabilities into existing applications. And I wonder how you see this platform helping those people get up to speed. 

Daniel Valdivia: I mean, since the AI boom started… I'm going to take a small detour and answer your question. 

Katherine Druckman: Yeah, yeah, please. 

Daniel Valdivia: The AI boom, that explosion, happened two and a half years ago. Suddenly, everyone's like, "I need to get GPUs. I need to upgrade my networking." And then, you start realizing all these other components that need to be in place for you to run even a basic machine learning pipeline. You need to find the right orchestrator, the right framework.

Now, there's so much information to weigh behind every decision, even down to figuring out the right way to do networking between two servers, or the right server to buy. For anyone just getting into this, the OPEA standard pretty much says, "Okay, this is a family of products that have been cross-validated; they will guide you down the right path. You can't go wrong." That's the advantage.

People can rely on this; that's how it helps. For people trying to set up infrastructure for AI, not only for serving models but also for training them, it makes their lives easier.

Open Source Contributions and Challenges

Katherine Druckman: Awesome. So, you mentioned PyTorch and we talked a little bit about open source in general. What are the other open source projects that you contribute to as MinIO? 

Daniel Valdivia: PyTorch right now is one of the biggest ones when it comes to the deep learning and AI community, and it has emerged as the champion framework for AI. We contributed heavily to TensorFlow and Kubeflow a couple of years ago, but we've seen that cooling off. Right now, our focus is PyTorch, so we're trying to be very aggressive in making sure that it's well supported.

Katherine Druckman: Okay. So how do you decide where to put your resources? 

Daniel Valdivia: The community drives that. For example, about a year ago we were also putting work into Spark, building adapters for it, because the community running Spark on top of object storage was facing some issues. Spark was built traditionally on HDFS. And now that the Hadoop ecosystem has collapsed, everyone's moving out of HDFS and into object stores. So, some assumptions Spark made, assuming it's running on HDFS, no longer hold in an object store world, and some updates needed to be made to Spark. These are some of the things the community brought to our attention. They were asking us, "Is there a way for MinIO to fix this?" And we said, well, the right way is to actually fix it from the Spark side.
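To illustrate the kind of change he is describing, pointing Spark at an object store instead of HDFS is mostly a matter of configuring the Hadoop s3a connector. The following is a generic sketch with placeholder endpoint, credentials, and bucket, not the specific patches MinIO contributed; it assumes the hadoop-aws jars are on the classpath.

    # Sketch: reading from an S3-compatible object store in PySpark via
    # the hadoop-aws (s3a) connector. Endpoint, credentials, and bucket
    # are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("s3a-example")
        # Point s3a at a MinIO endpoint instead of AWS S3.
        .config("spark.hadoop.fs.s3a.endpoint", "http://minio.example.com:9000")
        .config("spark.hadoop.fs.s3a.access.key", "ACCESS")
        .config("spark.hadoop.fs.s3a.secret.key", "SECRET")
        # Object stores typically want path-style access, not virtual-host URLs.
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )

    # Read a dataset straight from the bucket; no HDFS involved.
    df = spark.read.parquet("s3a://training-data/events/")
    df.show()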

This is how we react. Sometimes it's better to go and fix things upstream. Or if we see, for example, a new product emerging, like Airflow, and we see that Airflow has support for S3 but not for custom S3 endpoints, we go and try to contribute that part, because we know it's important for companies and users to have options for storage.
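The Airflow case looks similar from the user's side: the Amazon provider's S3 hook can be pointed at a custom endpoint through the connection's extra field. A small sketch, with a hypothetical connection id, endpoint, and bucket:

    # Sketch: pointing Airflow's S3Hook at a custom S3-compatible endpoint
    # (such as MinIO). Connection id, endpoint, and bucket are placeholders.
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    # Assumes an Airflow connection "minio_default" of type "aws" whose
    # extra field contains: {"endpoint_url": "http://minio.example.com:9000"}
    hook = S3Hook(aws_conn_id="minio_default")

    # The hook now speaks the standard S3 API against the custom endpoint.
    hook.load_string("hello from airflow", key="demo.txt", bucket_name="scratch")
    print(hook.read_key(key="demo.txt", bucket_name="scratch"))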

Katherine Druckman: Yeah. So, just a basic question, I'm building an application, maybe it's generative AI. 

Daniel Valdivia: Mm-hmm. 

Katherine Druckman: And, there are so many options at all different stages, and we want there to be. Again, it behooves vendors, developers, and end users to avoid vendor lock-in. It encourages innovation and ease of use.

Daniel Valdivia: Yeah. 

Katherine Druckman: But how do you differentiate, and how do you encourage people to make those decisions at various steps in the development process? 

Daniel Valdivia: So, I mean, particularly, in the case of MinIO, we were lucky, because we were riding the big data wave: all this craziness about big data, and capturing large amounts of data, and then running analytics on top of it. That's what actually unlocked the modern age of AI. 

Katherine Druckman: Yeah. 

Daniel Valdivia: Because now we have data. 

Katherine Druckman: Yeah, yeah. 

Daniel Valdivia: And compute this data- 

Katherine Druckman: Now what do we do with all of that? 

Daniel Valdivia: ...Yeah, what do we do with this? So now, we want to start building amazing things on top of it. Coming out of that, we found ourselves in a unique position, because we were the stack that actually hosted the vast amounts of data required to train these AI models. And we also focus on simplicity, performance, and scale, and all three were key to building these modern generative models, or large language models. So, we were at the right place at the right time, in that regard, with our particular offering. I think that pretty much covers the question, unless you think I missed something.

Future of AI and Hardware Innovations

Katherine Druckman: Tell me this, if we talk again in a year, what do you hope to be able to tell me? 

Daniel Valdivia: I'm really excited about the transformation of the hardware that we are seeing right now. PCIe Gen 5 is here. We see Intel Xeon landing, bringing more PCIe lanes to the scene. The P-core and E-core evolution is also very interesting. Everyone's asking all these new questions, like, "Okay, now I can build workloads for efficiency cores, and there are workloads for performance cores." And that really excites me. I want to see in a year that we've cracked that problem and can say, "Okay, we can take all the efficiency cores, which are very power efficient, and run the storage workloads on those. And then we can take the performance cores and run all the AI workloads on top of those, and strike a really good balance." I want to see the embrace of PCIe Gen 5, and of faster networking, all over the world.

Katherine Druckman: And how do you see that impacting end users? 

Daniel Valdivia: What I like about PCIe Gen 5 is that, while it may seem like just a faster bus, it also significantly increases equipment density. 1U and 2U servers with PCIe Gen 5 can now be extremely dense in terms of both storage and compute, not just because they have more processors, but because they can also accommodate more GPUs, Tensor Processing Units (TPUs), or HPUs. The expanded capacity of PCIe Gen 5 allows for more compact hardware, which benefits everyone by reducing the need for additional hardware and lowering power consumption. However, as AI continues to scale, these space savings will ultimately lead to even more servers and greater computational demands.
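To put rough numbers on the density argument: PCIe Gen 5 roughly doubles per-lane throughput over Gen 4, so the same lane budget can feed about twice the storage or accelerator bandwidth. A back-of-envelope sketch using approximate usable-bandwidth figures, not vendor specs:

    # Back-of-envelope: approximate usable PCIe bandwidth per lane,
    # one direction, in GB/s. Rough rules of thumb, not exact specs.
    GEN4_PER_LANE = 2.0  # ~2 GB/s per Gen 4 lane
    GEN5_PER_LANE = 4.0  # ~4 GB/s per Gen 5 lane

    LANES = 128  # illustrative total lane budget for a server CPU

    for name, per_lane in [("Gen 4", GEN4_PER_LANE), ("Gen 5", GEN5_PER_LANE)]:
        total = LANES * per_lane
        slots = LANES // 4  # how many x4 NVMe drives fit in the budget
        print(f"{name}: {total:.0f} GB/s total across {slots} x4 slots, "
              f"{4 * per_lane:.0f} GB/s per drive")

Same slot count either way, but each Gen 5 slot moves twice the data, which is where the density he describes comes from.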

The ongoing technological compaction is enabling unprecedented levels of compute power, and the rate at which compute has multiplied in the last three years is remarkable. Looking ahead, the industry is already working at an exabyte scale, and it will be fascinating to see just how much further this expansion goes in the coming year. 

Katherine Druckman: Yeah. It's overwhelming. 

Daniel Valdivia: I was asking my founder the other day, "When do you think we're going to start seeing the first zettabyte customer? Seven years?" That, to me, is insane.

Katherine Druckman: It's a mind-blowing number. 

Daniel Valdivia: It's a mind-blowing number. Yeah. 

Katherine Druckman: At some level, you really can't even conceive of what it means. 

Daniel Valdivia: If you'd asked me three years ago, I'd have been super happy with my 100 petabyte customers, but now the exabyte customers are popping up left and right, and I'm like, "Wow."

Katherine Druckman: Yeah, that's so wild. 

Daniel Valdivia: We're in the exabyte age, to some extent. But maybe I'm wrong, maybe it's not seven years, maybe it's three years and we're going to see that scale, and we'll really be like, "What?" Right? So, I don't know. That's why I'm so excited by this explosion of compute and throughput in these systems.

Katherine Druckman: Where would you like to see your work have the most impact? 

Daniel Valdivia: I think we're already having a massive impact, in the sense that everyone needs storage. And, we are the most boring component. You're building these amazing solutions… 

Katherine Druckman: Right. It's like, "Let's talk about oxygen."  

Daniel Valdivia: Exactly. But we like the impact that we're having, and we'd like to see more of it. For example, we just added all these new AI features into our product, pretty much to enable people building AI solutions to embrace everything properly, like, "Okay, do you need a model registry, a place to store your models?" That could be an object store. Or you could use the AI store to have a proper model registry with versions, and comments, and everything else.
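As a concrete illustration of the "object store as model registry" idea, here is a minimal sketch using the MinIO Python SDK with a versioned bucket, so each upload under the same key becomes a new model version. Endpoint, credentials, and names are placeholders, and the AI store presumably layers richer registry features on top of a foundation like this.

    # Sketch: a bare-bones model registry on a versioned MinIO bucket.
    # Endpoint, credentials, and object names are placeholders.
    from minio import Minio
    from minio.commonconfig import ENABLED
    from minio.versioningconfig import VersioningConfig

    client = Minio("minio.example.com:9000",
                   access_key="ACCESS", secret_key="SECRET", secure=False)

    bucket = "models"
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    # Enable versioning so re-uploading the same key preserves history.
    client.set_bucket_versioning(bucket, VersioningConfig(ENABLED))

    # Each upload under the same key becomes a new version of the model.
    client.fput_object(bucket, "fraud-detector/model.pt", "model.pt")

    # List every stored version of the model.
    for obj in client.list_objects(bucket, prefix="fraud-detector/",
                                   include_version=True):
        print(obj.object_name, obj.version_id)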

Same for data sets. Or we have, for example, object prompting: being able to say, "Okay, I want to deploy a large language model that can interact with my content so I can identify things quickly, like, 'I have a file here. Is there any PII in it, any personally identifiable information? Yes.'" So, our customers can see, "Okay, I like the MinIO solution. The best part is I can take this and run it, and I can immediately embrace AI. I don't have to go to any cloud provider, I don't have to go to an external service. I can build this and own it myself." That's what I'd be super happy to see: my product having that impact on everyone.

Big Announcement

Katherine Druckman: So, here at the event, you have a big announcement. What can you tell us about that? 

Daniel Valdivia: That was precisely what I was hinting at in my previous answer: the AI store.

Katherine Druckman: Yeah. 

Daniel Valdivia: Pretty much, we're taking the object store to the next level, because now that we have customers running at exascale, we're seeing a set of problems that are unique to exascale. So, we were like, "Okay, let's build products to actually empower people to run exascale infrastructure themselves, without much hand-holding." We want to give them a tool, and that's the AI store. We want to give them observability, and encryption at rest with their own KMS, for example. We want to give them firewalls so they can control security for their data infrastructure. We're going to give them cataloging capabilities so they can perform chargebacks or checks through a simplified API.

And most importantly, we want to give them the tools so that, if they're building modern AI pipelines, they have everything they need: not only storing large amounts of data, but also cataloging the data sets, organizing the models, and even leveraging models to run on top of the data. Because that's ultimately where the value of AI is: what you do with it, not so much how much data you have. People are already realizing they can use AI to scan through large amounts of data, and that application of AI is where they're finding value, not so much in training the model.

Katherine Druckman: Well, this is all very exciting. I really appreciate you joining me. I would love to follow up again with any updates. 

Daniel Valdivia: Sure. Anytime. 

Katherine Druckman: Thank you so much. 

Daniel Valdivia: No problem. 

Katherine Druckman: You've been listening to Open at Intel. Be sure to check out more about Intel’s work in the open source community at Open.Intel, on X, or on LinkedIn. We hope you join us again next time to geek out about open source.  

About the Guest 

Daniel Valdivia, Engineer, MinIO 

Daniel Valdivia is an engineer with MinIO, where he focuses on Kubernetes, ML/AI, and VMware. Prior to joining MinIO, Daniel was the head of machine learning for Espressive. Daniel has held senior application development roles with ServiceNow, Oracle, and Freescale. He holds a bachelor of engineering from Tecnológico de Monterrey, Campus Guadalajara, and a bachelor of science in computer engineering from Instituto Tecnológico y de Estudios Superiores de Monterrey.

About the Host

Katherine Druckman, Open Source Security Evangelist, Intel  

Katherine Druckman, an Intel open source security evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she's a long-time champion of open source and open standards. She is a software engineer and content creator with over a decade of experience in engineering, content strategy, product management, user experience, and technology evangelism. Find her on LinkedIn.