Numaproj: A Kubernetes-Native Project You Should Know About


There are so many cool and game-changing projects in the open source world that it can be hard to keep up with them all. At KubeCon + CloudNativeCon last fall, we asked Vigith Maurice, a principal engineer at Intuit and cocreator of Numaproj, to tell us about a new project everyone’s talking about. We discuss what Numaproj does, surprising ways people are using it, and what’s next for its community.

The full episode is available on the Open at Intel podcast. This conversation has been edited and condensed for brevity and clarity.

A New Approach to Streaming

Katherine Druckman: Will you tell us about Numaproj? 

Vigith Maurice: We at Intuit created Argo, and we saw users running into problems when they tried to do streaming, which is all about real-time detection and processing. We wanted to make sure that we open sourced a platform that’s easy to use. Today, streaming is mostly associated with data engineers who use systems like Apache Flink and Apache Spark. We wanted it to be accessible to everyone: application developers, machine learning (ML) engineers, and DevOps.

We took learnings and feedback from developing Argo Workflows and applied them to streaming workflows. We built a product that can be used for real-time analytics, ML, and inference training, all based on a streaming concept. You get a message, you process the message, and you can write your processing in any language you like. We’ve received a lot of good feedback. We just released version 1.0, so we’re very new, but our community is growing.

Any Feedback Is Good Feedback

Katherine Druckman: Where are you in terms of building a community and external contribution?  

Vigith Maurice: Now that we’re at 1.0, we already have a few customers using it. We see some community pull requests (PRs) coming in, but we’re still in the early stages. I’d say we have 15 or 20 contributors, but any contribution is a great contribution.

Katherine Druckman: If you could recruit new contributors, what would you love to see them do?  

Vigith Maurice: Mostly, we want users to try it and give us feedback. Streaming is a tricky concept. We started with the motto that you should be able to do streaming in under five minutes. You don’t have to learn anything new: if you’re already on Kubernetes, you don’t have to learn a specific paradigm or anything. We’ll make sure that you’re in your comfort zone and can do streaming very quickly. At this stage, what we want is for people to adopt the project. Try it out and give us good or bad feedback. Any feedback is great, because it lets us evolve and give the community the solutions and features they’re asking for.

Scaling to Meet Unexpected Use Cases

Katherine Druckman: Can you give us a few example use cases? 

Vigith Maurice: At Intuit, we use it in three ways. One is purely for real-time streaming analytics, such as listening on any streaming topic. For instance, it could be HTTP endpoint data, where we want to see how many errors occur per region in one minute. So we group the data into one-minute windows, and then we sometimes extend the pipeline with neural networks, for example, applying a sliding window and using autoencoders to detect anomalies in the numbers that flow in. So it’s all about aggregation, applying ML inference to this data, and being able to say whether your system is healthy. That’s one use case that includes both ML and real-time streaming.
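The aggregation step described above can be illustrated with a small, plain-Python sketch. This is not Numaproj’s API; it only shows the windowing logic, assuming each event carries a timestamp, a region, and an HTTP status code, and treating 5xx responses as errors (both assumptions). The names `HttpEvent`, `errors_per_region_per_minute`, `region_series`, and `sliding_windows` are hypothetical. The autoencoder itself is not shown; the sliding windows are the kind of input vectors such a model would score.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

WINDOW_SECONDS = 60  # one-minute tumbling windows, as in the interview example


@dataclass(frozen=True)
class HttpEvent:
    # Hypothetical event shape; real payloads depend on the source topic.
    timestamp: float  # seconds since the epoch
    region: str
    status: int


def errors_per_region_per_minute(events: Iterable[HttpEvent]) -> Dict[int, Counter]:
    """Group events into one-minute windows and count errors per region.

    Assumption: any status >= 500 counts as an error.
    """
    windows: Dict[int, Counter] = defaultdict(Counter)
    for event in events:
        # Align each event to the start of its one-minute window.
        window_start = int(event.timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        if event.status >= 500:
            windows[window_start][event.region] += 1
    return dict(windows)


def region_series(windows: Dict[int, Counter], region: str, start: int, end: int) -> List[int]:
    """Per-minute error counts for one region; minutes with no errors become 0."""
    return [windows.get(t, Counter())[region] for t in range(start, end, WINDOW_SECONDS)]


def sliding_windows(series: List[int], size: int) -> List[List[int]]:
    """Overlapping windows of `size` consecutive per-minute counts."""
    return [series[i:i + size] for i in range(len(series) - size + 1)]


if __name__ == "__main__":
    events = [
        HttpEvent(0, "us-west", 500),
        HttpEvent(10, "us-west", 200),
        HttpEvent(30, "eu-central", 503),
        HttpEvent(65, "us-west", 502),
        HttpEvent(70, "us-west", 500),
    ]
    windows = errors_per_region_per_minute(events)
    series = region_series(windows, "us-west", 0, 180)
    print(series)                      # [1, 2, 0]
    print(sliding_windows(series, 2))  # [[1, 2], [2, 0]]
```

In a real pipeline, each function would run as a separate stage over an unbounded stream rather than over an in-memory list; the in-memory version simply makes the grouping and windowing easy to follow.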

People also use it as a work queue, meaning purely as a streaming workflow. They have some data in Amazon S3 or some other blob storage, they get the data, and they do some processing on it. So the input you get is mostly an event, and you process it, forward the data, and so forth.

This was the view we had when we built it, but interestingly, we see users who, for example, use Numaproj for digital signal processing. People out there are using it in very different ways, and the system is able to scale and meet their needs. That’s what really makes us happy and proud of what we do.

The Future of Numaproj

Katherine Druckman: What would success look like a year from now? 

Vigith Maurice: Today we have a large footprint, because Intuit uses it at scale and we’re a big company, but we want to see how others are adopting it. A lot of streaming evangelization is still needed: how many users are really moving into the streaming world? And how many are able to, for example, write a streaming pipeline in their favorite language? I’m hoping that by this time next year we’ll have at least 200 customers using it and giving us feedback so that we can improve and more users will adopt it.

Katherine Druckman: Can you share a few examples of how people are using the project in unexpected ways?  

Vigith Maurice: One of our users, BCubed Engineering, is using it for digital signal processing where the input is a radio frequency. We always thought it was all about data, as in some payload from Kafka. We never thought we could use this platform for digital signal processing. That blew our mind.  

Others are using it for navigation map data processing. This was surprising, and they loved it because we were able to scale the way they needed. They were able to deploy the same specification both on-prem and at the edge, so there’s versatility in deployment. We have a good track record from building Argo, so we know how to build a Kubernetes-native product. That helped us build something even better with Numaproj.

To hear more of this conversation and others, subscribe to the Open at Intel podcast.

About the Author

Katherine Druckman, Open Source Evangelist, Intel 

Katherine Druckman, an Intel open source evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she’s a longtime champion of open source and open standards. 

Vigith Maurice, Principal Engineer, Intuit 

Vigith Maurice is a cocreator of Numaproj and a principal software engineer for the Intuit Observability and Analytics team in Mountain View, California. One of Vigith’s current day-to-day focus areas is the set of challenges involved in building scalable data and AIOps solutions for both batch and high-throughput systems. He has played a pivotal role in building the streaming platform to ease data engineering. Previously, he was a key driver of Intuit’s journey to becoming big data first. He also led various engineering initiatives at Yahoo!