In this episode, Open at Intel host Katherine Druckman spoke with Joshua Alphonse of PremAI about the evolving AI landscape. They spoke about the utility of small language models for domain-specific tasks and their potential in industries like finance and healthcare. They also talked about the importance of democratizing AI and where it's all going. Enjoy this transcript of their conversation.
“RAG really, again, is another thing that really just makes AI even more accessible to people overall, just like how the small language models do. So those two together in tandem are just like a match made in heaven.”
— Joshua Alphonse, Head of Product, PremAI
Katherine Druckman: Hey Joshua, thank you for joining me. We are here at All Things Open. I reached out to you because I thought your talk looked really interesting among other things. I wonder if you could just tell us why you're here.
Joshua's Background and Experience
Joshua Alphonse: Yeah, absolutely. I was a speaker last year at All Things Open as well. That was when I worked at ByteDance, where I was a staff developer advocate for our open source program office. Last year my talk was on one of the open source projects inside our group called ByConity. I talked a lot about databases and how we managed over 50 different open source projects there.
Today I'm talking about Babit Multimedia Framework and how we're using GPU acceleration to help with multimedia. Babit is a really interesting open source project that a lot of people don't know about and honestly, they should. Even though I don't work at ByteDance anymore, the interoperability of this open source project is great. It's basically like an upgrade of FFmpeg and it helps you create video processing pipelines, but then you can add GPU acceleration because at that time we did a collaboration with NVIDIA and TensorRT and so forth.
And this is what we used at ByteDance to process over two billion videos on TikTok and CapCut daily. That's how you get all those funny effects and all those instant videos and live streams right there. I think it's great for companies that are working in streaming or anywhere in multimedia. It's a very straightforward framework and I'm glad I get to speak about it.
Katherine Druckman: Awesome.
Joshua Alphonse: Yeah.
Katherine Druckman: There was something else that you're kind of excited about and I want to make sure we leave plenty of time for that.
Joshua Alphonse: Absolutely.
Katherine Druckman: But actually, before we get into that, you're a bit of an AV nerd. You just talked about that and I wondered how do you bring that part of your identity into your work? We're talking about making AV better for everybody else.
Joshua Alphonse: Absolutely. Yeah. I think what really got me interested, I've always been a musician my whole life, classically trained in woodwinds. When I was in college and after, I interned as a studio engineer and then worked my way up and got a full-time job there before I got into the tech industry. Going from a company like Wix, building websites, to ByteDance, doing social media and a whole bunch of open source stuff, these are adjacent worlds. And especially with AI and everything happening now, there's this next piece of the application layer and the hardware layer. These are all things that got me interested, because eventually music, recording, and AV are all going to intersect here. That's part of what we're here to discuss, too, with small language models and how this affects things on devices in the AV world as well.
Katherine Druckman: Yeah, I find that people with creative backgrounds, I guess, can bring a lot of interesting perspectives to this field.
Joshua Alphonse: Absolutely.
Katherine Druckman: If you didn't get your start purely in just learning to code, writing code, there's so much that I think that brings to the table. And we're in a period of very interesting change in the tech community because of AI. And as we said, before we hit record, we were talking about a lot of people talking about large models, but you are getting interested in small models. So, let's talk about why that matters.
Current Projects and Innovations
Joshua Alphonse: Yeah, small models are awesome. Before I get to that, the large models have their pros and their cons. Obviously the larger models kind of run the world right now and how we use AI for the most part. And this is something that really got me interested when I started to work at this startup called PremAI.
We focus on creating performant small language models for domain-specific tasks and for domain-specific companies that want to work with us. For example, we work with Marvel Studios, and we've built custom models for them, including some small language models we're collaborating on for future projects that may come out as well. Small language models are significant because they have a lower parameter count.
Let's consider the larger counterparts: Llama 3.1 has a version with around 405 billion parameters. But Meta just released the Llama 3.2 models at one billion and three billion parameters, and we can continue to go smaller. The great thing about small language models is that you can run them on devices like phones, tablets, and automobiles on the edge, and even if you want to run one in a terminal, they run really fast. I saw someone run a one-billion-parameter model on a Raspberry Pi the other day and it was performing extremely well.
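To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch (an editorial illustration, not from the conversation) of the memory needed just to hold model weights at common precisions. It assumes 16-bit and 4-bit quantized weights; real runtime memory also includes activations and the KV cache:

```python
# Rough memory footprint of model weights at different precisions.
# These numbers cover weights only; actual usage is higher.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Gigabytes needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9

for params, label in [(1e9, "1B"), (3e9, "3B"), (405e9, "405B")]:
    for bits in (16, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{label} params @ {bits}-bit: ~{gb:.2f} GB")
```

At 4-bit quantization, a one-billion-parameter model needs roughly half a gigabyte for weights, which is why a Raspberry Pi can plausibly run one, while a 405-billion-parameter model needs hundreds of gigabytes even when quantized.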
Katherine Druckman: That's amazing on a Raspberry Pi. That's so cool. Let's talk about the applications. When do we really need to have that conversation about model size and what sort of application lends itself to a smaller model?
The Importance of Small Language Models
Joshua Alphonse: Yeah, that's an awesome question. I think it also depends on your setting because we've been experimenting with different hierarchical orchestrations with how we're working with small and large language models. Sometimes we'll have a large language model sit at the top and have the other agents that are smaller work in tandem with the larger model to complete different tasks that they're trained on.
In terms of applications, you can look at a few different places. Small language models are great for edge devices, like I was mentioning. If you want to run a model on a phone or somewhere with less GPU power and fewer computational resources, you can do this. And if there's an application where you need near-instant replies in real time, small language models perform a little faster, though they have their pros and cons as well.
One place where we're using small language models is financial compliance, with a company called Grand; they're a financial compliance provider in Europe. We made custom small models for them that can help answer compliance questions and solve different problems. It's the same with the Marvel use case I was talking about, and with healthcare startups as well, because you can train them on whatever tasks and whatever information you need. It's a lot easier, it requires less money and fewer resources, and you can see them across different industries. It's kind of hard to pinpoint which application is best for a small language model, because the training process and the fine-tuning process are where it really shines.
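The hierarchical orchestration Joshua describes above, a larger model coordinating smaller specialists, can be sketched in miniature. Everything below is hypothetical: the function names and keyword-based routing are invented for illustration and are not PremAI's implementation; real systems would route with a model call rather than a lookup table.

```python
# Toy hierarchical orchestration: a coordinator (standing in for a
# large model) routes tasks to small specialist "models", stubbed
# here as plain functions. Hypothetical sketch only.

from typing import Callable, Dict

def summarizer(text: str) -> str:
    # Stub for a small model fine-tuned for summarization.
    return "summary: " + text[:30]

def classifier(text: str) -> str:
    # Stub for a small model fine-tuned for domain classification.
    return "label: finance" if "invoice" in text else "label: other"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "summarize": summarizer,
    "classify": classifier,
}

def coordinator(task: str, text: str) -> str:
    """Route to a specialist; escalate unknown tasks to the large model."""
    handler = SPECIALISTS.get(task)
    if handler is None:
        return "escalate to large model: " + task
    return handler(text)

print(coordinator("classify", "invoice #42 overdue"))  # label: finance
```

The design point is the one from the conversation: cheap, fast specialists handle the tasks they were trained on, and only unmatched work falls through to the expensive generalist.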
Open Source and AI Ethics
Katherine Druckman: When you're talking about custom small models and stuff like that, that kind of makes me start to want to go in the direction of openness and the conversation around the open source definition for AI, for example. I wonder, I would guess that you're pretty invested in this conversation, right?
Joshua Alphonse: Yeah.
Katherine Druckman: What, in your mind, does open mean for AI models and what should and shouldn't be open in your opinion? The community has spoken quite a bit, but I'd be curious to hear from you.
Joshua Alphonse: Yeah. We see companies like OpenAI, of course. I think they're doing an excellent job of releasing models of different sizes, even if sometimes there's information you don't really know too much about. But we're in this process now of democratizing AI models, and I honestly think smaller models should be open in this regard because of where they can be placed. I feel like they have more accessibility, to be used across the world in a bunch of different settings, but I think there's still room.
I've heard some people say, oh, fine-tuning is dead, you don't really need it. But I feel like that's such a first world problem. Other places, other countries, and people that don't have as many computational resources can really take advantage of small language models. So seeing where they can intersect and where they can really have a big impact is awesome in that regard. As for what we're doing with small language models, we're working with different companies, and PremAI as a platform offers a bunch of open source models, from Llama to Mistral, across various sizes. But small language models are also where we see our platform shine, because we have an autonomous fine-tuning agent that can take care of all your different needs to fine-tune the model with ease, and then you can integrate it with any application you need.
And again, you don't necessarily need to use a closed source model to do this, and you can have that extra layer of transparency and security if you use an open source model as well. So that's where I really see what open source is: the transparency, the accessibility, and who's able to be impacted, especially in the fine-tuning process.
Katherine Druckman: Let's talk about various techniques in terms of fine-tuning versus developing, using, for example, RAG techniques where you augment instead of... But you don't retrain. Tell me how those things are related and how they're different.
Joshua Alphonse: Yeah, I mean, RAG is awesome, I think, especially if you're trying to relate something to specific data that you have inside your organization. We have a bunch of different techniques that we use, and we have custom RAG pipelines that we also build on the platform. This fine-tuning agent that we have right now is great because it uses various models, large and small. We're using techniques like LoRA to fine-tune the models, and it doesn't really change much from the larger ones to the smaller ones; it's just faster to fine-tune. We don't offer anything in training from scratch. We have a firm belief that models that are already pre-trained are super knowledgeable and have enough background information and general knowledge.
Katherine Druckman: Yes, sure. General knowledge.
Joshua Alphonse: Whereas with the fine-tuning process that we have, especially with LoRA, and we've experimented with RLHF in the past, it really makes a difference in how we see these small language models perform. Those are some techniques we're using. We're also working on our next version, Prem 2.0, which is going to offer more of these fine-tuning techniques and external data sources as well. RAG and fine-tuning work hand in hand, I believe. I don't think one should take over the other. Every industry is going to have its different needs. You're going to have a bunch of documents, and I think RAG, again, is another thing that just makes AI even more accessible to people overall, just like the small language models do. So those two in tandem are like a match made in heaven, as we see it, and that's why we continue to work on this process.
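To make the RAG idea concrete, here is a minimal, self-contained sketch (a hypothetical illustration, not Prem's pipeline): retrieve the most relevant document with a simple bag-of-words cosine similarity, then prepend it to the prompt. Production systems use embedding models and vector stores instead of word counts, but the retrieve-then-augment shape is the same.

```python
# Minimal RAG sketch: bag-of-words retrieval plus prompt augmentation.
# Illustrative only; real pipelines use learned embeddings.

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list) -> str:
    """Return the document most similar to the query."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "Refund requests must be filed within 30 days.",
    "Compliance reports are due every quarter.",
]
question = "When are compliance reports due?"
context = retrieve(question, docs)

# The augmented prompt grounds the model in retrieved context
# instead of retraining its weights.
prompt = f"Context: {context}\nQuestion: {question}"
print(prompt)
```

This is the distinction from the conversation: RAG injects your documents at inference time, while fine-tuning (LoRA and similar) changes the model's weights; the two compose rather than compete.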
Katherine Druckman: Awesome.
Joshua Alphonse: That's what got me interested in working at the startup.
Katherine Druckman: I feel like I should... I try to be hesitant about using too many acronyms. RAG is retrieval-augmented generation, for anybody who's not as well versed in this area. What are you excited to see in the next year or so? Especially since you've recently started a new thing, I am guessing you're probably in the honeymoon phase and super excited. What are you most excited about? What are you planning out for the next year?
Joshua Alphonse: Yeah, so that's a really good question. Yeah, I'm in a new place now, so it is a bit of a honeymoon phase, but at the same time, the folks that I work with are so...
Katherine Druckman: Making it a great honeymoon?
Joshua Alphonse: They're making it a great honeymoon, to be honest, right?
Katherine Druckman: Exotic location, umbrella drinks.
Future of AI and Exciting Developments
Joshua Alphonse: Yeah. At this company, I work with a lot of Italians and folks from Bangalore as well, and it's been a different type of air I'm breathing now, I would say, from where I was working last. Not that I didn't like it, it's just that there's a whole bunch of different challenges ahead. The things that I'm looking forward to are the advancements in multi-modality and spatial AI. We've done all this work with contextual understanding and with text and so forth, but what about giving AI some eyes, being able to see things? I think companies like Waymo are doing a really cool job with that. Obviously, Tesla. But another one that's coming up is World Labs by Fei-Fei Li, the AI researcher from Stanford. That company, I think, is on the verge of doing some really cool things in the spatial space. Of course, Meta and so forth.
I really want to see where that goes, because I think this is the next step toward the advancement of AGI. But this will take some time, especially with all the different energy alternatives coming back; nuclear is coming back and so forth. We have quite some time to go, but spatial AI is something I'm really looking forward to seeing in more applications within the next year for sure. I think it's going to have, I wouldn't say its final form, but its introduction to the general public. We'll see more of it.
Katherine Druckman: Yeah, it's very exciting. There are a lot of exciting and really positive things happening in the world of AI. There's also a lot of controversy and concern. I was at an event not too long ago, the Grace Hopper Celebration. It was fantastic, with a lot of very interesting women leading the AI space, which is very cool. But a lot of them brought up the importance of having many perspectives at the table, especially when you're talking about something as contentious as the future of AI. And I wondered if you have any thoughts about that.
Joshua Alphonse: Yeah, absolutely. Yeah, this is-
Katherine Druckman: Algorithmic bias comes into the conversation quite a bit, too.
Joshua Alphonse: Oh absolutely.
Katherine Druckman: And then that's very interesting, especially when you're talking about various financial applications or anything like that. I wondered if you had any thoughts…
Joshua Alphonse: Healthcare, all that.
Katherine Druckman: Exactly. Absolutely.
Challenges and Controversies in AI
Joshua Alphonse: It's really crazy. I love this question because I have a few different thoughts about this actually, I'll share with you. Recently in the Bay Area, I've been judging a lot of pitch competitions, hackathons and so forth. I actually did one recently with all female founders. And let me tell you, there's a difference between the female founders and the male founders, not to make a battle of the sexes here. I feel like the female founders are working towards eliminating these biases, and they're really making some super interesting applications that I feel like are catered towards real-world problems.
At the last competition I was at, one of the women there has a startup that's using AI and different models to detect breast cancer early, among other things. Maybe I don't have all the visibility, but I find a lot of the male founders are focusing on just building developer tools and figuring out how to get enterprise deals and so forth.
But the women have a different perspective on how they're building tools, and it's admirable. I'm doing another event with women techmakers in New York next week, and I'm really excited to see what else these women are building.
But in terms of controversy, I would say there’s the AI Salon, which is an event where people break out and just have intellectual conversations about AI. I was in the group where we were talking about AI and intellectual property. I asked: “Do we really think the internet is a good representation of who we are now?”
Katherine Druckman: I hope not.
Joshua Alphonse: But this is what AI is being trained on.
Katherine Druckman: Yeah, yeah. Yeah. Garbage in, garbage out.
Joshua Alphonse: That's it. And now we have the synthetic data happening and things are getting trained on that. I just want people to think about this as we are forming the next generation of AI. And of course, the other controversies are open source versus closed source models, having access and the transparency of seeing how these things are built. That's why I find companies, even like Meta, they have their own stuff going on. Are they truly open source?
Katherine Druckman: That's a whole other episode.
Joshua Alphonse: That's a whole other episode that we can dive deep into. But for right now, they're open, they're transparent for the most part.
Katherine Druckman: Open-ish, open adjacent.
Joshua Alphonse: Eventually they'll find a way to commoditize, I guess. But that's the other thing. It's like AI is commoditized and so is your data. There's a lot of biases there. And this is why I'm really interested in the open small language models because it gives more people a chance to eliminate these biases and make things that are specific to who they are and what they're representing and what their backgrounds are as well. I'm excited to see where this is going to go, especially as we're seeing different advancements in small language models on edge devices. We're starting to see things come out with Apple Intelligence soon, and Llama 3.2 can run on a phone. Hopefully these are ways that we can start to eliminate some of these biases by AI knowing who you are and it can represent you.
Katherine Druckman: It's interesting to think in terms of the AI applications or the training data as holding a mirror up to ourselves, the cultural or societal mirror. What do we want to look back at us when we look at that mirror? And so it's kind of important.
Joshua Alphonse: I think it's something to ask ourselves. Obviously, the internet does represent us in some type of way.
Katherine Druckman: Something, represents something for sure.
Joshua Alphonse: But history is written by its victors and it's written by other things, too. These are things I think people should think about as we're integrating AI into our daily lives. We were just talking before we started this interview about how things are so different in the Bay Area compared to the rest of the country when it comes to AI.
Katherine Druckman: Oh, absolutely.
Joshua Alphonse: We have self-driving cars and a whole bunch of stuff happening. The biggest companies in tech in the world are out there. But I just recently traveled to Virginia for a wedding and I went to New York, too. And you just walk around, it's completely different, the conversations are completely different. Obviously, the Bay Area is ahead of things, but I think people also kind of forget how slow things move elsewhere.
Katherine Druckman: Which isn't necessarily a bad thing.
Joshua Alphonse: It's not a bad thing at all. I really don't think so.
Conclusion and Final Thoughts
Katherine Druckman: Humans are flawed, AI is flawed, code is flawed. And sometimes we need to slow down and take a step back and kind of make sure we're on the right path.
Joshua Alphonse: Absolutely. So that's what I'm really looking forward to. Hopefully the small language models will change our perspective and we can cater and make AI that we own that's sovereign to us.
Katherine Druckman: Love it.
Joshua Alphonse: Yeah. So that's what we're looking forward to.
Katherine Druckman: Cool. Cool. Well, thank you so much. Is there anything you wanted me to ask you that I didn't get to?
Joshua Alphonse: Let's see.
Katherine Druckman: Anything you really were excited to talk about?
Joshua Alphonse: I mean, the small language models super excite me, but another thing that I think that's really great to talk about too is just this advancement in agentic AI as well. I think the SLMs are going to play a big part.
Katherine Druckman: Could you define SLM for us real quickly?
Joshua Alphonse: Small language model.
Katherine Druckman: I mean, we said it several times, small language model, but I wanted to make sure.
Joshua Alphonse: Yeah, I think agentic AI is the next thing. That's all anyone's talking about in San Francisco and the Bay Area right now, agentic, agentic, agentic, and this whole ecosystem that's going to come about: the developer tools, the application layer, where we're going with this. And again, does everybody have the computational resources to run agentic AI with large language models? That might not be the case. I think this is where small language models are going to play a big part.
And even with what we're doing on Prem right now, we're creating different agents and small language models and baking them into our platform. Instead of an agent framework where you chain things together, like LangGraph or CrewAI, we have this already baked into the platform, already doing some of the machine learning work for you in the background, so you don't have to be a machine learning engineer. This whole new era of agentic AI is super interesting, and I think it's relevant to what we're talking about now.
Katherine Druckman: You've been listening to Open at Intel. Be sure to check out more about Intel’s work in the open source community at Open.Intel, on X, or on LinkedIn. We hope you join us again next time to geek out about open source.
About the Guest
Joshua Alphonse, Head of Product, PremAI
Joshua Alphonse is head of product at PremAI. He has spent his time empowering developers to create innovative solutions using cutting-edge open source technologies. Previously, Joshua worked at Wix, leading product and R&D engagements for their developer relations team, and at ByteDance, where he created content, tutorials, and curated events for the developer community.
About the Host
Katherine Druckman, Open Source Security Evangelist, Intel
Katherine Druckman, an Intel open source security evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she's a long-time champion of open source and open standards. She is a software engineer and content creator with over a decade of experience in engineering, content strategy, product management, user experience, and technology evangelism. Find her on LinkedIn.