Cables2Clouds

How to Perform Cloud Selection for AI Workloads

Cables2Clouds Episode 53

Sam Zamanian, a principal advisory director at Infotech Research Group with over 20 years of IT and architecture expertise, joins us to dissect the critical decisions surrounding AI infrastructure selection. This episode promises to equip you with the insights needed to navigate the complexities of AI workloads, from data preparation to model training and deployment. With Sam's guidance, you'll uncover the intricacies of optimizing each stage, leveraging high-performance computing and AI accelerators to tailor cloud solutions for any AI challenge.

Choosing between cloud and on-premises infrastructure is a nuanced decision influenced by various factors. We explore the flexibility and potential trade-offs involved, weighing the cloud's convenience against the control offered by on-premises solutions. Whether it's the ease of rapid deployment or the need for customization and predictability, our discussion provides a roadmap for selecting the best infrastructure setup based on specific use cases, performance, cost, and security considerations.

The evolving AI hardware landscape brings both challenges and opportunities, as discussed in our final segment. With supply bottlenecks and market dynamics shifting, new collaborations like AMD and Microsoft's partnership are making waves. We delve into optimization strategies to enhance efficiency, highlighting examples like the DeepSeek model and its reported cost reductions in training large AI models. As the AI industry rapidly evolves, we capture the excitement of future developments and the potential for broader hardware adoption, setting the stage for the next era of technological advancement.

How to connect with Sam:

https://www.linkedin.com/in/sam-z-77028a59/

Purchase Chris and Tim's new book on AWS Cloud Networking: https://www.amazon.com/Certified-Advanced-Networking-Certification-certification/dp/1835080839/

Check out the Fortnightly Cloud Networking News
https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/

Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on BlueSky: https://bsky.app/profile/cables2clouds.com
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj

Speaker 1:

Hello everyone and welcome back to another episode of the Cables to Clouds podcast. My name is Chris Miles, at BGP Main on Blue Sky. Joining me, as always, is my ever happy and gleeful co-host, Tim McConaughey, at Carpe Diem VPN on Blue Sky as well. We have a very special episode for you today. We've obviously been talking a lot about AI on this podcast, sometimes when we want to, sometimes when we don't want to. Luckily, today, this is a very specific use case that we do want to talk about. So we have a guest joining us, Sam Zamanian. Did I get that right?

Speaker 1:

Sam Zamanian, who is currently a principal advisory director at Infotech Research Group and, oddly enough, our first actual Australian resident that we've had on the podcast, aside from myself. So this is a big day for us. We've had Peter on here a few times, but he's since flown the coop, so he doesn't really get to count. So, yeah, we have an interesting topic today. We're going to be talking specifically about cloud and infrastructure selection for your AI workloads and how to do that appropriately. Sam's been doing a lot of work in this space, so we thought we'd bring him on to have a discussion about it. So, Sam, I'll shoot it over to you. Can you tell me a little bit about yourself, who you are and what you do?

Speaker 2:

Thank you for having me. As you said, my name is Sam Zamanian and I'm a principal director with the infrastructure and operations advisory practice at Infotech, which is a global research and advisory firm. Prior to that, I'd been around the IT industry for 20-plus years.

Speaker 2:

I come from an application background and then I moved to architecture, which is where I've spent most of my recent career, in various capacities: application and solution architecture and a little bit of enterprise architecture as well. That was followed by a couple of years doing consulting for a global system integrator. My architecture journey has been predominantly in the financial services space, and cloud has always been one of the major areas of focus there. I've seen lots of moves, shifts and evolution, both from the CSP side and across the industry. Fast forward to my current role in research and advisory, and I've had the opportunity to do a bit of research on how to make sense of infrastructure for AI workloads in particular, or vice versa. So, yeah, we can roll with this topic, and I'm hoping people will take away some insights from this podcast.

Speaker 1:

Yeah, absolutely. Well, thanks for joining us, and let's not waste any time, let's hop right in. So, like I said, we want to talk about cloud and infrastructure selection for the specific AI workloads you want to run. I think we probably need to do a little bit of level setting first, so let's talk about the lay of the land and get an overview of what AI workloads are. Obviously, there's a big commotion these days around AI itself, but AI and ML have been around for many years. This has been a very prominent thing, specifically in cloud, for quite some time. So let's understand the lay of the land: what are the AI workloads that you could run, either on-prem or within cloud?

Speaker 2:

AI generally comes down to three stages from a workload perspective, and people already know this.

Speaker 2:

There is data prep, or data work, which involves bringing data in from different places in various shapes and sizes. The data then has to be structured, refined and cleansed, making sure irrelevant entries are removed, so that it becomes meaningful and relevant to the next step, which is training. That is the foundation of what AI does: the data is used to develop AI models, and the model learns patterns, behaviors and relationships from the data. The next step, which I call production mode, is where the model gets deployed to support the use case or application in question and uses the patterns it has learned to make predictions or decisions based on unseen new data. So those are three categories, or three steps, of workloads that come under the one umbrella of AI workloads, and that breakdown gives us a lot of leverage to make decisions or drive the conversation around the selection of cloud and infrastructure for AI, and lots of opportunities to optimize for each of these specific categories.
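To make those three stages concrete, here is a minimal Python sketch of the data prep, training and production steps, using a tiny synthetic dataset and scikit-learn; the columns, model choice and library are illustrative assumptions, not anything discussed in the episode.

```python
# Minimal sketch of the three AI workload stages: data prep, training, inference.
# Assumes pandas and scikit-learn are installed; the data here is synthetic.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 1. Data prep: gather, clean and structure raw records.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.normal(size=1000),
})
raw.loc[::50, "feature_a"] = np.nan      # simulate messy/irrelevant entries
clean = raw.dropna().copy()              # remove invalid rows
clean["label"] = (clean["feature_a"] + clean["feature_b"] > 0).astype(int)

# 2. Training: the model learns patterns and relationships from the data.
X_train, X_test, y_train, y_test = train_test_split(
    clean[["feature_a", "feature_b"]], clean["label"],
    test_size=0.2, random_state=0,
)
model = LogisticRegression().fit(X_train, y_train)

# 3. Production/inference: the deployed model predicts on unseen data.
print("accuracy on unseen data:", model.score(X_test, y_test))
```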

Speaker 3:

Right, okay. I mean, are we talking about basically high-performance compute workloads? Is that basically what we're talking about? Chunking huge amounts of data and doing an ETL, you know, extract, transform, load type of operation. Is that what you're talking about?

Speaker 2:

Yeah, well, if I take the same breakdown, we can go through them one by one. For the data step, like you said, there might be HPC involved to structure the data, but we need scalable and flexible storage. Flexibility is key here because data comes in different shapes and a variety of formats. Scalability is important from a storage point of view because we want to be able to move data around pretty quickly and efficiently, and you need sufficient network bandwidth there as well. For training, HPC becomes critical in the sense that we need a compute powerhouse that supports parallelism. And why parallelism? Because we're dealing with vast amounts of data and super complex calculations, and parallelism makes it possible to train in a timely fashion.

Speaker 3:

Right, Like hyper-threading basically.

Speaker 2:

There is the classic HPC, and we can get to this, and there is the next evolution of HPC, which is more appropriate for AI, thinking about AI accelerators; we can get to that discussion as well. And for inferencing, and that's the interesting part, you may be able to get away with general infrastructure, unless you're dealing with real-time use cases such as mission-critical applications. But then there's the evolution from predictive ML/AI to generative. I don't know if I've framed it this way yet: there is predictive AI, which enables people to make decisions, make recommendations and do forecasts, and there is generative AI, which is around generating content. The latter has raised the bar again for both training and inferencing from an infrastructure standpoint, and resource consumption has become a critical challenge as you move from classic AI/ML to generative AI.

Speaker 1:

So I mean, I guess maybe kind of just a dumb question, but in that particular concept of traditional AI ML workloads versus Gen AI, is there any significant difference from an infrastructure perspective? What needs to be accounted for in either one of those?

Speaker 2:

Yeah, the short answer is there are a lot. Just to give you an idea, in generative AI we talk about large language models a lot, which wasn't the case with classic AI, and it can take between a few months and a year to train a large language model. That's the ballpark. Of course, not every use case needs to be as aggressive as that, but that's technically the world we're in. Microsoft built a supercomputer for OpenAI, a Microsoft/NVIDIA effort, and that brought it down to about a couple of weeks, if I'm not mistaken, when it otherwise would have taken a year just to train the GPT-3 models. So that's the scale we're talking about here.

Speaker 2:

Also, from an inferencing point of view, there is the challenge of how you load a large language model efficiently into memory and how you distribute your AI models across different GPUs or different nodes. It's become a combination of system design and infrastructure, in the sense that your infrastructure choices will dictate some of your design principles and vice versa. You have to think about traffic management, for example, much more aggressively with Gen AI than with classic AI/ML. With large language models we need to think about caching from an inferencing point of view, which wasn't a challenge back in the classic AI/ML days.
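As a toy illustration of that inference-side caching point, the sketch below memoizes repeated prompts in front of an expensive model call; run_model and its latency are placeholders, not any particular serving stack.

```python
# Toy inference cache: identical prompts skip the expensive model call.
# run_model() is a stand-in for a real LLM inference backend.
import time
from functools import lru_cache

def run_model(prompt: str) -> str:
    time.sleep(0.5)                      # pretend this is a slow GPU inference call
    return f"response to: {prompt}"

@lru_cache(maxsize=1024)                 # keep recent prompt/response pairs in memory
def cached_generate(prompt: str) -> str:
    return run_model(prompt)

start = time.time()
cached_generate("What is RDMA?")         # cold: pays the full inference cost
cached_generate("What is RDMA?")         # warm: served from the cache
print(f"two calls took {time.time() - start:.2f}s (second one was cached)")
```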

Speaker 2:

And also, from an infrastructure standpoint, GPUs and other types of accelerators are in high demand. There is a supply shortage and bottleneck around that, which has never been the case before. So the game has been taken to the next level.

Speaker 3:

I'm just curious, because of what we just talked about, the large amount of resources, the infrastructure requirements and everything. I know it's really early and I don't know if you've had a chance to look at it at all, but what do you think about the DeepSeek thing, about what they're saying, about being able to do essentially a lot of the stuff these LLMs are doing with a fraction of the resources? Do you think there's any accurate truth to that, or is it embellishment?

Speaker 2:

I always think that optimization has been one of the challenges, but we've gotten better and better at it. There's been this notion that when you deal with AI, you've got to go with premium resources, and the breakdown I just described would help you optimize, but there are tons of other opportunities to optimize. Now, this DeepSeek news is only about two days old, so we don't have that much information yet. But just from a cost perspective, the cost of running GPT-4 would be around $100 per million tokens, and the other one is, I think, $7 per million tokens. That's the difference. We'll have to wait and see how that comes about. But again, there are tons of opportunities to optimize.
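A quick back-of-the-envelope comparison of those per-million-token figures; the monthly token volume is a made-up example, and both prices are the rough ballparks quoted above, not confirmed pricing.

```python
# Rough cost comparison using the ballpark figures mentioned above
# ($100 vs. $7 per million tokens); both numbers are illustrative only.
def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    return tokens_per_month / 1_000_000 * price_per_million

tokens = 500_000_000  # hypothetical workload: 500M tokens per month
for name, price in [("premium model", 100.0), ("cheaper model", 7.0)]:
    print(f"{name}: ${monthly_cost(tokens, price):,.0f} per month")
```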

Speaker 3:

Yeah, with the hardware.

Speaker 2:

Yeah, correct, exactly. Even with cloud, with virtualized hardware, the key part is to break it up and understand what those sweet spots are, and that becomes a heavily use-case-driven decision as opposed to an infrastructure-driven one.

Speaker 1:

Right, okay. So yeah, I think that's a relatively good segue. So let's.

Speaker 1:

I mean, I feel like on this podcast we're obviously very cloud-centric, and we talk about cloud each and every day in our day jobs as well. So we kind of get lost in the sauce, so to say, and we understand the implications of why it's easy to do things in cloud, especially when you need specific resources for workloads like AI. But let's go back to the origin of that and talk about doing something within cloud versus doing it on-prem, and what some of the common decision factors are when you want to choose between the two.

Speaker 2:

That's always an open question. You can start with cloud, even if you're skeptical about choosing cloud for your workloads. And before I get to that, I'd like to clarify two things. One is that the factors that drive cloud versus on-prem aren't unique to AI. They still apply to other non-AI, general enterprise workloads as well. It's just that they may matter more with AI, because the impact of getting them wrong may be larger. The second part is that cloud means different things to different people. If I'm in DevOps, I think of cloud as a bunch of pre-built, readily available tools I can just start with. If I'm in the line-of-business space, I'll see a SaaS-based CRM application as a cloud capability, and it really is. If I'm an architect, I probably think about cloud as cloud-native architecture. And, for sure, if I'm an infrastructure person, I see cloud as a destination or a hosting mechanism. AI is no different. It's a cross-functional capability. We still need MLOps, we need to think about infrastructure, and we need to think about how you embed your AI solution in the application.

Speaker 2:

At least from an AI point of view, the only reason I would use or recommend cloud, well, let me rephrase it, the only time I think cloud would be chosen, is the convenience factor. If I'm a beginner in the domain and I want to get something up and running quickly, I will start with cloud because it's a low-friction option. I don't have to think about HPC procurement, which, from an on-prem point of view, tends to be cumbersome, slow and complex. I don't need to deal with supply issues. I don't need to deal with massive upfront investments and, of course, skill set. So if I need to start with tools, then my only choice becomes cloud. That doesn't mean it's the right answer, but it might be the right answer now.

Speaker 3:

Right, like with a lot of apps, where you'll prototype in the cloud and figure it all out there, and then, especially if it's a 24/7 type of workload, which this could certainly be, you bring it back on-prem, and at that point you know what to procure, you know what the requirements are and everything.

Speaker 2:

Yeah, absolutely, and that's the convenience aspect. It's convenient to start with cloud and then figure it out later. Although, on the on-prem side, I think there are two reasons that hold people back from choosing cloud. One is predictability, both from a performance and definitely from a cost perspective, because the pay-as-you-go nature is going to create a lot of variability and surprises, particularly for AI or HPC-type workloads. From a performance point of view as well, the multi-tenancy aspects of cloud can potentially lead to performance fluctuations, although I haven't heard many stories of that actually being the fault of the CSP, but it's technically possible. And the other reason to go on-prem over cloud is control.

Speaker 3:

Of course.

Speaker 2:

There's the classic one around data and security, but there is also a specific one around customization. Cloud gives you few opportunities to do deep customization, by which I mean there might be cases, and this is again heavily use case driven, where you need to align your hardware choices back to your workload, and that gives you a lot of leverage to optimize from a performance point of view. You don't get that from the cloud, because you're dealing with a pre-selected range of services and virtual devices. So again, control and predictability: two reasons why people choose on-prem.

Speaker 1:

So, not to go back to the why-people-choose-cloud option, but how much are you seeing, at least in the market today, people choosing cloud purely from an availability standpoint? Maybe they can't get GPUs in the timely fashion they need to start their process. And how easy is it to switch from one to the other if you've already started down that trajectory? Maybe the GPUs aren't going to come in for a few months. Is it easy to start and move back and forth? Are you seeing many people do that?

Speaker 2:

So, to your first question: in essence, most people start with cloud. That's a common choice, like I said, because of the challenges of skill set in making the right choice when it comes to picking the infrastructure for their workloads, but also the lead time.

Speaker 2:

Procurement and lead time around the deployments. And to your second question, how easy it would be to go back and change things in the future, that comes down heavily to the workload design. In the classic days we had IaaS, infrastructure as a service, and then PaaS, platform as a service, and we said it's easy to move an infrastructure-as-a-service workload around, but at the same time it's not the right pick for being in the cloud.

Speaker 2:

Because you still have to manage it, and the same thing applies here, although the difference is that I don't see many people doing infrastructure as a service for their AI workloads. And the more you go towards platform as a service and SaaS, the harder it becomes to shift around in the future. The reality is that AI also relies on lots of open-source frameworks and tools, such as TensorFlow and PyTorch. They can run in containers, you can use Kubernetes, so the more you lean towards open source, the easier it will be to shift in the future.
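As a small illustration of that portability argument, the same framework-level code can run unchanged on a cloud GPU instance, an on-prem GPU server, or a plain CPU box; the toy model and shapes below are assumptions for the sketch, not anything from the episode.

```python
# Framework-level portability sketch: the same training step runs on
# whatever hardware is present (cloud GPU, on-prem GPU, or CPU).
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(128, 10).to(device)           # toy model, illustrative only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 128, device=device)         # a fake batch of 64 samples
y = torch.randint(0, 10, (64,), device=device)  # fake labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(f"ran one training step on {device}, loss={loss.item():.3f}")
```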

Speaker 3:

I see, yeah, okay, that makes some sense. I was doing some work with Andrew Brown on his Gen AI bootcamp, that he's doing and I'm helping a little bit with, very poorly, and he was showing me some of the open-source tools where you pull a lot of the open-source models, stuff like LlamaIndex and whatnot. Do you end up running these open-source tools, like you said, just as an app on Kubernetes? You build a Kubernetes cluster to run all these applications, and then, in theory, Kubernetes is portable, so if you wanted to run your Kube on-prem, you could just move it, essentially.

Speaker 2:

Yeah, that's the theory, right. And speaking of Kubernetes, it comes with its own tool set for AI. They've got an MLOps toolbox, Kubeflow, so they come with their own toolbox around AI. Obviously, there is also the choice of ML frameworks, and most people tend to go with Keras, PyTorch or TensorFlow. So in theory you can shift these things around, as long as you have the infrastructure to support that. And there is still some level of lock-in with this CSP or that CSP that needs to be managed. But from a technical standpoint, there is the opportunity to do that in the future if you have to.

Speaker 1:

Yeah, I mean, I think we've been talking about this on the show for a while now: the very strong difference between a company that wants to leverage specific things in the cloud to train their own models and do their own thing, versus someone that just wants to consume off-the-shelf models that are relatively general purpose or use case focused. And I can see right there, if you're consuming an AI service, just to give an example, like AWS Bedrock, for how you run your applications, moving that on-prem is not trivial. That's not just a lift and shift, because you don't have that option. So, yeah, I can totally see how the architecture, the design and the adoption pattern are really going to matter for whether or not you can move things around.

Speaker 3:

If you build it yourself, obviously you have all the ability in the world to take that workload and move it back on-prem or from on-prem to cloud. But with managed services like Bedrock, for example, they've essentially built the infrastructure for you and then made it available to you like a managed service provider, right? So, just like with an MSP, it'd be a lot harder to take your workload and your data and put that back on-prem.

Speaker 1:

All right. So yeah, let's move on to the next topic we wanted to talk about, which is exactly what kind of infrastructure we're using for AI workloads. I'm assuming we want to focus more on the build-your-own, roll-your-own type of AI deployments, but what goes into choosing the specific infrastructure we use for our AI workloads in cloud?

Speaker 2:

Sure, and they fall into the classic infrastructure domains: there is storage, compute, memory and networking. I'll start with compute, and it comes down to four main kinds. There are CPUs. They're good at handling general tasks and tasks that have a sequential nature, such as "I can start my job only after the previous job is finished", and that's by design; CPUs are great at handling that. And that would apply to, again, most inferencing scenarios.

Speaker 2:

Very few kinds of deep learning and neural networks, those with sequential natures, can be done on CPUs, and you would win on cost if you used a CPU over a GPU there. Next are GPUs. They are the most demanded kind, and the main value GPUs bring to the table is parallelism; they're great at handling parallel tasks. There are TPUs, or Tensor Processing Units, hardware built by Google to support tensor operations, which are one type of math technique used heavily in deep learning and neural networks. And there are also NPUs, neural processing units, which, like GPUs and TPUs, are good at handling parallel tasks but tend to be more energy efficient, hence the prime use case being edge AI, such as mobile and smart devices. From a memory point of view, memory consumption comes down to different constraints. From a training point of view, the batch size you pick for the job drives your memory decision. And, like I said, in the inferencing scenario I described, you've got to load a language model into memory: the larger the model, the more memory you need, and the more you need to think, from a system design point of view, about how you optimize that as well. From a hardware point of view, DDR is still the most common, affordable type and works great with CPUs; GPUs have their own variant, which is called GDDR; and there's a relatively newer option, which has been around for about 10 years, called high bandwidth memory, HBM. Again, that wasn't built specifically to support AI, but it's one of the options that provides ultra-high bandwidth from a memory speed point of view.
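To put the "larger model, more memory" point in rough numbers, here is a quick weights-only estimate at different precisions; it ignores activations, KV cache and framework overhead, and the parameter counts are just illustrative.

```python
# Back-of-the-envelope memory estimate for loading model weights only.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

def weights_gb(n_params: float, precision: str) -> float:
    # bytes for all parameters, converted to gibibytes
    return n_params * BYTES_PER_PARAM[precision] / 1024**3

for n_params, name in [(7e9, "7B model"), (70e9, "70B model")]:
    for precision in BYTES_PER_PARAM:
        print(f"{name} @ {precision}: ~{weights_gb(n_params, precision):.0f} GB")
```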

Speaker 2:

Then there is networking, and you guys know the drill here. There is a scale-up and a scale-out scenario. Scale-up is: I need premium hardware that supports my model training, and that applies very well to small to medium-sized model training, where I want to distribute multiple jobs across different GPUs but within one single physical node; I don't want to go over the network. In that case I rely on special hardware that provides interconnectivity between GPUs and their memory components.

Speaker 2:

There are two proprietary options that I know of: there is NVIDIA NVLink, and there is AMD's equivalent, Infinity Fabric. Realistically, though, for most jobs you need to go over the network, in other words, you need to distribute the workload across different physical nodes. So the answer is ideally you need a combination of both: an awesome single node, plus multiple of those, with the ability to go over the network. Again, there is a proprietary option that provides direct connections between GPUs across different nodes: there is InfiniBand, and there is an Ethernet version of that, which is called RoCE. Most of these techniques rely on a method called RDMA, remote direct memory access, to offload some of the network management tasks from CPUs. InfiniBand is a primary choice, and I think that's what Microsoft is using in the supercomputer used by OpenAI, but RoCE tends to be more popular among network folks because of the familiarity and the skill set they can retain from an Ethernet point of view.
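A minimal sketch of what that scale-out path looks like from the software side, assuming PyTorch's DistributedDataParallel with the NCCL backend, which typically rides NVLink inside a node and InfiniBand or RoCE (via RDMA) between nodes; the toy model and launch setup are assumptions, not the configuration discussed in the episode.

```python
# Minimal multi-GPU data-parallel sketch with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_file.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL uses NVLink / IB / RoCE
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # toy model
    ddp_model = DDP(model, device_ids=[local_rank])         # syncs grads over NCCL
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    x = torch.randn(32, 1024, device=local_rank)   # fake batch per GPU
    loss = ddp_model(x).square().mean()            # dummy objective
    loss.backward()                                # all-reduce across GPUs/nodes
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```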

Speaker 2:

And storage tends to be the most agnostic of these from a workload perspective. We still need object storage for flexibility, and if we deal with lots of transactional data, then we fall back to hard disks and SSDs, so storage is less of a concern. The question becomes how fast the storage can communicate with memory. Typically, for hard disks and SSDs, you would use SATA or SAS as the protocol, but there is a flavor of SSD that uses PCI Express, which tends to be exponentially faster than SATA, and that's called NVM.

Speaker 1:

Express.

Speaker 2:

Which is just basically SSD over PCI Express, right. Yeah, it's really fast.

Speaker 3:

It is fast. It feels almost like flash memory; it's really pretty fast. Yeah, well, that's great.

Speaker 1:

Thanks for the breakdown there. I think it's relatively obvious, as you've gone through all of that, that it's not trivial to put these things together. You need to be very focused and make sure you're picking the right tool for every job here, especially when it comes to AI. I will say you have a note written down here about confidential computing, and I don't necessarily know that term or what it really implies. So can we break that down? What does confidential computing mean to you?

Speaker 2:

Sure. If you have heard of enclaves, or CPU enclaves, that's basically what confidential computing is about: a piece of hardware that lives inside the CPU and can create workload isolation. Every time data moves into the enclave it is decrypted, and every time it moves out of the enclave it is encrypted. So that provides encryption, or protection of data in use, completing the lifecycle alongside data at rest and data in transit.

Speaker 3:

Yeah, that makes sense.

Speaker 1:

Yeah, how important is that process when it's all within an application or within a network that's under your jurisdiction? Is it as important? When we talk about going over third-party mediums, I can obviously see a strong use case for requiring encryption there, but are we seeing a lot of that within fully contained environments?

Speaker 2:

Yeah. One of the benefits, and this is the advertised benefit, is that except for the application code, which will be authorized to access that piece of sensitive data, no one else can access the data, not even the cloud service provider.

Speaker 2:

So that helps organizations work through their own hurdles and establish controls when it comes to cloud. Because, again, this is less about how secure cloud is and more about how much control you've got over cloud as a customer, and that helps tick that box or facilitate some of those activities from the customer side. There are some strictly regulated industries, such as the military, which aren't as comfortable being on the cloud and are reluctant to go through that, because there is a lot of work to jump through those hoops and tick those boxes. They may not have the resources, skill set or budget to finish that process, so they decide to stay on-prem. So things such as confidential computing, which again is not unique to AI but is more around data protection and compliance, can help those organizations sway towards the cloud option as opposed to on-prem. So it's more about a control theme than a security theme, at least from my perspective.

Speaker 1:

All right, yeah, thanks for breaking that down for us, that's really good. So obviously, Sam, you work for a research group, so a lot of analysis goes into this, and you look at where things are trending and what's going to happen in the future. I guess, at least from what you've seen in the market today, what do you feel are going to be the most prominent challenges people will have, and what's in store for the future?

Speaker 2:

Sure, and again, this is a fluid topic, so I can only speak to the decisions people have made in the past. The first thing, and I think I touched on it before, is that on cloud versus on-prem, most people are still choosing cloud because it's convenient. They need a whole bunch of tools, and cloud is fairly attractive from that perspective. So that's one. From a hardware point of view, one of the challenges I've observed is supply bottlenecks, and it may vary from market to market. Some markets, such as North America or Europe, might be more aggressive when it comes to AI, so the type of challenges there will be different from other markets, and some availability issues might exist in a market like Australia or APAC compared to the US, so there might be some lag. But that still remains a challenge for the next year or so.

Speaker 2:

And, as you might know, NVIDIA has partnerships with all four or five major CSPs that we know of. But new players are coming in. Last year, Microsoft even announced their partnership with AMD, which is good news. Other players are coming into GPU manufacturing, so it's no longer just NVIDIA, and that might alleviate some of those pain points. But something that comes off the back of supply issues is cost. There's the typical supply and demand: the lower the supply, the higher the cost. What I think adds to that is the fact that we need muscular resources for HPC; they are not the usual kind, so they carry extra price tags. But I think that should be remediated in the future, and I wouldn't be surprised to see some of the cloud or operational costs come down as well. The more abundance we get, the less hostile it's going to get from a procurement and provisioning perspective.

Speaker 2:

And the last thing is optimization. Optimization is still a challenge, and DeepSeek was a very good lesson learned there. I myself always thought that OpenAI was going to make the best choices when it comes to that. It took, and this is from Forbes but also anecdotes, somewhere between $40 million and $80 million just to train the GPT-4 model. And for Gemini, again, the range is wide; they mentioned somewhere between $40 million and $190 million. Those are the ballpark figures, and DeepSeek is going to change a lot of that. But I think there are tons of opportunities to optimize if you take a top-down as well as a bottom-up approach, by which I mean start from use cases, then break it up and take a divide-and-conquer approach to decouple different types of workloads and move your way down from there.

Speaker 1:

Yeah, it makes sense. I mean, it's just a natural thing that we need to be able to do that, right. I'm just thinking back to the days when we first started storing data, when you needed four life-size cabinets of storage to house 500 kilobytes, like ENIAC, right, and now it fits in a fraction of my pinky nail.

Speaker 1:

So it's natural that optimization needs to occur. And, not to timestamp this episode too much, but as Sam pointed out, this is only two days after the DeepSeek model launched, so by the time we release this episode, a significant amount of new information may have come out about it.

Speaker 1:

But yeah, I agree with you there. Optimization needs to be at the forefront of this. And I would think the vendors are potentially going to sell less of the hardware components the more things get optimized, but I think that's just the natural lay of the land. That's what we need in turn.

Speaker 3:

I actually don't know about that. Think about it: if this stuff were easier to run, sorry, I didn't mean to cut you off, but if this stuff were easier to run, if there were a lower cost of entry, I think we'd actually see more hardware, because more people would need it. More people would be trying to build their own data centers to run it, in theory.

Speaker 1:

I'm just thinking in terms of quantity, right. Obviously, if you can do the same job with a fraction of the GPUs, then in turn I'm going to sell fewer GPUs to that one specific customer. Oh, per customer, sure, yeah. But to your point, adoption is key. It's how many people are actually doing it.

Speaker 3:

So yeah, it's kind of a balance, right. I still don't think we need $100 or $500 billion to be invested in AI or whatever it is. I think that's a huge grift, and I'm curious to see what things like DeepSeek keep showing, because other countries are working on this too. It's not just China; there are other people out there working on this. And the timing on that was not a surprise either: right after Project Stargate announces $500 billion, the very next day China is just like, oh look, I slipped and dropped my open-source LLM on the market.

Speaker 1:

Yeah, absolutely. Well, thanks a lot, Sam, for coming on. I think we'll go ahead and start wrapping up here. So, last thing: any closing thoughts you want to add about the episode today?

Speaker 2:

No, I think I just formulated my final thoughts in that last question around those stats, but if you have anything, you can throw it in.

Speaker 1:

No, sounds good. Like I said, I think we're all eager, and we understand this is a very ephemeral environment right now, whether it be the technology or the roller coaster that is the NVIDIA stock price, et cetera. Things are changing literally every single day. So, yeah. Well, Sam, thanks for reaching out and thanks for coming on the show. I think this has been informative. If people want to reach out and talk about any of this stuff, how can they reach you?

Speaker 2:

You can hit me up on my LinkedIn, Sam Zamanian. That's the primary channel for a lot of people to reach out.

Speaker 1:

Perfect, sounds great. Well, I'll make sure we get that into the show notes, so if you want to reach out to Sam, please check that out. Thank you so much for listening. This has been another episode of the Cables to Clouds podcast, and we will see you next week. Goodbye.

Speaker 3:

Hi everyone, it's Tim, and this has been the Cables to Clouds podcast. Thanks for tuning in today. If you enjoyed our show, please subscribe to us in your favorite podcast catcher, and subscribe and turn on notifications for our YouTube channel to be notified of all our new episodes. Follow us on socials at Cables2Clouds. You can also visit our website for all the show notes at cables2clouds.com. Thanks again for listening, and see you next time.
