Cables2Clouds

What Does an AI Data Center Look Like? - C2C046

Cables2Clouds Episode 46


World-renowned network expert Peter Jones returns to the Cables to Clouds podcast, sharing his unparalleled insights into the electrifying evolution of AI networking. We promise listeners a captivating journey through the latest advancements and investments reshaping data centers to meet the high demands of AI. Explore with us the nuanced differences between AI for networking and networking for AI, as we shed light on how these developments are capturing Wall Street’s attention and web scalers’ ambitions.

Venture into the intricate world of networking technology as we tackle the tough challenges of packet reliability and error management, pondering whether the link layer should shoulder more of TCP's load. Discover the strategies of tech behemoths like Google, Meta, and Microsoft as they navigate the complex terrain of cost, power, and reliability. From innovative solutions like liquid cooling and disaggregated power to the futuristic promise of nuclear energy, we'll dive deep into the technological shifts required to power tomorrow's networks.

Finally, Peter lends his expertise to unravel the scaling challenges of GPU networks, especially poignant in the realms of Bitcoin mining and cloud computing. Listen as we draw parallels between overclocking and the immense strain on GPUs in mining farms, all while questioning the sustainability of these massive endeavors. As we explore Microsoft's strategic energy maneuvers and the potential of disaggregated systems, this episode promises a comprehensive look at the future of data centers and the efficient management of AI workloads. Join us for an enlightening conversation that paves the way for understanding the dynamic future of networking.

Links:
https://www.semianalysis.com/

Google
https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network

Meta
https://engineering.fb.com/2024/08/05/data-center-engineering/roce-network-distributed-ai-training-at-scale/
https://engineering.fb.com/2023/11/15/networking-traffic/watch-metas-engineers-on-building-network-infrastructure-for-ai/

Microsoft

Purchase Chris and Tim's new book on AWS Cloud Networking: https://www.amazon.com/Certified-Advanced-Networking-Certification-certification/dp/1835080839/

Check out the Fortnightly Cloud Networking News
https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/

Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on BlueSky: https://bsky.app/profile/cables2clouds.com
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj

Peter:

All right. So there's the scale-up network, which is the really closely coupled, and the scale-out, and so I think the scale-out is where Ethernet fits in, makes sense. So then the question becomes like which of the semantics that InfiniBand provides do we actually need? I mean, as you guys well know, right, we've seen lots of technology. You know technology A replaces technology B.

Tim:

Oh yeah.

Peter:

A wants to do everything that B used to do, plus data.

Tim:

Then, after a while, you figure out what it doesn't need to do anymore. Welcome to the Cables to Clouds podcast, your one-stop shop for all things hybrid and multi-cloud networking. Now here are your hosts, Tim, Chris and Alex. Hello and welcome back to the Cables to Clouds podcast. I'm your host this week, Tim McConaughey, at JuanGolbez on Twitter, and with me, as always, is my lovely co-host, Chris Miles, at BGP Main on Twitter, and with us as a returning guest, Peter Jones. Is it at...?

Peter:

PeterGJones on Twitter. I don't remember if it is. I have PeterGJones in a whole bunch of places.

Tim:

That's right, I thought so, but I didn't want to speak, uh, and be wrong. But uh, yeah, so Peter's a returning guest, um, I'll let him introduce himself. But uh, actually, let's do that first. Go ahead and introduce yourself, Peter, and then we'll get into it.

Peter:

Okay, so I'll try and be brief. When I do this at Cisco Live, I get my co-speaker to introduce me. It's always more fun. So I'm Peter Jones. Uh, I work for Cisco as a distinguished engineer in the hardware networking business. I also do a bunch of work in IEEE 802.3 standards, mostly the big ones, and I chair the Ethernet Alliance. Now, strictly, to be very clear, in this meeting I'm speaking for myself, not for anyone else. And, as I said to these guys earlier, I have a lot of opinions and some of them are good.

Tim:

That's what we like to hear. But yes, so Peter actually is a returning guest. I went back and looked, Chris, and this is actually one of our, I won't say first shows, but it's pretty early in our run. We had Peter on a year ago, back in August, and at that time, because AI was really starting to take off, we wanted to know Peter's opinion about, like, okay, now that everybody's building these AI-powered workloads and everything, what is building networking for them going to look like? And at the time, of course, it was still way too early to make any kind of predictions. Exactly, I think you said something very similar at the time, or maybe you asked something like, you know, would you like me to just pull something out of my ass, or something?

Peter:

I would never have said anything like that, maybe.

Tim:

But yeah, right, but now it's been a year and, I mean, anybody who's been paying any amount of attention knows that there is a lot going on and it is moving insanely fast. And so we thought it would be a great idea to bring Peter back and do a little bit of a retrospective, just a little bit of, hey, how has it been this last year? Did it meet any expectations that anyone had? And then, looking forward, you know, what's on the horizon, especially in terms of what AI networking is going to look like moving forward. So yeah, let's start right there. It's been a year. Again, we had no idea what it was going to look like a year ago, and I feel like it's changed and been so different this past year. Uh, Peter, so what's been your observation for how this has gone?

Peter:

If you think, for the time being, that right now NVIDIA is the best supplier of picks, shovels and wheelbarrows in the world. Yeah, and we know that our web scaler colleagues are buying everything it can sell as fast as they can buy, so we know that if you want to get interest from Wall Street, you must mention AI in everything, so it's clearly a flavor of the year. Now I've been looking at some numbers. There's an enormous amount of capital going into building data centers and, doing a little bit of research recently, I've got myself a lot smarter, and these guys are a lot further advanced than I imagined, because I just wasn't looking. What's interesting is, if you look, there's actually a whole bunch of stuff out there about how this stuff's being built, and maybe there'll be some links we can put in the show notes to help people go take a look themselves.

Chris:

For sure, for sure. Yeah, I think probably an important thing to level set here is, you know, it's been a year since we've had this conversation. I think even back when we had the original convo we probably didn't even have this. Hang on a second, there's a plane going over. Yeah. But I think when we had this conversation a year ago, this was before we even had kind of the differentiation in our minds about AI for networking versus networking for AI. So, like today, the premise of the show is about networking for AI, right, and how to build networks for the purpose of, you know, training models and doing RAG and things like that, even some more very complex stuff that I'm probably not smart enough to understand.

Peter:

All that stuff that the people with billions of dollars in their budgets are doing.

Chris:

Exactly.

Peter:

So this is actually the don't try this at home, because you can't.

Tim:

Yeah, good luck getting your hands on any of this gear.

Chris:

I'll put the little Jackass disclaimer on the screen. Don't try this at home.

Peter:

So one of the things that really got to me was when I read that Microsoft was paying to reopen Three Mile Island and agreed to pay for all of its power for 20 years. You go, okay, that's crazy. Yeah, all right. So if we go back.

Peter:

So this has been rolling around the networking business I live in for a while. Everyone's been going, well, we should do AI for networking, networking for AI, it should be this, it should be that, it should be all the other things. And, as you know, a while ago I put a post up about the blind men and the elephant, so no one exactly knew what it was. So there's a whole lot of stuff that came out of OCP, which was a week and a bit ago. All the slides and videos are online.

Peter:

I'd highly recommend people go look at them, because I'm going to only scratch the surface, if that. So I think the big thing that people have figured out is, if you look at the structure of a network, they tend to speak about backend and frontend. The frontend is our normal network and the backend is a network specifically for AI. Then they talk about scale-out versus scale-up. So scale-up is basically the communication between a very closely coupled set of GPUs and CPUs, and that set could be anywhere from tens to hundreds to thousands, depending on who you are. Then there's the communication between those clusters.

Peter:

Which today is mostly part of a data center, can become a full data center, multiple buildings, and then you start going into multiple data centers in a region. So there's a whole bunch of stuff about Google going off to try and train stuff across multiple data centers and, as you guys all know, right, when you go from place X to place Y, things get really interesting because all your assumptions tend to break. And so the Ethernet Alliance event we had last week, which is called Ethernet in the Age of AI, unfortunately wasn't recorded, because we were trying to preserve the ability to have a whole lot of conversations, which were interesting. It was triggered by a friend of mine being, you know, talked to by someone else from Meta, and the argument was, well, this Ethernet with 200 gigabits per lane is almost done. What about 400 gigabits per lane?

Tim:

We're just starting to write the baseline for 200.

Peter:

What do you mean? 400? It's almost done, and I think the thing here is that the expectation is just so high. If you look at it as much as we think networks are important, they're just a means to the end.

Tim:

In a campus, the network exists so a lot of people can do their job. Absolutely.

Peter:

In these places, the network exists so GPUs can do their job, and so in an ideal world, you only notice the network when it's not there. That's right.

Tim:

Infrastructure, right. Yes, good old water and power.

Peter:

And so the thing that's a little different about some of these workloads is that they do, and there's a whole lot of complicated stuff I have to read about, all-to-all and all-to-this and transformers and stuff. But in general these are massively parallel workloads that every now and again have to stop and talk to each other, and the problem they tend to have is, until they've talked to each other and finished, they can't go forward. So they have a barrier to going forward. So what they really care about is, in that network, while you're busy, you've got to be 100% busy. So this is where we're seeing a whole lot of stuff coming out about load balancing more effectively, and there's a bunch of ideas out there, but the conclusion, on the Ethernet side of the house, seems to be that what you have to do is spray across all the links and reorder them on the way out of the network. So the general assumption is, for that period you're busy, and I don't really know whether it's 10% of the time or 5% of the time, you want to be 100% full, you want no congestion and you want no packets dropped. So the run-of-the-mill story is, if you build it as non-blocking, and if you load-balance evenly across all your links and you reorder on the far end, you're okay.
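For readers who want to see the mechanism, here is a minimal, purely illustrative sketch of spray-and-reorder: packets from one flow are sprayed across every parallel link and re-sequenced at the receiver using sequence numbers. The link count and function names are assumptions made up for the example; real NICs and switch ASICs do this in hardware.

```python
import random

NUM_LINKS = 8  # hypothetical number of parallel equal-cost links between two tiers

def spray(packets):
    """Spray packets round-robin across every available link, ignoring flow
    hashing, so no single link becomes a hot spot."""
    links = [[] for _ in range(NUM_LINKS)]
    for seq, payload in enumerate(packets):
        links[seq % NUM_LINKS].append((seq, payload))
    return links

def receive_and_reorder(links):
    """Model the receiving NIC: packets arrive out of order because each link
    has its own queueing delay, so the NIC buffers and re-sequences them."""
    arrived = [pkt for link in links for pkt in link]
    random.shuffle(arrived)            # links deliver independently, out of order
    reorder_buffer = sorted(arrived)   # re-sequence by sequence number
    return [payload for _, payload in reorder_buffer]

messages = [f"msg-{i}" for i in range(32)]
assert receive_and_reorder(spray(messages)) == messages
print("all packets delivered in order after spraying across", NUM_LINKS, "links")
```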

Peter:

There's a bit of a disagreement about whether you reorder in the exit switch or in the NIC. I tend to think it's the NIC, because the NIC can be connected to multiple switches. That makes sense. There's also a whole bunch of questions about whether latency matters, and my judgment of what I heard last week is latency matters a lot in the scale-up network, which is within the relatively small set of GPUs, and there they're running something like NVLink.

Peter:

There's another group that started recently called UALink, the Ultra Accelerator Link. So this is basically everyone but NVIDIA coming up with an alternative to NVLink. Got it. So these are extremely high bandwidth, extremely low latency, extremely reliable; this is basically giving you semantics like you are inside a system, but those links only go so far. The exception to that is Google, who has a whole bunch of interesting stuff on their network called Jupiter, where they use a whole bunch of programmable optical switches, and so their structure is a little bit different. I think, in the case of, say, Microsoft, their idea of scale-up is maybe 10 to 100 nodes. Google is more like some number of thousands. Okay. And they're also using the fact that they have this optical cross-connect to simplify adding or deleting from the network, because they don't recable anything.

Tim:

Yeah, I'm having trouble visualizing it. I'm actually just thinking through it, and I'm having trouble visualizing what that looks like ultimately.

Peter:

So there's the scale-up network, which is really closely coupled, and the scale-out, and so I think the scale-out is where Ethernet fits in. Makes sense. So then the question becomes, which of the semantics that InfiniBand provides do we actually need? I mean, as you guys well know, right, we've seen lots of technology A replaces technology B.

Tim:

Oh, yeah.

Peter:

A wants to do everything that B used to do, plus data. Then after a while you figure out what it doesn't need to do anymore.

Peter:

Right, you pare it back, right. I've been through, you know, Summit and X.25 and ISDN, et cetera, et cetera, et cetera. So now, if you look at some of the things that the UEC, the Ultra Ethernet Consortium, is trying to do, they include link-level reliability. So this is guaranteeing a packet gets from one end of the link to the other and retransmitting if need be, because they're trying to provide a reliable network semantic like InfiniBand does. Do you need this?

Tim:

Just real quick before I forget, Peter. So link-level reliability is not, I mean, we're not talking about, so normally we do this with reliable protocols like, for example, TCP or whatever.

Peter:

What is link-level reliability? What is the mechanism? It means you basically acknowledge individual packets across the link and retransmit them if they don't get there.

Tim:

Like at the switch or whatever the device level is.

Peter:

If you go back in time, way back in time, 802.2, which is LLC, defined Type 1 and Type 2. Type 2 was reliable, yeah.
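As a toy illustration only, not UEC's actual mechanism, here is a sketch of what hop-by-hop acknowledge-and-retransmit looks like on one link. The loss rate and function names are assumptions for the example; the point is that the layer above sees a reliable hop without running anything like TCP end to end.

```python
import random

def send_over_lossy_link(frames, loss_rate=0.05, seed=1):
    """Toy link-level ARQ: the sender holds each frame until the far end of the
    link acknowledges it, retransmitting on loss, so upper layers see every
    frame arrive exactly once and in order."""
    rng = random.Random(seed)
    delivered, retransmissions = [], 0
    for seq, frame in enumerate(frames):
        while True:
            if rng.random() > loss_rate:   # frame (and its ack) made it across
                delivered.append((seq, frame))
                break
            retransmissions += 1           # lost on the wire: resend this frame
    return delivered, retransmissions

frames = [f"frame-{i}" for i in range(1000)]
ok, resends = send_over_lossy_link(frames)
print(f"delivered {len(ok)} frames in order with {resends} link-level retransmissions")
```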

Tim:

Okay, I'm following.

Peter:

And sort of the interesting assumption in here is that, you know, they're trying to avoid any queuing delay. So InfiniBand offers you lossless semantics, and so they're trying to match that. Okay, I'm not convinced whether you have to do that or not. I think that's one thing we tend to forget about. I mentioned in the warm-up, right, we were talking about the 12 networking truths.

Chris:

RFC 1925.

Peter:

Yep, and rule six, which is it's easier to move a problem around the network than it is to solve it, and sort of the corollary of that is it's easier to assume you know what the other guy wants and work inside it. That's right. And I think in reality there's much more conversation to be had about how to write software that interacts with what network hardware can actually do, because we can make network hardware do some really strange things, but that may not be the best idea.

Peter:

Part of the problem we tend to have is the communication between, you know, the web scalers at the front and some of the networking people is a little bit sparser than it should be.

Chris:

Yeah, this may be a bit of a silly question here, but I'm just kind of thinking in context about, you know, networking for the longest time has been solving application problems, right? We've been solving the problem of, you know, bad code writing for years and years. And, you know, we've seen, yeah, we've seen things.

Tim:

Out of order packets? Yeah, exactly Out of order packets, lost packets.

Chris:

We've seen things moving down the stack, like, if you think about the QUIC protocol, that was kind of solving some of the stuff with the stateful nature of TCP, and then TCP is, you know, supposed to be able to handle reordering packets and things like that. So it seems like, in my mind, we're moving that even further down the stack and we want that reliability built into the link layer. And I guess my bare-bones question is, why do we need to do that at the link layer, just so TCP doesn't have to do that up the stack?

Peter:

Is that the whole point there? I'd say this stuff is not running TCP, but I'd also say that the competitive technology does this.

Tim:

Okay, so therefore you just assume that you need to do it.

Peter:

The easiest way to come in is to say yes, I'll do that.

Tim:

Yep, you're absolutely right. And so we've been through this with, you know, data center networking, building networks for RoCE, etc.

Peter:

The easiest thing to say is, yes, I do everything that it did, and cheaper.

Peter:

So the question of, I mean, last week the conversation I was trying to get to was what we should do, not what we could do, because we can do lots of stuff. And so one of the things that came out of it was, like, everyone has a different opinion of this thing, right? This is the blind men and the elephant. You know, I can always find someone who wants what I want, right, but it's very difficult to figure out what's the right thing across the industry, and that's really hard. I mentioned earlier that, you know, scale-up and scale-out are really, really different. So we started talking: if you're talking copper versus optics, it all comes down to cost, power and reliability, in which optics is great for reach, but cost, power and reliability sort of stink.

Peter:

Absolutely, absolutely. And in these networks, right, there are so many devices that, you know, your error rates end up such that you're getting errors on the order of minutes, right, because there are so many links in these networks. So from some points of view, the networks are having to be designed to account for errors that we don't normally see.

Tim:

Yeah, this is a little bit in the reverse order of. You know, when you migrate something, an application, some homegrown application, you take it out of the data center, put it in the cloud and add 30, you know 50 milliseconds of latency. All of a sudden the app doesn't work anymore. This is like the opposite of that, basically.

Peter:

So, I mean, people are looking at, you can get really frequent adapter failures or copper failures, and you go, yeah, that's not such a great thing. So again there's discussion about how big you can make the radix of a switch. So my friend from Google said he actually likes lots of small links: he can make the radix bigger and have fewer levels in his network. All right, that's pretty cool. So he tends to want every SerDes he has to be a separate port. So he'd prefer four 100-gig ports to one 400-gig. Yeah, I guess you can split that differently, assuming you have proper, good load balancing.
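A rough back-of-the-envelope sketch of why radix matters, using the standard folded-Clos sizing rule (a two-tier fabric reaches about R^2/2 ports, a three-tier about R^3/4). The ASIC capacity and port splits below are hypothetical, purely to show how splitting into skinnier links buys reach with fewer tiers:

```python
def max_hosts(radix, tiers):
    """Rough non-blocking folded-Clos capacity: radix hosts per leaf, and each
    extra tier multiplies reach by radix/2 (half the ports face up)."""
    return radix * (radix // 2) ** (tiers - 1)

# Hypothetical ASIC with 25.6 Tbps of capacity:
# run it as 64x400G (radix 64) or split every port into 4x100G (radix 256).
for radix in (64, 256):
    for tiers in (2, 3):
        print(f"radix {radix:>3}, {tiers} tiers -> ~{max_hosts(radix, tiers):,} ports")
```

With these made-up numbers, radix 64 needs three tiers to pass roughly 65,000 ports, while radix 256 gets to about 32,000 ports in only two tiers, which is the "fewer levels" argument in the conversation.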

Peter:

I mean, on my notes here, which are sort of scattered around, on the power side they're looking at 120 kilowatts plus per rack. That's per rack.

Tim:

I know, that's insane. I'm sorry, I, I failed to...

Peter:

That's just, yeah, so it's a lot. There's actually a project, I think it's Meta and Microsoft, called Mount Diablo. So it's disaggregated power, which is running on 400-volt DC, and it's hundreds of kilowatts, up to a megawatt. Yeah. And then how do you cool these things, right? The answer, initially, is it's really complicated liquid cooling.

Tim:

Yeah. So I mean, the Three Mile Island thing is a perfect example of somebody made a funny joke or a meme or something.

Peter:

That's really big.

Tim:

I know I don't think people understand the scale of what we're talking about here. Somebody made a great meme about like we're burning down the Amazon rainforest so that somebody could get a picture of like a woman with three breasts or something from Downey.

Peter:

Sorry, a hedgehog playing chess.

Tim:

Yeah, exactly, whatever, right. That's the PG one, the hedgehog playing chess.

Peter:

And in this particular case, we're actually reopening the Three Mile Island reactor near Harrisburg for this. Yeah, I mean, it's nuts, man.

Tim:

This is going to be what makes nuclear power, like, actually accepted in America.

Peter:

This is, probably, but it's also going to be nuclear power next to the DC, because the agreement is not going to cover it. Yep, you got it. So where was I? So another one of the networking truths is one size never fits all, and so it was interesting listening to our three guest speakers, the three keynote speakers, which were Google and Meta and Microsoft. They had some things in common, and they also had a lot of things different. I mentioned earlier that the way Google builds a network, which is called the Jupiter architecture, is really quite different, and part of it is because that's where they started. Also, I think they were building their TPU chips a long time before anyone else. So, you know, if they look at scale-out, they don't like latency jitter.

Tim:

They don't really care about latency so much, but they don't like jitter. Oh yeah, okay, they don't like, uh, variable...

Peter:

Essentially, variable latency, yeah. And if you've got full load balancing and lots of skinny links, then they reduce levels. Um, now, the scale-up network: they have their own thing called ICI, which I don't know that much about. They say, you know, it's generally 10 times faster, it's lossless, very low latency, and they're actually still bigger than other networks because they can change them with this OCS thing. If we look at Meta, the scale-up networks are tens to hundreds of nodes and the scale-out is 10K to 100K nodes. That's a hell of a scale-out. But I mean, it's just the thing, right? One thing is, are you inside one of those bigger racks or next to it? Yep. The other is, are you the full data center?

Tim:

Yeah, yeah, yeah. Like the swarm, the swarm way of doing things right, Like lots of little compute.

Peter:

And then you get, so, you know, scale-out is where the Ultra Ethernet Consortium fits and where Ethernet natively fits. You know, scale-up is likely to be NVLink or this new thing called UALink, which got announced, like, today.

Tim:

So this, which is everyone but NVIDIA, because, you know what? No, but yeah, I mean, it's very clear why they want to build their own shovels. They don't have to buy the shovel, right?

Peter:

Well, actually, they just want to be able to buy shovels from more than one person. There you go. So if we're talking about what was most important to these guys, the first thing I got to was sort of resilience and reliability. Um, I have some links about this, but basically clusters are so big that your failures are happening frequently, on the order of minutes.

Tim:

Oh, because you have hundreds or thousands of nodes, that it's guaranteed that something's going to break somewhere, right?

Peter:

So there's a link we can put in from some guys at Semi-Analysis and they basically have a note about if you're building a cluster of 100,000 H100 systems and you go yeah, that's a lot. Now the network isn't the biggest problem here. The biggest problem is somewhere else, but we're part of it, so the packet errors will only happen once in my lifetime. Yeah, there's a lot of lifetimes there.

Tim:

Oh yeah, yeah, you're dealing with the economy of scale in all cases there: number of packets, number of links, number of nodes.
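To make the "multiply it all out" point concrete, here is a back-of-the-envelope sketch. The MTBF figures and links-per-GPU count are invented assumptions, not SemiAnalysis numbers; the shape of the result, a failure every few minutes at cluster scale, is the point.

```python
# Back-of-the-envelope, not vendor data: assume each link sees one error-driven
# flap every 5 years and each GPU/adapter fails once every 3 years.
HOURS_PER_YEAR = 24 * 365

gpus = 100_000                       # the SemiAnalysis-style 100K-accelerator cluster
links_per_gpu = 8                    # hypothetical NIC + fabric links per accelerator
link_mtbf_hours = 5 * HOURS_PER_YEAR
node_mtbf_hours = 3 * HOURS_PER_YEAR

total_links = gpus * links_per_gpu
failures_per_hour = total_links / link_mtbf_hours + gpus / node_mtbf_hours

print(f"~{failures_per_hour:.1f} failures per hour across the cluster,")
print(f"i.e. one roughly every {60 / failures_per_hour:.1f} minutes")
```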

Peter:

Yeah, just multiply it all out. And so part of what these guys are looking for is as much information as we can give them about when things are going bad. So traditionally we probably would have just given you packet losses. Now we're going to want to be doing it per lane, per symbol error, everything else they can do to figure out whether things are going to be okay, because unplanned outages are very bad. Oh, yeah, absolutely. And how bad it is sort of depends on structure, because some people checkpoint really rapidly and some people don't, and so the real question becomes, when you have a link failure or you have a node failure, right, how much do you lose?
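A quick, purely hypothetical worked example of that trade-off (the checkpoint interval and cost below are invented numbers): checkpoint rarely and each failure throws away a lot of GPU-hours; checkpoint constantly and the overhead eats you instead.

```python
# Toy numbers, not measurements: how much GPU time one failure throws away
# depends on how often you checkpoint and how long a checkpoint takes to write.
cluster_gpus = 100_000
checkpoint_interval_min = 30      # hypothetical minutes between checkpoints
checkpoint_cost_min = 2           # hypothetical minutes to write one checkpoint

# On a failure you lose, on average, half an interval of work on every GPU.
lost_gpu_hours = cluster_gpus * (checkpoint_interval_min / 2) / 60
overhead = checkpoint_cost_min / (checkpoint_interval_min + checkpoint_cost_min)

print(f"~{lost_gpu_hours:,.0f} GPU-hours lost per failure, "
      f"{overhead:.0%} of wall-clock spent checkpointing")
```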

Tim:

Yeah, especially if it hadn't... You know, we were just talking about how they have to do a certain amount of computation, then checkpoint and sync and do all of that. So you're constantly, essentially, losing data to some degree.

Peter:

Right, it's like an acceptable loss to be losing nodes. The other one that I read recently and posted out was that they're blowing through the life of the GPUs. Really? I saw that. Yeah, it's like, okay, so if your GPUs are going to fail after three years...

Tim:

I mean that yeah, I read that article too. It's crazy that.

Chris:

So just to call back to, uh, the point you made earlier about this kind of focus on the I'm gonna call it, uh absolute optimal productivity, like no idling at all.

Peter:

Right, anything that's available should be consumed. Um, so, yes, but what you also see is people end up having some number of clusters sitting there idle so they can swap them in when a cluster fails.

Chris:

Well, that's, that's where I was going. So, like, if I think about, you know, I mean, this is a new thing, right? If we think about, you know, WAN circuits and things like that, the scalability.

Chris:

But people feel like, oh, I have two circuits at my site, you know, I want to use both of them. Why is one sitting idle the whole time, right? But I'm wondering, how much does this increase the amount of failures in a shorter amount of time, because the resources are facing a higher level of constraint?

Peter:

So the article we looked at basically suggested that, since they're running full bore all the time, as fast as possible and as hard as possible, the lifetime sort of stinks. And someone else made the point that, yes, we've seen this in Bitcoin mining.

Tim:

Yeah, I mean, the actual computation-on-a-GPU bit, that's not new, right? We've done it with Bitcoin for years, and those have been well known to... They'll build Bitcoin farms. What's that guy on Twitter, Adrian Cantrill, that does the training for AWS? He was posting all the time about his Bitcoin farm stuff always having node failures. It's like every single day, man. So that part's not new.

Peter:

So this is like, if you get your home computer and you overclock it, right, it's probably going to last less time.

Tim:

And overclocking is known to lower the lifespan of your processor as well. I mean, you're cooking the hardware, that's just straight up. You're just you're pumping electricity.

Chris:

I think at a GPU level it makes perfect sense. I'm more speaking to how does it affect the network, If you want the network also running in that same capacity.

Peter:

So this is, I think this is where it gets complicated. So one way you say it is, these things cost so much they've got to be running 100% of the time. But then you go, okay, so when I have a failure, what happens next? There's a whole bunch of stuff out there about how certain numbers of things in a pod work better than certain others, depending on how your connectivity is. So you end up having to have groups of spares, and they'll be running warm spares, so when there is a failure you can swap over with the least interaction with everyone else. But of course this is a fun problem, because a pod of this is expensive stuff. So we get back to the N-plus-something failures: how many do you have, how many do you have standing by, and how long is the failure? So we've seen all this stuff, but again, the scale is just enormous.
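Here is a small sketch of that N-plus-something question using a simple binomial model. The pod size, per-node failure probability, and implied repair window are invented assumptions, purely to show how the warm-spare count trades against the chance of running degraded.

```python
from math import comb

def prob_enough_spares(nodes, spares, p_fail_window):
    """Probability that the number of nodes failing within one repair window
    does not exceed the warm spares kept idle (simple binomial model)."""
    total = nodes + spares
    return sum(comb(total, k) * p_fail_window**k * (1 - p_fail_window)**(total - k)
               for k in range(spares + 1))

# Hypothetical pod: 1,024 active nodes, each with a 0.5% chance of failing
# before a tech can swap hardware. How many warm spares keep the pod whole?
for spares in (4, 8, 16):
    p = prob_enough_spares(1024, spares, 0.005)
    print(f"{spares:>2} warm spares -> {p:.2%} chance the pod stays whole")
```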

Tim:

You know it's crazy. I just thought about this and I'm sure I wonder if anybody I'm sure somebody smarter than me has already thought of it and they're trying to figure it out. But think about the supply chain of this right. Like if you're buying 100,000 GPUs, you're probably like buying an entire run like of GPUs from Taiwan or something or wherever the hell they're coming from, and they're all probably going to fail at about the same time, Like their lifespan is about the same.

Peter:

I'm not sure how that works out. It does seem that, again, what I'm reading is that getting hold of these things is hard. Some of it depends on who you are, what your connection is. I read something the other day, which is Dell is becoming more loved by NVIDIA, because I think they'll be able to deliver at scale. Okay, but there's probably enough mix-up between where they're going.

Tim:

I think when you buy 100,000, they don't all show up as one big oh yeah, I couldn't even imagine the truck that would show up with that right.

Peter:

So if you're, actually getting them as part of the NVIDIA big racks or something else. I'm sure there's a bunch of mixing in there.

Tim:

Yeah, but at some point you're going to get into some kind of sick cycle of supply where you're probably burning through a decent amount of nodes at the same time.

Peter:

Sure, and I think that's probably true, but if the reality is that they have a three-year lifetime anyway, that's not that long, it's just constant swapping all the time at that scale right.

Peter:

So this is like the Sydney Harbour Bridge, right? You start painting at one end; by the time you get to the other end, you start again. So everyone cares about power, and they really care about increasing work per watt. They're going to deliver as much power and cooling into a rack as they possibly can. The question is, what can you get out of that? Again, the networking part is close to lost in the noise here, but a watt they spend on networking is a watt they can't spend on the other stuff.

Tim:

Yep, but they can't not spend the money right, you've got to have the network. It's not like they can't spend it.

Peter:

You've got to have the network, right. But your goal is, you know, this gets really interesting, because if they want to run any reasonable distance for some of these links, they're on optics, and optics is really expensive. Oh my God, yes. And then you go, well, I want to run copper over 20 meters, it's like, but I don't want to pay for FEC, it's like, yeah, we can't do that. And so, you know, in a whole lot of places here, physics is hard, and it's not going to change anytime soon.

Tim:

You know, and I didn't necessarily want to get there yet, but it begs the question really. Just the whole thing we just talked about just truly begs the question, the unasked question which is, if we spent this much money for this much workload, like, where is the ROI? Like, what are we getting out of it?

Peter:

I have been wondering that myself. Okay, so this is a question that should not be asked, of course, because the answer is it's just wonderful and we need it.

Peter:

I sort of wonder about Level 3, when they put all the fibre in the ground and went bankrupt. All that fibre is being very useful at the moment to these people who want fibres. I'm not sure that I see the ROI here. I understand we must do it because we have to, so for the time being let's put that aside. Yeah, because that's a problem we can't solve here; that's a problem for the business people. But what we have at the minute is these problems our customers want to solve, and they're really hard, right, for the engineers. So one of the presentations I was listening to, the engineer presenting it was talking about seating in connectors, and just where the metal wipes and doesn't quite finish.

Tim:

Okay, it's a tiny piece there, right, which becomes a stub in the transmission, uh, from the transmission line, and so you can have stubs where there's, you know, like a millimeter of metal. Yeah, this is getting pretty interesting. So the physics is hard. I mean, yeah, I guess it makes sense: imagine having to have essentially perfection to get this level of low latency without any problem, from a machining perspective, from a, you know, fabrication perspective. And so then you go down to, okay, so we want to reduce loss, and you see a whole lot of stuff with people building flyover cables, right, because flyover cables are better than PCBs, but they're also sort of complicated.

Peter:

Then it's like, how close can the flyover cable drop onto the chip? We've seen a whole lot of conversations about co-packaged optics, but that doesn't seem to be taking off, so you're left with SerDes coming out of chips, and getting it to the other end is hard. If you go way up to the top end of systems, you see people building cable backplanes, right, because the losses are just too high running on a PCB, but they're mechanically really complex.

Peter:

I don't, yeah, to be fair, I haven't seen this, so I'm having trouble. Okay, so go back to the pictures. Remember the old telephone exchanges, all these people plugging in cables? Oh, yeah, yeah. Imagine that. It's that, basically. Okay, excellent. It's lots of cables running everywhere, um, which is a technical solution, but I'm not sure it's a solution you can build by the tens of thousands, because not only is this hard, they want these things in hundreds of thousands to millions.

Chris:

So I remember a year ago, when we had this conversation, we did have... Your memory is much better than mine, Chris. You've got to go back and watch your episode after this. Yeah, we did have this conversation.

Peter:

I'm trying to avoid watching myself. I can never handle it.

Chris:

I recall that we had this conversation about, you know, these purpose-built networks, right, building them for a certain length of time, for training a model, right, when you need the utmost performance at that scale. So I mean, I get where the hyperscalers, you know, Google, Meta and Microsoft, all need that capability. But are you having conversations about these purpose-built networks for the purposes of training at the enterprise level, and how do you make that modular and consumable at that level?

Peter:

So I'm not on that side of Cisco's house. In my head this looks like specialized networks, and I did read something earlier today about how much bigger Microsoft's thing was than what's in Los Alamos. So my running theory is the original model training is going to be at the web scalers.

Peter:

My guess is, adjusting a model is probably something you rent from Amazon. Like fine-tuning, you mean? Yeah. There's a whole lot of groups out there that are off building AI data centers with a plan to rent them to the web scalers. So my guess is it becomes a thing you can go rent. I think for most people that's the only way you could do it, because you couldn't keep them busy.

Tim:

I think we thought the same thing. Honestly, this is triggering my memory now as well. I think we had the same idea a year ago and having now put a year between back then and now and seeing the way things are going, I think that's still pretty safe.

Peter:

I think it's still a pretty safe bet. It gets a little interesting for... you guys know what FedRAMP is? Yeah, yeah, yeah. Okay, it gets a little interesting for the classified cloud, but I think the web scalers have already solved this.

Tim:

They're already selling that. Actually, it just occurred to me. Our listeners may not be familiar with FedRAMP. It probably would be useful to spend the 10 seconds.

Peter:

I would say FedRAMP is the requirements you have to meet to sell into classified and federal. When the cloud stuff first came up, it was like, cool, there's this cloud thing out here, we can't use it. But if I recall correctly, both AWS and Azure have now got FedRAMP-certified cloud services.

Tim:

That's correct. I mean, even where we work, we have to deal with the stuff as well.

Peter:

And if you get back to the conversation about Microsoft buying the power from Three Mile Island for 20 years, I'm not seeing that in an enterprise DC.

Tim:

Oh, no way. Right Like at best I think it's closer to what you said right Like the enterprises aren't going to build like a true enterprise scalable data center that's going to do AI work, like they'll probably they could probably have something small for, like, fine tuning a model they already have access to, or something like that.

Chris:

We're already having conversations about whether or not it's even worth it to make your own model right. Where's the ROI on that? If we're talking about building the infrastructure, fuck no, there's no conversation.

Peter:

This goes back to, okay, so you want to do generative AI and you say it's wonderful, and then you go and ask the CFO for the money, and he looks at you and says, really? So my guess is, and we're already seeing it, companies will not run against the public model. They have their own copy locally because they don't want stuff to leak. I think Samsung was probably the worst one for that. Oh, yeah, yeah, for sure. I mean, I think the real question is, training models doesn't make money.

Tim:

Inferencing makes money. Exactly.

Peter:

And then the question is, where is the inferencing going to run? I don't know anywhere near enough about this, but it seems like inferencing is going to become part of a standard data center workload.

Tim:

Yeah, so I mean, one of the things I'm presenting at AWS re:Invent is this idea of a disaggregated RAG, where you can keep your data source completely on-prem and just send vector data into a database in the cloud, right, for enrichment purposes, for the same reason you just talked about, which is that you don't want to give your data to the cloud or put it...
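As a rough illustration of that split (this is a generic sketch, not the content of the re:Invent talk; every function name below is a hypothetical placeholder, not a specific product API): the embedding step runs on-prem, and only the resulting vectors and document IDs ever leave the building.

```python
from typing import List

def embed_locally(document: str) -> List[float]:
    """Run the embedding model on-prem so raw text never leaves the building.
    Stand-in logic: a real deployment would call a local embedding model here."""
    return [float(ord(c) % 7) for c in document[:16]]

def push_vector_to_cloud(doc_id: str, vector: List[float]) -> None:
    """Send only the opaque vector and an ID to the cloud vector database;
    the document body stays in the on-prem store."""
    print(f"uploading {doc_id}: {len(vector)}-dim vector (no raw text)")

on_prem_corpus = {
    "policy-001": "Internal HR policy text...",
    "design-042": "Proprietary design document...",
}

for doc_id, text in on_prem_corpus.items():
    push_vector_to_cloud(doc_id, embed_locally(text))

# At query time, the cloud side returns matching doc IDs, and the sensitive
# text is fetched from the on-prem store to build the final prompt.
```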

Peter:

I mean, I think that becomes a little interesting, because, I mean, that's why you've seen people shuffle stuff around for data sovereignty, yeah. But I think people would have to go and take a look and say, what's the risk-reward of putting it in both places? But I was more going, okay, so you end up, you have a model. Whoop-de-doo, congratulations, very well done. Then you have to actually provide something useful out of it, which means you're inferencing.

Peter:

I don't see how you split that from the standard data center workload. It's going to be everywhere. Again, in my head, we used to have specialized networks for everything. There'd be a storage network or this network or that network. Now we've sort of gone hyper-converged. So if I look out a few years, what does the standard data center workload look like? Does it have a bunch of normal servers over here and an inferencing group over there? Is it scattered across the middle? I don't know the answer, but my guess is ultimately the web scalers are very economically rational. They're going to do whatever is the cheapest.

Chris:

I think that's definitely true. I would think they also build these things to be insanely modular, right? You want to be able to just throw things in, throw things out, and I feel like the only way you're really going to have uniformity is if you have dedicated pods, essentially, that are doing the same task. So I would see all the inferencing falling in one spot and all the training falling in a different one.

Peter:

So the counterargument would be, okay, so training is somewhere over there. But when you're doing stuff and you're inferencing and doing other web workload and storage and everything else inside that part of the network, do I have, like, three separate components, or is it all one thing, with everyone being able to do some of it?

Tim:

do I have yeah?

Peter:

Like 100 servers over there doing compute and 100 GPUs over there doing inferencing, and, yeah, backwards and forwards, or do I put them all inside the same rack?

Tim:

Yeah, that's a really classic... It's funny how the more things change, the more they stay the same, right? Because that's a very classic question, right, in terms of purpose-building your workloads. Rule 11. Rule 11.

Peter:

Exactly, I was just thinking of that one, actually. For those who aren't following along, rule 11 of the 12 networking truths is, every old idea will be proposed again with a different name and a different presentation, regardless of whether it works. So I think the answer is, there is so much happening. I think the problem we actually have is there's so much information it's hard to consume. At risk of a plug for a friend, I've been reading some stuff from a group called SemiAnalysis. They're an analyst firm, but a little different. They have a feed you can subscribe to on Substack. About, I don't know, it's like half you can see and half is paid, but the half I could see was enough to persuade me to actually pay for it. And they've got a lot of cool stuff out there, and so that's somewhere I'd seriously go, because they've done a bunch of looking at it.

Peter:

Now, it's not a light read. It's hard to know what you do when you look at Google and Meta and Microsoft and all these other guys, because they're all different. They've all got lots of stuff. Yeah, that's a lot. As I said, the other thing is people should go take a look at all the stuff from OCP, because the slides and the video are there. For me, just the slides isn't good enough; you really need to hear the explanation on the way through. But it's really complicated, it's expensive and it's really hard, and the problem we're going to have is, how do we avoid building everything?

Tim:

Yeah, yeah, no, and honestly, I feel like this show is going to be a heavy lift for a lot of people, myself included, so I want to make sure we get all the links in the show notes. And, as Peter's warned, I'm sure a lot of this will not be a light read, but whether we like it or not, this is probably going to be a big part of networking, both in the cloud and out of the cloud.

Peter:

It's going to be huge in some subset of the cloud, and right now that subset of the cloud is consuming everything else. But as far as I can figure out, it's also being built in a very scalable fashion, in which case they build it once and they build it a lot of times. So I don't know what percentage of people actually ever get to play with this stuff. You're going to have to be fairly high up in those organizations to get there, I think. For the rest of us, it's like watching Formula One: it's a very cool thing, but I'm not sure I'll ever get to build it.

Peter:

Yeah, right, yeah.

Tim:

I might be able to consume it or pay somebody to give me a ride in it.

Peter:

Now I think the question you actually get to is, what out of this comes back to a broader market? Because this market at the minute is quite narrow but extremely deep. What out of this comes back to a broader market, I don't think we know yet.

Tim:

Yeah, I think that's fair. Oh man, we could just keep going on this.

Peter:

And I do have to warn you, if you stay here, I will just keep talking. All right.

Tim:

Well, that's probably as good a reason as any to go ahead and wrap it up. This has been such a good episode. I'm actually going to be chewing on these links for a while. I'll make sure we get them in the show notes.

Peter:

So, Tim, maybe what we do is we let it simmer for a while, and maybe we come back in a few months and do a, here's what we heard from people, can we try and figure it out together?

Chris:

I love this. We'll just do a quarterly update with Peter.

Tim:

Yeah, we're just going to have to have Peter on for a quarterly update. We can't wait another year. This shit's moving too fast, right? So we would definitely agree. This has been awesome. Go ahead. No, it's okay.

Peter:

I was just going to keep talking.

Tim:

This has been awesome. I'm really glad we were able to get you back on. We will definitely have you back on earlier than a year from now. There's so much more to talk about. We'll get all the stuff in the show notes, and thanks for joining us tonight. Any closing thoughts, Chris?

Chris:

No, I think it's been good. We wanted to keep it as just kind of a conversation rather than I don't do formal stuff, right?

Peter:

I really don't.

Chris:

So I think we did good on that front, and it's interesting to hear how things changed. Obviously, we need to do this in a few months rather than 12 months, so we'll make sure to do that, but yeah, I think it's good.

Peter:

I mean, I think the other thing is to try and figure out what sorts of people you want on. Because, I mean, I like being on, but I only know so much.

Chris:

We've got some ideas. We were actually thinking about this before, so don't worry, we've got a follow-up.

Tim:

We'll reach out, because I'm sure you've got connections for people that could come on here and just blow everybody's mind even more. So yeah, definitely, okay, so yeah. So this is Cables to Clouds podcast. Thanks for listening, thanks for watching. If you're on YouTube, if you're not on YouTube, you should go on YouTube and subscribe to us. Buy our breakfast cereal. Sign my birthday card. My birthday is next week. I'll send it out.

Peter:

Where is your coffee mug?

Tim:

Oh, not on my desk. It's night here and I don't usually drink coffee at night. This is the problem with coffee. There you go. We do actually have a Cables to Clouds coffee mug, so buy our coffee mug. If you would. Not you personally, I just mean our listeners. But we will see you next time, on another episode.

Chris:

Hi everyone, it's Chris, and this has been the Cables to Clouds podcast. Thanks for tuning in today. If you enjoyed our show, please subscribe to us in your favorite podcatcher, as well as subscribe and turn on notifications for our YouTube channel to be notified of all our new episodes. Follow us on socials at Cables to Clouds. You can also visit our website for all of the show notes at cables2clouds.com. Thanks again for listening and see you next time.
